Cell Ranger¶
Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA sequencing output to align reads, generate gene-cell matrices, and perform clustering and gene expression analysis.
Cell Ranger is available as a module on Apocrita.
Usage¶
To run the default installed version of Cell Ranger, simply load the cellranger module:
$ module load cellranger
$ cellranger
Usage:
cellranger mkfastq
cellranger count
cellranger aggr
cellranger reanalyze
cellranger mkloupe
cellranger mat2csv
cellranger mkgtf
cellranger mkref
cellranger vdj
cellranger mkvdjref
cellranger testrun
cellranger upload
cellranger sitecheck
For more information regarding each analysis pipeline, pass the --help switch after the pipeline sub-command (e.g. cellranger count --help).
Operating modes¶
Cell Ranger can be run in different modes; the two most relevant for us are:
- local (default)
- sge
Local operating mode¶
The local mode executes the pipeline on a single machine, usually within a cluster job. Use this mode when you want every pipeline stage to run inside a single job.
Cores and Memory Requests
To avoid overloading a node, set the --localcores ${NSLOTS} option to restrict Cell Ranger to the specified number of cores when executing pipeline stages, rather than using all available cores.
To prevent your job from being killed by the scheduler for using more memory than requested, set the --localmem option to restrict Cell Ranger to the specified amount of memory (in GB) when executing pipeline stages, rather than using 90% of the total memory available.
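As a rough illustration of how the two options relate to a job's resource request, the sketch below derives a --localmem ceiling from the number of slots and the per-core memory request. The helper name localmem_gb and the variable MEM_PER_CORE_GB are hypothetical, and the sketch assumes the job's total memory is the per-core h_vmem request multiplied by the number of slots; you may prefer a lower value to leave some headroom.

```shell
#!/bin/bash
# Hypothetical helper: upper bound for --localmem (in GB), assuming the
# total memory of the job is slots x per-core memory request.
localmem_gb() {
    local slots="$1" mem_per_core_gb="$2"
    echo $(( slots * mem_per_core_gb ))
}

NSLOTS="${NSLOTS:-4}"    # set by the scheduler inside a job; 4 is only a fallback for illustration
MEM_PER_CORE_GB=4        # assumption: mirrors the job's request, e.g. "-l h_vmem=4G"

LOCALMEM="$(localmem_gb "${NSLOTS}" "${MEM_PER_CORE_GB}")"

# Print the options rather than running Cell Ranger, so the values can be checked first.
echo "--localcores ${NSLOTS} --localmem ${LOCALMEM}"
```

The printed options can then be appended to a cellranger command in the job script once the values look sensible.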
Multi-Job operating mode (sge)¶
The sge mode will launch each stage of the underlying Martian pipeline framework as a different Apocrita job using the qsub command. As jobs from each stage are queued, launched, and completed, the pipeline framework will track their states using the metadata files that each stage maintains in the pipeline output directory.
Core Requests and Consumption
The number of cores required is determined by each pipeline framework stage; however, Martian jobs will run on only 1 core.
Memory Requests and Consumption
To specify the memory-per-core resource (h_vmem), pass the --mempercore switch. This value will be used in all pipeline framework stages.
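Because the same --mempercore value applies to every stage, it can help to check the options before submitting anything. The following is a minimal dry-run sketch: the helper name sge_options is hypothetical, and it assumes the value is a whole number of GB. It only prints the command and does not invoke Cell Ranger.

```shell
#!/bin/bash
# Hypothetical helper: print the sge-mode options for a given
# memory-per-core value (assumed to be a whole number of GB).
sge_options() {
    local mem_per_core_gb="$1"
    # Reject empty, non-numeric, or zero values before building the options.
    case "${mem_per_core_gb}" in
        ''|*[!0-9]*|0)
            echo "mempercore must be a positive integer (GB)" >&2
            return 1
            ;;
    esac
    echo "--jobmode sge --mempercore ${mem_per_core_gb}"
}

# Dry run: show the command for review instead of executing it.
# The --id value is a placeholder, as in the example jobs on this page.
echo "cellranger count --id 1234 $(sge_options 4)"
```

Validating the value up front is a design choice of this sketch: a mistyped memory value is caught once, before any stage job is queued.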
Example jobs¶
Serial job (local)¶
Here is an example job running on 4 cores and 16GB of memory to generate single-cell gene counts for a single library, in local mode:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=1:0:0
#$ -l h_vmem=4G
module load cellranger
cellranger count --id 1234 \
--transcriptome example_transcriptome \
--fastqs raw_data.fastq \
--sample 1234 \
--disable-ui \
--jobmode local \
--localcores ${NSLOTS} \
--localmem 4
Serial job (multi-job)¶
Here is an example job that generates single-cell gene counts for a single library in sge mode. The submitting job itself requests 1 core and 1GB of memory; each pipeline stage is then submitted as a separate job, with 4GB of memory per core set through --mempercore.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
module load cellranger
cellranger count --id 1234 \
--transcriptome example_transcriptome \
--fastqs raw_data.fastq \
--sample 1234 \
--disable-ui \
--jobmode sge \
--mempercore 4