Dorado
Dorado is a high-performance, easy-to-use, open source basecaller for Oxford Nanopore reads.
Dorado is available as a module on Apocrita.
Usage
To run the default installed version of Dorado, simply load the dorado module:
$ module load dorado
$ dorado -h
Usage: dorado [options] subcommand
Positional arguments:
aligner
basecaller
demux
download
duplex
summary
trim
Optional arguments:
-h --help shows help message and exits
-v --version prints version information and exits
-vv prints verbose version information and exits
For help with a specific subcommand, run it with the -h option:
dorado basecaller -h
The above output has been truncated; run the dorado basecaller -h command to see the full list of available options.
For optimal performance, Dorado requires POD5 file input. Files can be converted from other formats using the pod5 Python package.
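As a rough sketch of such a conversion: the pod5 package installs a pod5 command-line tool, and its convert fast5 subcommand writes FAST5 reads into a POD5 file. The helper name convert_to_pod5, the DRY_RUN variable and the example paths below are assumptions for illustration only; check pod5 convert fast5 --help for the options available in your installed version.

```shell
# Hypothetical helper: convert one or more FAST5 files into a single POD5 file.
# Usage: convert_to_pod5 OUTPUT.pod5 INPUT.fast5 [INPUT.fast5 ...]
# Set DRY_RUN=1 to print the command instead of running it.
convert_to_pod5() {
    pod5_output="$1"
    shift
    if [ -n "${DRY_RUN:-}" ]; then
        echo "pod5 convert fast5 $* --output ${pod5_output}"
    else
        pod5 convert fast5 "$@" --output "${pod5_output}"
    fi
}

# Preview the command for two placeholder input files
DRY_RUN=1 convert_to_pod5 ./pod5/converted.pod5 ./fast5/run1.fast5 ./fast5/run2.fast5
```

Running the function without DRY_RUN set performs the conversion, which requires the pod5 package to be installed in the active Python environment.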
Models
Models can be downloaded at runtime using the automatic model selection complex. For example, to run basecaller using the hac@v3.5.2,5mCG@v2 model:
mkdir -p ./output
dorado basecaller \
hac@v3.5.2,5mCG@v2 \
/data/PublicDataSets/CliveOME-5mC/POD5/PAM63974_pass_58881fec_0.pod5 \
--verbose | \
samtools view --threads ${NSLOTS} -O BAM \
-o ./output/calls.bam
This downloads the requested model into a hidden temporary directory at runtime and removes that directory once execution is complete.
Models can also be downloaded in advance. To download all models into the current directory:
dorado download --model all
To download a specific model:
$ dorado download --model dna_r10.4.1_e8.2_400bps_hac@v3.5.2
[info] Assuming cert location is /etc/ssl/certs/ca-bundle.crt
[info] - downloading dna_r10.4.1_e8.2_400bps_hac@v3.5.2 with httplib
You can then point your job at the path you downloaded the model(s) to. See below for examples.
Methylation calling
When using methylation calling with a model downloaded in advance, you will need to specify the --modified-bases argument.
Further help for download is available by running it with the -h option:
dorado download -h
Example jobs
GPU recommended
Running Dorado without a GPU is technically possible but strongly discouraged, as basecalling is much slower on CPU alone.
GPU job
Use Ampere or Hopper cards
Dorado is heavily optimised for Nvidia A100 (ampere) and H100 (hopper) GPUs and will deliver maximum performance on nodes containing these GPUs. You can select a specific GPU type in your job script.
1 GPU
Below is an example job running on 12 cores and 1 GPU, based on benchmarks published on the AWS HPC Blog [2]. The output is piped to SAMtools, which collates it into a single BAM file.
#!/bin/bash
#$ -cwd
#$ -pe smp 12
#$ -l h_rt=240:0:0
#$ -l h_vmem=7.5G
#$ -l gpu=1
#$ -l gpu_type="ampere|hopper"
#$ -j y
#$ -N dorado
module load dorado
module load samtools
mkdir -p /path/to/output
dorado basecaller \
/path/to/models/dna_r10.4.1_e8.2_400bps_hac@v3.5.2 \
/data/PublicDataSets/CliveOME-5mC/POD5/ \
--verbose \
--modified-bases 5mCG | \
samtools view --threads ${NSLOTS} -O BAM \
-o /path/to/output/calls.bam
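Once a job like the one above has finished, it can be worth checking that the BAM file is intact before using it downstream. The following is a minimal sketch, assuming the samtools module is loaded; check_bam is a hypothetical helper name, not part of Dorado or SAMtools.

```shell
# Hypothetical helper: basic sanity checks on a basecalled BAM file.
check_bam() {
    bam_file="$1"
    # Fail early if the file is missing or empty
    if [ ! -s "${bam_file}" ]; then
        echo "Missing or empty file: ${bam_file}" >&2
        return 1
    fi
    # samtools quickcheck detects truncated files and missing EOF blocks
    if ! samtools quickcheck "${bam_file}"; then
        echo "BAM failed quickcheck: ${bam_file}" >&2
        return 1
    fi
    # Report the number of records in the file
    echo "Records: $(samtools view -c "${bam_file}")"
}
```

For example, adding check_bam /path/to/output/calls.bam as the last line of the job script would record the result in the job's output file.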
2 GPUs
GPU node availability
Whilst using multiple GPUs will speed up your basecalling, a job requesting multiple GPUs may wait longer in the queue before it starts running.
Dorado will automatically run in multi-GPU cuda:all mode and should use as many GPUs as requested. Here is an example job running on 24 cores and 2 GPUs:
#!/bin/bash
#$ -cwd
#$ -pe smp 24
#$ -l h_rt=240:0:0
#$ -l h_vmem=7.5G
#$ -l gpu=2
#$ -l gpu_type="ampere|hopper"
#$ -j y
#$ -N dorado
module load dorado
module load samtools
mkdir -p /path/to/output
dorado basecaller \
/path/to/models/dna_r10.4.1_e8.2_400bps_hac@v3.5.2 \
/data/PublicDataSets/CliveOME-5mC/POD5/ \
--verbose \
--modified-bases 5mCG | \
samtools view --threads ${NSLOTS} -O BAM \
-o /path/to/output/calls.bam
References
[1] Dorado GitHub repository
[2] Benchmarking the Oxford Nanopore Technologies basecallers on AWS