Dorado

Dorado is a high-performance, easy-to-use, open-source basecaller for Oxford Nanopore reads.

Dorado is available as a module on Apocrita.

Usage

To run the default installed version of Dorado, simply load the dorado module:

$ module load dorado
$ dorado -h
Usage: dorado [options] subcommand

Positional arguments:
aligner
basecaller
demux
download
duplex
summary
trim

Optional arguments:
-h --help               shows help message and exits
-v --version            prints version information and exits
-vv                     prints verbose version information and exits

For help with a specific subcommand, run it with the -h option:

dorado basecaller -h

The above output has been truncated. Run the dorado basecaller -h command to see the full list of available options.

For optimal performance, Dorado requires POD5 file input. Files can be converted from other formats using the pod5 Python package.
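
As a rough illustration of the conversion step, the sketch below checks a directory for FAST5 files and prints a pod5 convert fast5 command for them rather than running it. The function name suggest_pod5_conversion and the default output filename are hypothetical, and the printed command assumes the pod5 package's command-line converter is installed and accepts this form; check pod5 convert fast5 --help before running it.

```shell
#!/bin/bash
# Sketch only: suggest a FAST5 -> POD5 conversion command for a directory.
# Assumption: input files are identified purely by their .fast5 extension.
# suggest_pod5_conversion is a hypothetical helper, not part of Dorado or pod5.
suggest_pod5_conversion() {
    local input_dir="$1"
    local output_file="${2:-converted.pod5}"
    local n_fast5

    # Count FAST5 files directly inside the input directory
    n_fast5=$(find "$input_dir" -maxdepth 1 -name '*.fast5' | wc -l)

    if [ "$n_fast5" -gt 0 ]; then
        # Print (do not run) the conversion command; the glob is left
        # unexpanded so the command can be reviewed before use
        echo "pod5 convert fast5 ${input_dir}/*.fast5 --output ${output_file}"
    else
        echo "no FAST5 files found in ${input_dir}"
    fi
}

# Example usage (hypothetical path):
#   suggest_pod5_conversion /path/to/fast5_dir calls.pod5
```

Printing the command instead of executing it keeps the helper safe to run on a login node; the conversion itself is better placed inside a job.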

Models

Models can be downloaded at runtime using Dorado's automatic model selection complex, which names a model speed and, optionally, a version and modified-base models. For example, to run the basecaller using the hac@v3.5.2,5mCG@v2 model complex:

dorado basecaller \
    hac@v3.5.2,5mCG@v2 \
    /data/PublicDataSets/CliveOME-5mC/POD5/PAM63974_pass_58881fec_0.pod5 \
    --verbose | \
    samtools view --threads ${NSLOTS} -O BAM \
    -o ./output/calls.bam

This downloads the requested model into a hidden temporary directory at runtime, then removes that directory once execution is complete.
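
To make the structure of a model complex such as hac@v3.5.2,5mCG@v2 easier to see, the sketch below splits one into its basecalling speed, version and modified-base parts using shell parameter expansion. The function describe_model_complex is a hypothetical helper for illustration only; it does not validate the string against the models Dorado actually provides.

```shell
#!/bin/bash
# Sketch only: split a model complex into its parts.
# describe_model_complex is a hypothetical helper, not a Dorado command.
describe_model_complex() {
    local complex="$1"

    # Everything before the first comma is the simplex model (speed[@version])
    local simplex="${complex%%,*}"

    # Anything after the first comma lists modified-base models
    local mods=""
    if [ "$complex" != "$simplex" ]; then
        mods="${complex#*,}"
    fi

    # The speed is the part before "@"; the version is the part after it
    local speed="${simplex%%@*}"
    local version="unspecified"
    if [ "$simplex" != "$speed" ]; then
        version="${simplex#*@}"
    fi

    echo "speed=${speed} version=${version} mods=${mods:-none}"
}

describe_model_complex "hac@v3.5.2,5mCG@v2"
# prints: speed=hac version=v3.5.2 mods=5mCG@v2
```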

Models can also be downloaded in advance. To download all models into the current directory:

dorado download --model all

To download a specific model:

$ dorado download --model dna_r10.4.1_e8.2_400bps_hac@v3.5.2
[info] Assuming cert location is /etc/ssl/certs/ca-bundle.crt
[info]  - downloading dna_r10.4.1_e8.2_400bps_hac@v3.5.2 with httplib

You can then point your job at the path you downloaded the model(s) to. See below for examples.

Methylation calling

When basecalling with a model downloaded in advance and calling methylation, you must specify the --modified-bases argument.

Further help for the download subcommand is available by running it with the -h option:

dorado download -h

Example jobs

GPU recommended

Whilst running Dorado without a GPU is technically possible, it is strongly discouraged, as basecalling is much slower when running purely on CPU.

GPU job

Use Ampere or Hopper cards

Dorado is heavily optimised for NVIDIA A100 (Ampere) and H100 (Hopper) GPUs and will deliver maximum performance on nodes containing these GPUs. You can select a specific GPU type in your job script.

1 GPU

Below is an example job running on 12 cores and 1 GPU, based on benchmarks published on the AWS HPC Blog [2]. The output is piped to SAMtools, which writes it to a single BAM file.

#!/bin/bash
#$ -cwd
#$ -pe smp 12
#$ -l h_rt=240:0:0
#$ -l h_vmem=7.5G
#$ -l gpu=1
#$ -l gpu_type="ampere|hopper"
#$ -j y
#$ -N dorado

module load dorado
module load samtools

mkdir -p /path/to/output

dorado basecaller \
    /path/to/models/dna_r10.4.1_e8.2_400bps_hac@v3.5.2 \
    /data/PublicDataSets/CliveOME-5mC/POD5/ \
    --verbose \
    --modified-bases 5mCG | \
    samtools view --threads ${NSLOTS} -O BAM \
    -o /path/to/output/calls.bam
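
Because the job pipes Dorado into SAMtools, the pipeline's exit status is, by default, that of the last command only, so a failed basecalling run can still look like a successful job. The sketch below demonstrates the effect of bash's pipefail option using two stand-in functions; failing_producer and consumer are hypothetical placeholders for the dorado basecaller and samtools view commands, not real tools.

```shell
#!/bin/bash
# Sketch only: show how pipefail changes a pipeline's exit status.
# failing_producer and consumer are hypothetical stand-ins for
# `dorado basecaller ...` and `samtools view ...` respectively.
failing_producer() { echo "partial output"; return 1; }
consumer() { cat > /dev/null; }

# Default behaviour: the status of the last command (consumer) is reported
failing_producer | consumer
status_default=$?

# With pipefail: a failure anywhere in the pipeline is reported
set -o pipefail
failing_producer | consumer
status_pipefail=$?
set +o pipefail

echo "default=${status_default} pipefail=${status_pipefail}"
# prints: default=0 pipefail=1
```

In a job script, adding set -o pipefail before the dorado basecaller line would be one way to make such failures visible in the job's exit status.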

2 GPUs

GPU node availability

Whilst using multiple GPUs will speed up your basecalling, a job requesting multiple GPUs may wait longer in the queue before it starts running.

Dorado runs in multi-GPU cuda:all mode by default, so it should use every GPU allocated to the job. Here is an example job running on 24 cores and 2 GPUs:

#!/bin/bash
#$ -cwd
#$ -pe smp 24
#$ -l h_rt=240:0:0
#$ -l h_vmem=7.5G
#$ -l gpu=2
#$ -l gpu_type="ampere|hopper"
#$ -j y
#$ -N dorado

module load dorado
module load samtools

mkdir -p /path/to/output

dorado basecaller \
    /path/to/models/dna_r10.4.1_e8.2_400bps_hac@v3.5.2 \
    /data/PublicDataSets/CliveOME-5mC/POD5/ \
    --verbose \
    --modified-bases 5mCG | \
    samtools view --threads ${NSLOTS} -O BAM \
    -o /path/to/output/calls.bam
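
To confirm that a multi-GPU job can see the number of GPUs it requested, a check along the following lines could be added before the basecalling step. This is a sketch that assumes the scheduler exports a comma-separated CUDA_VISIBLE_DEVICES variable for the job, which may not hold on every system; count_visible_gpus is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: count the GPUs listed in CUDA_VISIBLE_DEVICES.
# Assumption: the scheduler sets CUDA_VISIBLE_DEVICES to a comma-separated
# list of device IDs. count_visible_gpus is a hypothetical helper.
count_visible_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    local ids

    # An unset or empty variable means no devices were listed
    if [ -z "$devices" ]; then
        echo 0
        return
    fi

    # Split the list on commas and report the number of entries
    IFS=',' read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

# Example usage inside a job script:
#   echo "GPUs visible to this job: $(count_visible_gpus)"
```

Running nvidia-smi -L inside the job is an alternative check that lists the devices themselves.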

References

[1] Dorado GitHub repository
[2] Benchmarking the Oxford Nanopore Technologies basecallers on AWS