A single-molecule sequence assembler for large and small genomes.
To use the default version of canu, load the canu module:
    $ module load canu
    $ canu
    usage: canu [-correct | -trim | -assemble | -trim-assemble] \
                [-s <assembly-specifications-file>] \
                 -p <assembly-prefix> \
                 -d <assembly-directory> \
                 genomeSize=<number>[g|m|k] \
                [other-options] \
                [-pacbio-raw | -pacbio-corrected | -nanopore-raw | -nanopore-corrected] \
                 <files.fastq>

    The assembly is computed in the (created) -d <assembly-directory>, with
    most files named using the -p <assembly-prefix>.

    The genome size is your best guess of the genome size of what is being
    assembled. It is used mostly to compute coverage in reads. Fractional
    values are allowed: '4.7m' is the same as '4700k' and '4700000'.
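The suffix notation for genomeSize can be checked with a short shell sketch. The helper name `genome_size_to_bases` is hypothetical and not part of canu; it only illustrates the arithmetic described above (k = thousand, m = million, g = billion bases) and makes no claim about how canu parses the value internally.

```shell
# Hypothetical helper (not part of canu): expand a genomeSize value such as
# 4.7m, 4700k or 4700000 into a plain number of bases.
genome_size_to_bases() {
    printf '%s\n' "$1" | awk '
        {
            value = tolower($0)          # treat 300M and 300m alike (assumption)
            mult = 1
            if (value ~ /g$/)      mult = 1000000000
            else if (value ~ /m$/) mult = 1000000
            else if (value ~ /k$/) mult = 1000
            sub(/[gmk]$/, "", value)     # drop the suffix, keep the number
            printf "%.0f\n", value * mult
        }'
}

# All three spellings from the usage text describe the same size,
# so each call prints 4700000.
genome_size_to_bases 4.7m
genome_size_to_bases 4700k
genome_size_to_bases 4700000
```

Running a value through a helper like this before submitting can catch a mistyped suffix, for example 300k where 300m was intended.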
For full usage documentation, run
Error message - Gatekeeper detected problems in your input reads
To resolve this error message, supplement each dataset with
Illumina reads. Renaming the files in
or using the
stopOnReadQuality=false option will produce an
When executing a
canu command just like the one above, an Apocrita submission
script is created and submitted. This script is available under:
N is an incremented
number used internally by
canu to distinguish between Apocrita submissions.
This script contains the
canu parameters passed via the first command
and is submitted as a new job on Apocrita to start the canu sequence
assembler. The output of this second script is written to:
Here is an example
canu command which will submit an Apocrita job
running with 4 cores and 8GB total memory (default is 1 core and 4G
memory), using the
java binaries loaded via the
    $ canu gridOptions='-l h_vmem=8G -pe smp 4' -p 'Ppal' -d 'output' \
           -nanopore-raw data.fq 'genomeSize=300M' gnuplot=$(which gnuplot) \
           java=$(which java)
The qsub command produced will look similar to:
    qsub \
      -l h_vmem=4g \
      -pe smp 1 \
      -l h_vmem=8G \
      -pe smp 4 \
      -cwd \
      -N 'canu_Ppal' \
      -j y \
      -o /data/home/abc/canu/output/canu-scripts/canu.01.out \
      /data/home/abc/canu/output/canu-scripts/canu.01.sh
Duplicate scheduler variables
The duplication of the h_vmem and smp scheduler variables in the example
above can be ignored because the later value overrides the earlier one.
The qsub command will automatically be executed, so a new Apocrita job will
be launched. The output of the second job is written to:
/data/home/abc/canu/output/canu-scripts/canu.01.out which is symlinked
in the parent directory.
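
Because each submission writes a new numbered output file, it can be handy to locate the most recent one. The following is a minimal sketch, not something canu provides: the helper name `latest_canu_log` is hypothetical, and it assumes the zero-padded `canu.NN.out` naming seen in the example above, so that alphabetical order matches submission order.

```shell
# Hypothetical helper (not part of canu): print the newest canu.NN.out file
# found in <assembly-directory>/canu-scripts.
latest_canu_log() {
    scripts_dir="$1/canu-scripts"
    latest=""
    # Shell globs expand in sorted order, so with zero-padded numbers
    # (canu.01.out, canu.02.out, ...) the last match is the newest.
    for f in "$scripts_dir"/canu.*.out; do
        [ -e "$f" ] || continue      # skip the unexpanded pattern if no match
        latest="$f"
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage: 'output' is the -d directory from the example command.
latest_canu_log output || echo "no canu output files found yet"
```

The printed path can then be passed to `tail -f` to follow the running job.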