Jellyfish
Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. Jellyfish can count k-mers quickly by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.
Jellyfish is available as a module on Apocrita.
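To make the term concrete, the following toy shell sketch shows what counting k-mers means: it slides a window of length k along each input sequence and tallies every substring it sees. The function name count_kmers is hypothetical, and the sketch relies on awk's associative arrays rather than Jellyfish's hash table, so it illustrates the kind of result produced, not how Jellyfish is implemented. It also does not merge a k-mer with its reverse complement.

```shell
# Toy k-mer counter: reads one sequence per line on stdin, prints "<k-mer> <count>".
# count_kmers is a hypothetical helper for illustration, not part of Jellyfish.
count_kmers() {
    awk -v k="$1" '
        {
            # Slide a window of width k over the line.
            for (i = 1; i + k - 1 <= length($0); i++)
                counts[substr($0, i, k)]++
        }
        END {
            for (mer in counts)
                print mer, counts[mer]
        }
    ' | sort
}

# Count the 4-mers of a single short sequence.
printf 'ACGTACGT\n' | count_kmers 4
```

For this input, ACGT is reported twice and CGTA, GTAC and TACG once each.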
Usage
Number of threads
By default, Jellyfish runs in single-threaded mode. If you are requesting multiple cores, pass the -t parameter with the value ${NSLOTS}, as shown in the example job below.
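If the same script is sometimes run outside the scheduler, where NSLOTS is not set, a default can be supplied with shell parameter expansion. This is a small sketch; the variable name THREADS is an assumption made for illustration.

```shell
# NSLOTS is set by the scheduler inside a job; fall back to 1 thread elsewhere.
THREADS="${NSLOTS:-1}"
echo "Running Jellyfish with ${THREADS} thread(s)"
# THREADS can then be passed on as: jellyfish count -t "${THREADS}" ...
```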
To run the default installed version of Jellyfish, simply load the jellyfish module:
$ module load jellyfish
$ jellyfish --help
Usage: jellyfish <cmd> [options] arg...
Where <cmd> is one of: count, bc, info, stats, histo, dump, merge, query, cite, mem, jf.
Options:
--version Display version
--help Display this message
For usage documentation, pass the --help switch after any jellyfish command:
$ jellyfish count --help
Usage: jellyfish count [options] file:path+
Count k-mers in fasta or fastq files
Options (default value in (), *required):
-m, --mer-len=uint32 *Length of mer
-s, --size=uint64 *Initial hash size
-t, --threads=uint32 Number of threads (1)
(output omitted)
Example job
Serial job
Here is an example job running on 2 cores with 2GB of memory in total (1GB per core):
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 2
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
module load jellyfish
jellyfish count -t ${NSLOTS} \
-m 16 \
-s 10M \
-C \
-o 16mer.jf ERR458495.fastq
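Once the job has finished, the resulting database can be inspected with some of the other subcommands listed in the jellyfish help output. The sketch below assumes the job wrote 16mer.jf to the current directory; the output filename 16mer.histo is a made-up name, and the guard simply skips the commands when jellyfish or the database is not available.

```shell
DB=16mer.jf   # database written by the example job (assumed to be in the current directory)

if command -v jellyfish >/dev/null 2>&1 && [ -f "${DB}" ]; then
    # Summary statistics for the counted k-mers.
    jellyfish stats "${DB}"
    # Histogram of k-mer occurrence counts (16mer.histo is a hypothetical filename).
    jellyfish histo "${DB}" > 16mer.histo
    # First few k-mers with their counts, one "k-mer count" pair per line.
    jellyfish dump -c "${DB}" | head
else
    echo "jellyfish or ${DB} not found; skipping inspection" >&2
fi
```

These commands are light enough for a quick check, but for large databases they are better run inside a job as well.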