
Jellyfish

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. Jellyfish can count k-mers quickly by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.
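To illustrate what counting k-mers involves, here is a minimal sketch. It slides a window of length k along a single sequence and tallies each substring in an awk associative array, which plays the role of the hash table. It is a single-threaded toy for intuition only. It does not reproduce Jellyfish's compact hash encoding or its compare-and-swap updates. The function name count_kmers is hypothetical.

```shell
# Hypothetical toy k-mer counter, for illustration only (not Jellyfish's method).
# Usage: count_kmers SEQUENCE K
count_kmers() {
  printf '%s\n' "$1" | awk -v k="$2" '{
    # every start position i with i + k - 1 <= length gives one k-mer;
    # the associative array "counts" acts as the hash table
    for (i = 1; i <= length($0) - k + 1; i++) counts[substr($0, i, k)]++
  } END {
    for (m in counts) print m, counts[m]
  }' | sort
}
```

For example, `count_kmers ACGTACGT 3` lists each distinct 3-mer with its count. ACG and CGT each appear twice, and GTA and TAC each appear once.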

Jellyfish is available as a module on Apocrita.

Usage

Number of threads

By default, Jellyfish runs single-threaded. If you request multiple cores, set the -t parameter to ${NSLOTS} (the number of cores allocated to your job), as shown in the example job below.
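If you test commands outside a batch job, NSLOTS may not be set. One option, sketched below under that assumption, is a small hypothetical helper called jf_threads. It returns ${NSLOTS} when the scheduler provides it and otherwise falls back to a single thread, which is Jellyfish's default.

```shell
# Hypothetical helper: print the number of threads to pass to jellyfish -t.
# Inside a Grid Engine job NSLOTS holds the allocated core count; if it is
# unset or empty, fall back to 1 thread (Jellyfish's default).
jf_threads() {
  echo "${NSLOTS:-1}"
}
```

With this helper, `jellyfish count -t "$(jf_threads)"` followed by your usual options uses every allocated core inside a job and one thread elsewhere.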

To run the default installed version of Jellyfish, simply load the jellyfish module:

$ module load jellyfish
$ jellyfish --help
Usage: jellyfish <cmd> [options] arg...
Where <cmd> is one of: count, bc, info, stats, histo, dump, merge, query, cite, mem, jf.
Options:
  --version  Display version
  --help     Display this message

For usage documentation on a specific subcommand, pass the --help switch after it:

$ jellyfish count --help
Usage: jellyfish count [options] file:path+

Count k-mers in fasta or fastq files

Options (default value in (), *required):
 -m, --mer-len=uint32        *Length of mer
 -s, --size=uint64           *Initial hash size
 -t, --threads=uint32        Number of threads (1)

(output omitted)
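The -s option sets the initial hash size. A rough guide, offered here as an assumption rather than official sizing advice, is that the total number of k-mers in the input is an upper bound on the number of distinct k-mers. A read of length L contains L - k + 1 k-mers when L >= k. The hypothetical max_kmers function below computes that total for an uncompressed FASTQ file that has exactly one sequence line per record.

```shell
# Hypothetical helper: upper bound on distinct k-mers in a plain FASTQ file.
# Usage: max_kmers FILE.fastq K
# Assumes 4-line FASTQ records, so sequences sit on lines 2, 6, 10, ...
max_kmers() {
  awk -v k="$2" '
    # sum L - k + 1 over every read at least k bases long
    NR % 4 == 2 && length($0) >= k { n += length($0) - k + 1 }
    # "n + 0" prints 0 rather than an empty line if no read qualifies
    END { print n + 0 }
  ' "$1"
}
```

For example, `max_kmers ERR458495.fastq 16` would report this bound for the 16-mer job in the example below. Repeated k-mers are counted once per occurrence, so the true number of distinct k-mers can only be equal or lower. The mem subcommand listed above may also help with estimating memory (see `jellyfish mem --help`).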

Example job

Serial job

Here is an example job running on 2 cores and 2GB of memory (h_vmem is requested per core, so 2 x 1G):

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 2
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G

module load jellyfish

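# Count canonical (-C) 16-mers (-m 16) in ERR458495.fastq with an initial
# hash size of 10M (-s), using one thread per allocated core (-t) and
# writing the counts to 16mer.jf (-o). The input file is passed as an
# argument because jellyfish count requires at least one file path.
jellyfish count -t ${NSLOTS} \
                -m 16 \
                -s 10M \
                -C \
                -o 16mer.jf ERR458495.fastq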
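The -C option in the job above asks Jellyfish to count canonical k-mers. With this option, a k-mer and its reverse complement (the same sequence read from the opposite DNA strand) share a single count. A common definition of the canonical form is whichever of the two sorts first. The hypothetical canonical_kmer function below illustrates that definition with awk string operations. It assumes an uppercase k-mer containing only A, C, G and T, and it is not Jellyfish's internal code.

```shell
# Hypothetical illustration of a canonical k-mer: the lexicographically
# smaller of a k-mer and its reverse complement.
# Usage: canonical_kmer KMER   (uppercase A/C/G/T only)
canonical_kmer() {
  printf '%s\n' "$1" | awk '{
    rc = ""
    # walk from the last base to the first, complementing each base
    for (i = length($0); i >= 1; i--) {
      b = substr($0, i, 1)
      rc = rc (b == "A" ? "T" : b == "C" ? "G" : b == "G" ? "C" : "A")
    }
    # keep whichever of the k-mer and its reverse complement sorts first
    out = (rc < $0) ? rc : $0
    print out
  }'
}
```

For example, `canonical_kmer ACG` and `canonical_kmer CGT` both print ACG, because CGT is the reverse complement of ACG.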

References