Kraken

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.

Kraken is available as a module on Apocrita.

Versions

The first version of Kraken used a large indexed and sorted list of k-mer/LCA pairs as its database. Although fast at execution, it had large memory requirements, so Kraken2 was created to address the memory issue.

Kraken2 differs from Kraken in several important ways:

  • Only minimisers of the k-mers in the query sequences are used as database queries.
  • Kraken2 uses a compact hash table that is a probabilistic data structure, which lowers memory use at the cost of occasional incorrect lookups.
  • Kraken2 can build a database from amino acid sequences and perform a translated search of the query sequences against that database.
  • Kraken2 utilises spaced seeds in the storage and querying of minimisers to improve classification accuracy.
  • Kraken2 provides support for "special" databases that are not based on NCBI taxonomy. These are currently limited to three popular 16S databases.

See the manual linked below for more information.
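
As a rough illustration of the minimiser idea mentioned in the list above, the sketch below picks the smallest substring of a given length from a single k-mer. It is a simplified teaching example, not Kraken2's implementation: it assumes plain lexicographic ordering and ignores spaced seeds and reverse complements. The function name minimiser and its arguments are hypothetical.

```shell
#!/bin/bash
# Simplified illustration only: plain lexicographic order is an assumption,
# and Kraken2's real minimiser selection is more involved.
# Usage: minimiser <kmer> <l>   (prints the smallest l-long substring)
minimiser() {
    printf '%s\n' "$1" | LC_ALL=C awk -v l="$2" '{
        n = length($0)
        best = substr($0, 1, l)            # first window is the initial candidate
        for (i = 2; i + l - 1 <= n; i++) { # slide the window one base at a time
            cand = substr($0, i, l)
            if (cand < best) best = cand   # keep the lexicographically smallest
        }
        print best
    }'
}
```

For example, minimiser TTGACCA 3 prints ACC, because ACC is the smallest of the windows TTG, TGA, GAC, ACC and CCA. Storing one such value per k-mer rather than the k-mer itself is what lets the database shrink.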

Usage

To run the default installed version of Kraken, simply load the kraken module:

module load kraken

For usage documentation, run kraken -h or kraken2 -h depending on which version you have loaded.
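
If it is unclear which version a loaded module provides, a small check along the following lines may help. This is a sketch under assumptions: detect_kraken is a hypothetical helper name, and it only tests whether an executable called kraken2 or kraken is on the PATH, preferring kraken2 when both are present.

```shell
#!/bin/bash
# Hypothetical helper: print the name of the Kraken executable found on PATH.
# Returns non-zero if neither is available (for example, no module is loaded).
detect_kraken() {
    for candidate in kraken2 kraken; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if cmd=$(detect_kraken); then
    echo "Found ${cmd}; run '${cmd} -h' for usage"
else
    echo "No Kraken executable on PATH; try 'module load kraken' first"
fi
```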

Example jobs

Serial jobs

Here is a Kraken example job running on 1 core and 5GB of memory:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=5G

module load kraken

kraken --db example_db \
       --threads ${NSLOTS} \
       input_file

Here is a Kraken2 example job running on 1 core and 5GB of memory:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=5G

module load kraken

kraken2 --db example2_db \
        --threads ${NSLOTS} \
        input_file

References