Kraken¶
Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
Kraken is available as a module on Apocrita.
Versions¶
The first version of Kraken used a large indexed and sorted list of k-mer/LCA pairs as its database. Although it was fast at execution, it had large memory requirements, so Kraken2 was created to address the memory issue.
Kraken2 differs from Kraken in several important ways:
- Only minimisers of the k-mers in the query sequences are used as database queries.
- Kraken2 uses a compact hash table that is a probabilistic data structure.
- Kraken2 has the ability to build a database from amino acid sequences and perform a translated search of the query sequences against that database.
- Kraken2 utilises spaced seeds in the storage and querying of minimisers to improve classification accuracy.
- Kraken2 provides support for "special" databases that are not based on NCBI taxonomy. These are currently limited to three popular 16S databases.
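As a rough illustration of the term "minimiser" used in the list above, the sketch below picks the lexicographically smallest substring of length l from a k-mer. The function name minimiser is hypothetical, and plain lexicographic ordering is an assumption made for simplicity; Kraken2's real implementation is more involved, so treat this as a conceptual aid only.

```shell
# Conceptual sketch: print the lexicographically smallest l-mer in a k-mer.
# $1: the k-mer (e.g. GATTACA), $2: the minimiser length l
minimiser() {
    printf '%s\n' "$1" | LC_ALL=C awk -v l="$2" '{
        best = substr($0, 1, l)              # start with the first l-mer
        for (i = 2; i + l - 1 <= length($0); i++) {
            candidate = substr($0, i, l)     # next l-mer along the k-mer
            if (candidate < best) best = candidate
        }
        print best
    }'
}
```

For example, minimiser GATTACA 3 compares GAT, ATT, TTA, TAC and ACA. Because many overlapping k-mers share the same minimiser, storing minimisers rather than every k-mer needs fewer database entries.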
See the manual linked below for more information.
Usage¶
To run the default installed version of Kraken, simply load the kraken module:
module load kraken
For usage documentation, run kraken -h or kraken2 -h, depending on which version you have loaded.
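If it is unclear which version a loaded module provides, one option is to check which executables are on the PATH. The helper below is a minimal sketch; the function name available_kraken_cmds is hypothetical and not part of Kraken or the module system.

```shell
# Print the name of each Kraken executable found on the current PATH.
available_kraken_cmds() {
    for cmd in kraken kraken2; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}
```

Running available_kraken_cmds after module load kraken prints kraken, kraken2, or both, depending on what the module provides. The installed module versions themselves can be listed with module avail kraken.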
Example jobs¶
Serial jobs¶
Here is a Kraken example job running on 1 core and 5GB of memory:
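#!/bin/bash
# Run from the current directory, merge stdout and stderr, and request
# 1 core, 1 hour of runtime and 5GB of memory.
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=5G

module load kraken

# Classify the sequences in input_file against the example_db database,
# using one thread per requested core (NSLOTS is set by the scheduler).
kraken --db example_db \
       --threads ${NSLOTS} \
       input_file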
Here is a Kraken2 example job running on 1 core and 5GB of memory:
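#!/bin/bash
# Run from the current directory, merge stdout and stderr, and request
# 1 core, 1 hour of runtime and 5GB of memory.
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=5G

# The kraken2 command is only available if the loaded module is a Kraken2 version.
module load kraken

# Classify the sequences in input_file against the example2_db database,
# using one thread per requested core (NSLOTS is set by the scheduler).
kraken2 --db example2_db \
        --threads ${NSLOTS} \
        input_file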
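Both Kraken and Kraken2 print one tab-separated line per input sequence, with C (classified) or U (unclassified) in the first field. By default this goes to standard output, so in the jobs above it ends up in the job's output file; the --output option writes it to a separate file instead. The sketch below counts classified and unclassified sequences in such a file. The function name summarise_kraken_output and the file name kraken_output.txt are hypothetical.

```shell
# Summarise a per-sequence Kraken/Kraken2 output file.
# $1: path to the output file (tab-separated, first field is C or U)
summarise_kraken_output() {
    awk -F '\t' '
        $1 == "C" { classified++ }
        $1 == "U" { unclassified++ }
        END {
            total = classified + unclassified
            printf "classified\t%d\n", classified
            printf "unclassified\t%d\n", unclassified
            # Only report a percentage when there is at least one sequence
            if (total > 0)
                printf "percent_classified\t%.1f\n", 100 * classified / total
        }
    ' "$1"
}
```

For example, after a run that used --output kraken_output.txt, calling summarise_kraken_output kraken_output.txt prints the two counts and the percentage of sequences classified.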