SRA Tools

The Sequence Read Archive (SRA) Toolkit is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.

SRA Tools is available as a module on Apocrita.

Usage

To run the default installed version of SRA Tools, simply load the sratools module:

$ module load sratools
Usage:   <command> [options] [--help]

Setup of SRA Tools

Before using SRA Tools for the first time, run vdb-config --interactive and press 'x' to save a basic configuration. This step is not needed on subsequent runs.

The following commands are available:

fastq-dump      abi-dump
prefetch        illumina-dump
sam-dump        sff-dump
sra-pileup      sra-stat
vdb-config      vdb-dump
vdb-decrypt     vdb-encrypt
vdb-validate

Usage for each command can be shown by running the command with the --help option. For more detailed documentation, please check the link to the documentation page listed in the references below.
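As a quick check before reading the help text, the sketch below reports which of the commands listed above are on the PATH once the module is loaded. The function name check_sra_commands is hypothetical and only used for illustration. It looks the commands up and does not run them.

```shell
#!/bin/bash
# Report which commands are available on the PATH.
# check_sra_commands is a hypothetical helper, not part of SRA Tools.
# Run it after "module load sratools"; it only looks commands up.
check_sra_commands() {
    local cmd status=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            status=1
        fi
    done
    # Returns 0 if every command was found, 1 otherwise.
    return "$status"
}

check_sra_commands fastq-dump prefetch sam-dump sra-pileup \
    vdb-config vdb-decrypt vdb-validate || true
```

Any command reported as found can then be run with --help to show its usage.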

Example jobs

Here are some example jobs using the main tools, each running on 1 core with 2GB of memory:

Fastq-dump

Converts SRA data into FASTQ format, or into FASTA format when --fasta is given.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sratools

# Writes one FASTA file (--fasta) per read (--split-files), i.e. two files
# for paired-end data, with 60 bases per line (the "60" after --fasta).
fastq-dump --split-files --fasta 60 input-file
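To convert several inputs with the same options, the job above could be run as an array job. The following is a minimal sketch, assuming a file named accessions.txt (a hypothetical name) with one input per line and three entries, matching the -t 1-3 range. The helper nth_line is also hypothetical. Each task picks the line matching its SGE_TASK_ID.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
#$ -t 1-3

# Hypothetical list of inputs, one per line; adjust -t to its length.
LIST=accessions.txt

# Print line N of a file: nth_line FILE N
nth_line() {
    sed -n "${2}p" "$1"
}

if [ -r "$LIST" ]; then
    module load sratools
    # SGE_TASK_ID is set by the scheduler for each array task.
    INPUT=$(nth_line "$LIST" "${SGE_TASK_ID:-1}")
    fastq-dump --split-files --fasta 60 "$INPUT"
else
    echo "Cannot read $LIST" >&2
fi
```

Each array task runs with its own 1 core and 2GB allocation, so the inputs are processed independently.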

Prefetch

Allows command line downloading of SRA, dbGaP and ADSP data.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sratools

# Sets the maximum download file size to 200GB and downloads the
# files listed in the kart.
prefetch -X 200G input-file.krt

Sam-dump

Converts SRA data to SAM format.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sratools

# Produces a gzip-compressed output file (--gzip) with a reconstructed header (-r).
sam-dump -r --gzip --output-file output-file.sam.gz input-file

SRA-pileup

Generates pileup statistics on aligned SRA data.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sratools

# Produces pileup stats for a given genomic region (-r), chromosome 1,
# bases 559140-559160 (1:559140-559160).
sra-pileup -r 1:559140-559160 input-file
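When several regions are needed, the region string can be built from separate fields. The sketch below assumes a file named regions.txt (hypothetical) with "chromosome start end" on each line, and runs sra-pileup once per region. The helper format_region and the output file names are illustrative only.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

# Build "CHROM:START-END" as passed to -r; fails if START is after END.
# format_region is a hypothetical helper, not part of SRA Tools.
format_region() {
    local chrom=$1 start=$2 end=$3
    if [ "$start" -gt "$end" ]; then
        echo "start ($start) is after end ($end)" >&2
        return 1
    fi
    printf '%s:%s-%s\n' "$chrom" "$start" "$end"
}

# Hypothetical region list: "chromosome start end" per line.
REGIONS=regions.txt

if [ -r "$REGIONS" ]; then
    module load sratools
    while read -r chrom start end; do
        # Skip lines whose start is after their end.
        region=$(format_region "$chrom" "$start" "$end") || continue
        # stdin is redirected so the tool cannot consume the region list.
        sra-pileup -r "$region" input-file \
            > "pileup_${chrom}_${start}_${end}.txt" < /dev/null
    done < "$REGIONS"
else
    echo "Cannot read $REGIONS" >&2
fi
```

For example, format_region 1 559140 559160 prints 1:559140-559160, the region used in the job above.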

VDB-config

Displays and modifies VDB configuration information.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sratools

# Imports a dbGaP repository key by command line.
vdb-config --import input-file.ngc

VDB-decrypt

Decrypts non-SRA dbGaP data.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load sratools

# Decrypt a single encrypted file that has been downloaded.
vdb-decrypt input-file

References