SRA Tools
The Sequence Read Archive (SRA) Toolkit is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
SRA Tools is available as a module on Apocrita.
Usage
To run the default installed version of SRA Tools, simply load the sratools module:
$ module load sratools
Usage: <command> [options] [--help]
Setup of SRA Tools
When running SRA Tools for the first time, you must run vdb-config --interactive and then press 'x' to save a basic configuration. This only needs to be done once.
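Because the tools will not work until this configuration has been saved, it can help to check for it at the start of a job. The sketch below assumes vdb-config stores its settings in $HOME/.ncbi/user-settings.mkfg, which is the usual default; adjust the path if your configuration lives elsewhere. The function name sra_config_exists is hypothetical.

```shell
#!/bin/bash
# Assumption: vdb-config saves the user configuration to
# $HOME/.ncbi/user-settings.mkfg (the usual default location).
sra_config_exists() {
    [ -f "${HOME}/.ncbi/user-settings.mkfg" ]
}

# Print a warning on stderr rather than letting the SRA tools fail later.
if ! sra_config_exists; then
    echo "No SRA Tools configuration found: run 'vdb-config --interactive' and press 'x' to save it." >&2
fi
```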
The following commands are available:
fastq-dump abi-dump
prefetch illumina-dump
sam-dump sff-dump
sra-pileup sra-stat
vdb-config vdb-dump
vdb-decrypt vdb-encrypt
vdb-validate
Usage information for each command can be shown by running it with the --help option. For more detailed documentation, see the documentation page linked in the references below.
Example jobs
Here are some example jobs using the main tools, each requesting 1 core, 2G of memory and 1 hour of runtime:
Fastq-dump
Converts SRA data into FASTQ format (or FASTA, as in this example).
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sratools
# Writes FASTA instead of FASTQ (--fasta), wrapped at 60 bases per line,
# with each read of a spot in a separate file (--split-files), giving two
# files for paired-end data.
fastq-dump --split-files --fasta 60 input-file
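To convert several runs, one option is a Grid Engine array job in which each task handles one accession. This is a minimal sketch, assuming a hypothetical file accessions.txt with one SRA run accession per line; the task range set with -t must match the number of lines, and the helper accession_for_task is illustrative rather than part of SRA Tools.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
#$ -t 1-3

# Hypothetical input: one SRA run accession per line.
# The task range (-t 1-3 above) should match the number of lines.
ACCESSION_LIST=accessions.txt

# Print the accession on line number $1 of file $2.
accession_for_task() {
    sed -n "${1}p" "$2"
}

# SGE_TASK_ID is only numeric inside an array task, so skip otherwise.
if [[ "${SGE_TASK_ID:-}" =~ ^[0-9]+$ ]]; then
    module load sratools
    ACC=$(accession_for_task "$SGE_TASK_ID" "$ACCESSION_LIST")
    # Same conversion as the single-file example, one accession per task.
    fastq-dump --split-files --fasta 60 "$ACC"
fi
```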
Prefetch
Downloads SRA, dbGaP and ADSP data from the command line.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sratools
# Raises the maximum download size to 200G (-X) and downloads the
# files listed in the kart file (input-file.krt).
prefetch -X 200G input-file.krt
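prefetch can also download a single run by accession before converting it. The sketch below takes the accession as the first argument to qsub (for example qsub prefetch-job.sh SRR000001, where both the script name and the accession are placeholders) and rejects anything that does not look like an SRA, ENA or DDBJ run accession. It assumes fastq-dump will pick up the downloaded copy; where prefetch saves files depends on your vdb-config settings and SRA Tools version. The helper is_run_accession is hypothetical.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

# Run accession passed on the qsub command line (hypothetical example).
ACC=${1:-}

# Accept only run accessions (SRR, ERR or DRR followed by digits).
is_run_accession() {
    [[ "$1" =~ ^[SED]RR[0-9]+$ ]]
}

if is_run_accession "$ACC"; then
    module load sratools
    # Download first; only convert if the download succeeded.
    prefetch "$ACC" && fastq-dump --split-files "$ACC"
else
    echo "Not an SRA run accession: '${ACC}'" >&2
fi
```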
Sam-dump
Converts SRA data to SAM format.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sratools
# Writes gzip-compressed output (--gzip) to output-file.sam.gz and
# always reconstructs the SAM header (-r).
sam-dump -r --gzip --output-file output-file.sam.gz input-file
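Several inputs can be converted in one job by looping over the arguments given to qsub. This is a minimal sketch using the same sam-dump options as above; each output file is named after its input (without any directory part) and written to the working directory. The function name dump_to_sam is hypothetical.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

# Convert each input to its own gzip-compressed SAM file,
# named <input>.sam.gz in the working directory.
dump_to_sam() {
    local input
    for input in "$@"; do
        sam-dump -r --gzip --output-file "$(basename "$input").sam.gz" "$input"
    done
}

# Inputs are given on the qsub command line; do nothing if there are none.
if [ "$#" -gt 0 ]; then
    module load sratools
    dump_to_sam "$@"
fi
```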
SRA-pileup
Generates pileup statistics on aligned SRA data.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sratools
# Produces pileup stats for a given genomic region (-r), chromosome 1,
# bases 559140-559160 (1:559140-559160).
sra-pileup -r 1:559140-559160 input-file
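To compute pileups for several regions, the region passed to -r can be read from a file, one region per line. This sketch assumes a hypothetical regions.txt in the same chromosome:from-to format as above, and that sra-pileup writes to standard output when no output file is given; the helper region_label, which turns a region into a file-name-safe label, is illustrative.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

# Hypothetical input: one region per line, e.g. 1:559140-559160
REGIONS=regions.txt

# Turn a region such as 1:559140-559160 into a label safe for file names.
region_label() {
    local label=${1//:/_}
    printf '%s\n' "${label//-/_}"
}

if [ -f "$REGIONS" ]; then
    module load sratools
    while read -r region; do
        [ -n "$region" ] || continue   # skip blank lines
        # < /dev/null stops sra-pileup consuming regions.txt if it reads stdin.
        sra-pileup -r "$region" input-file < /dev/null > "pileup_$(region_label "$region").txt"
    done < "$REGIONS"
fi
```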
VDB-config
Displays and modifies VDB configuration information.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sratools
# Imports a dbGaP repository key (.ngc file) from the command line.
vdb-config --import input-file.ngc
VDB-decrypt
Decrypts non-SRA dbGaP data.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load sratools
# Decrypts a single encrypted file that has already been downloaded.
vdb-decrypt input-file
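When a download produces several encrypted files, they can be decrypted in one job with a loop. This sketch assumes the encrypted files end in .ncbi_enc (an assumption about dbGaP file naming; change the pattern to match your files) and calls vdb-decrypt exactly as in the example above, one file at a time. The directory is taken from the qsub command line and defaults to the working directory; the function name decrypt_all is hypothetical.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

# Decrypt every file in directory $1 whose name ends in .ncbi_enc
# (assumed suffix for encrypted downloads).
decrypt_all() {
    local f
    shopt -s nullglob   # an unmatched pattern expands to nothing
    for f in "$1"/*.ncbi_enc; do
        vdb-decrypt "$f"
    done
    shopt -u nullglob
}

# Directory given on the qsub command line; default to the working directory.
DOWNLOAD_DIR=${1:-.}

# Only load the module and decrypt if there is something to decrypt.
if compgen -G "${DOWNLOAD_DIR}/*.ncbi_enc" > /dev/null; then
    module load sratools
    decrypt_all "$DOWNLOAD_DIR"
fi
```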