BLAST+

BLAST+ is a suite of command-line applications for searching sequence databases, filtering and comparing sequences, creating BLAST databases and masking low-complexity sequence.

BLAST+ is available as a module on Apocrita.

Usage

To run the default installed version of BLAST+, simply load the blast+ module:

$ module load blast+
$ blastn -h
USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
...

Note that -db refers to a local database. Some common databases are stored in /data/PublicDataSets/shared_dbs; refer to these by their full path, e.g.:

-db /data/PublicDataSets/shared_dbs/nt/YYYY-MM-DD/nt
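Because the shared databases sit in dated directories, a job script can look up the newest snapshot instead of hard-coding the date. The following is a minimal sketch, assuming the snapshot directories are always named YYYY-MM-DD as in the path above; latest_snapshot is a hypothetical helper, not something provided by the blast+ module.

```shell
#!/bin/bash
# Hypothetical helper: print the most recent YYYY-MM-DD directory under a base directory.
# Assumes snapshot directories follow the YYYY-MM-DD naming shown in the path above.
latest_snapshot() {
    local base="$1" dir latest=""
    for dir in "$base"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/; do
        [ -d "$dir" ] || continue   # skip the unexpanded pattern when nothing matches
        latest="${dir%/}"           # globs expand in sorted order, so the last match is the newest
    done
    [ -n "$latest" ] || return 1    # no dated directory found
    printf '%s\n' "$latest"
}

# Usage inside a job script:
# DB="$(latest_snapshot /data/PublicDataSets/shared_dbs/nt)/nt"
```

Pinning an explicit date is still preferable when results need to be reproducible, since a newer snapshot may return different hits.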

Alternatively, you can search NCBI-hosted databases remotely with the -remote flag:

-remote
-db nt

which will output:

Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.

Database: Nucleotide collection (nt)
           99,234,692 sequences; 1,340,601,026,309 total letters

Another option is to download a database with the update_blastdb.pl script, which is added to your path when you load the blast+ module. You will also need the perl and perl5lib modules loaded. This is best done in a separate interactive qlogin session, run before you submit your job script.

For example, to download and decompress the swissprot database:

$ module load blast+
$ module load perl perl5lib
$ update_blastdb.pl --decompress swissprot
Connected to NCBI
Downloading swissprot.tar.gz... [OK]
Decompressing swissprot.tar.gz ... [OK]

Once the database is downloaded and decompressed, amend the -db argument in your job script to point directly at it, for example:

-db <path_to_db>/swissprot
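A BLAST+ database is a set of files sharing a base name, so it can help to confirm they are in place before submitting a job. The following is a minimal sketch, assuming the usual index extensions (.pin for protein, .nin for nucleotide) and the .pal/.nal alias files used by multi-volume databases; blast_db_present is a hypothetical helper, not part of BLAST+.

```shell
#!/bin/bash
# Hypothetical helper: succeed if index or alias files exist for a database base name.
# Assumed extensions: .pin/.pal (protein), .nin/.nal (nucleotide).
blast_db_present() {
    local base="$1" ext
    for ext in pin pal nin nal; do
        [ -e "${base}.${ext}" ] && return 0
    done
    return 1
}

# Usage (replace <path_to_db> with the directory you downloaded into):
# blast_db_present "<path_to_db>/swissprot" || echo "database files not found" >&2
```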

More information about the various ways to use the -db option can be found here.

Core usage and core limitation in a multithreaded run

  • To ensure that BLAST+ uses the correct number of cores, pass the -num_threads ${NSLOTS} option.
  • BLAST+ supports a maximum of 32 cores.
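The two points above can be combined by capping the thread count before passing it to BLAST+. This is a minimal sketch; cap_threads is a hypothetical helper, and the default limit of 32 is taken from the list above.

```shell
#!/bin/bash
# Hypothetical helper: limit a requested slot count to a maximum (default 32).
cap_threads() {
    local requested="${1:-1}" max="${2:-32}"
    if [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

# NSLOTS is set by the scheduler inside a job; fall back to 1 when run interactively.
THREADS="$(cap_threads "${NSLOTS:-1}")"
echo "Using ${THREADS} threads"

# Then, in the job script: blastn -num_threads "${THREADS}" ...
```

Requesting more cores than BLAST+ can use leaves the extra cores idle, so it is simpler to match the core request to the limit in the first place.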

A list of all BLAST+ applications and options can be found here.

Example job

Serial job

Here is an example job running BLAST+ on 4 cores with 4GB of memory in total (1GB per core):

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G

module load blast+

blastn -num_threads ${NSLOTS} \
       -out blast_serial.tsv \
       -outfmt 6 \
       -db /data/PublicDataSets/shared_dbs/nt/2023-02-21/nt \
       -query GRCh38_chr1_1percent.fasta
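The job above writes tabular output (-outfmt 6), which is easy to post-process with standard tools. The following is a minimal sketch that keeps only hits at or above a chosen percent identity, assuming the default -outfmt 6 column order; filter_hits and the output file name in the usage line are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: keep hits whose percent identity is at or above a threshold.
# Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
#                            qstart qend sstart send evalue bitscore
filter_hits() {
    local min_pident="$1" file="$2"
    # Column 3 is pident; adding 0 forces a numeric comparison.
    awk -F'\t' -v min="$min_pident" '$3 + 0 >= min + 0' "$file"
}

# Usage:
# filter_hits 95 blast_serial.tsv > blast_filtered.tsv
```

If a custom column list is passed to -outfmt, the column index in the awk expression would need to change accordingly.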

References