BLAST+
BLAST+ is a suite of command-line applications for searching sequence databases, filtering and comparing sequences, creating BLAST databases and masking low-complexity sequence.
BLAST+ is available as a module on Apocrita.
Usage
To run the default installed version of BLAST+, simply load the blast+ module:
$ module load blast+
$ blastn -h
USAGE
blastn [-h] [-help] [-import_search_strategy filename]
[-export_search_strategy filename] [-task task_name] [-db database_name]
...
Please note that -db refers to a local database file. Some common databases are stored in /data/PublicDataSets/shared_dbs; refer to these by their full path, e.g.:
-db /data/PublicDataSets/shared_dbs/nt/YYYY-MM-DD/nt
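Because the shared database paths include a snapshot date, a job script can look up the newest snapshot rather than hard-coding it. The following is a minimal sketch, not a site-provided tool: the function name latest_db_path is hypothetical, and it assumes the layout <base>/<name>/<YYYY-MM-DD>/<name> suggested by the example path above.

```shell
#!/bin/bash
# Hypothetical helper: print the -db path for the newest dated snapshot.
# Assumes the layout <base>/<name>/<YYYY-MM-DD>/<name>.
latest_db_path() {
    local base="$1" name="$2" latest
    # ISO dates sort lexicographically, so the last entry is the newest.
    latest=$(ls -d "${base}/${name}"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] 2>/dev/null | sort | tail -n 1)
    # Fail if no dated snapshot directory was found.
    [ -n "${latest}" ] || return 1
    printf '%s/%s\n' "${latest}" "${name}"
}

# Example use inside a job script (commented out so the sketch has no side effects):
# blastn -db "$(latest_db_path /data/PublicDataSets/shared_dbs nt)" -query input.fasta
```

Pinning an explicit date remains the better choice when results need to be reproducible, since a newer snapshot can change the hits returned.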
Alternatively, you can search databases remotely using the -remote flag:
-remote -db nt
which will output:
Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.
Database: Nucleotide collection (nt)
99,234,692 sequences; 1,340,601,026,309 total letters
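For context, a complete remote search might look like the sketch below. The wrapper name remote_blastn and its two arguments are hypothetical. The -num_threads option is left out because BLAST+ documents it as incompatible with -remote.

```shell
#!/bin/bash
# Hypothetical wrapper: run a nucleotide query against the remote nt database.
remote_blastn() {
    local query="$1" out="$2"
    # -num_threads is deliberately omitted: it cannot be combined with -remote.
    blastn -remote -db nt -query "${query}" -outfmt 6 -out "${out}"
}

# Example use (commented out so the sketch has no side effects):
# remote_blastn my_query.fasta remote_hits.tsv
```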
Another option is to download a database using the update_blastdb.pl script, which is added to your path when you load the blast+ module. You will also need the perl and perl5lib modules loaded. This is best done in a separate interactive qlogin session, run in advance of your job script.
For example, to download and decompress the swissprot database:
$ module load blast+
$ module load perl perl5lib
$ update_blastdb.pl --decompress swissprot
Connected to NCBI
Downloading swissprot.tar.gz... [OK]
Decompressing swissprot.tar.gz ... [OK]
Once the database has been downloaded and decompressed, amend the -db argument in your job script to point directly at it, for example:
-db <path_to_db>/swissprot.fa
More information about the various ways to use the -db option can be found here.
Core usage and core limitation in a multithreaded run
- To ensure that BLAST+ uses the correct number of cores, use the -num_threads ${NSLOTS} option.
- BLAST+ supports a maximum of 32 cores.
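The two points above can be combined so that a job never asks BLAST+ for more threads than it supports. This is a minimal sketch: the function name blast_threads is hypothetical, and the fallback to a single thread when NSLOTS is unset (for example outside a scheduled job) is an assumption made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the thread count to pass to -num_threads.
blast_threads() {
    # Use the scheduler's slot count, or 1 if NSLOTS is not set.
    local slots="${NSLOTS:-1}" max=32
    # Cap the value at the 32-core limit noted above.
    if [ "${slots}" -gt "${max}" ]; then
        slots="${max}"
    fi
    printf '%s\n' "${slots}"
}

# Example use (commented out so the sketch has no side effects):
# blastn -num_threads "$(blast_threads)" -db <path_to_db> -query input.fasta
```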
A list of all BLAST+ applications and options can be found here.
Example job
Serial job
Here is an example job for BLAST+ running on 4 cores with 4GB of memory in total (1GB per core):
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
module load blast+
blastn -num_threads ${NSLOTS} \
-out blast_serial.tsv \
-outfmt 6 \
-db /data/PublicDataSets/shared_dbs/nt/2023-02-21/nt \
-query GRCh38_chr1_1percent.fasta
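The job above writes tabular output (-outfmt 6), which by default has twelve tab-separated columns starting with the query ID and ending with the bit score. As one illustration of post-processing, the sketch below keeps only the highest-scoring hit for each query. The function name best_hits is hypothetical, and it assumes the default column layout has not been customised.

```shell
#!/bin/bash
# Hypothetical helper: keep the highest-bitscore hit for each query.
best_hits() {
    local tab
    tab=$(printf '\t')
    # Column 1 is the query ID and column 12 the bit score in default -outfmt 6.
    # Sort by query, then by descending bit score, and keep the first row per query.
    sort -t "${tab}" -k1,1 -k12,12gr "$1" | awk -F '\t' '!seen[$1]++'
}

# Example use (commented out so the sketch has no side effects):
# best_hits blast_serial.tsv > blast_serial.best.tsv
```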