Seqtk
Seqtk is a fast, lightweight tool for processing sequences in FASTA or FASTQ format. It parses both formats transparently, and input files may optionally be gzip-compressed.
Seqtk is available as a module on Apocrita.
Usage
To run the default installed version of Seqtk, load the seqtk module:
$ module load seqtk
$ seqtk
Usage:   seqtk <command> <arguments>
Version: <VERSION>

Command: seq       common transformation of FASTA/Q
         comp      get the nucleotide composition of FASTA/Q
         sample    subsample sequences
         subseq    extract subsequences from FASTA/Q
         fqchk     fastq QC (base/quality summary)
         mergepe   interleave two PE FASTA/Q files
         trimfq    trim FASTQ using the Phred algorithm
         hety      regional heterozygosity
         gc        identify high- or low-GC regions
         mutfa     point mutate FASTA at specified positions
         mergefa   merge two FASTA/Q files
         famask    apply a X-coded FASTA to a source FASTA
         dropse    drop unpaired from interleaved PE FASTA/Q
         rename    rename sequence names
         randbase  choose a random base from hets
         cutN      cut sequence at long N
         listhet   extract the position of each het
Example jobs
Serial jobs
The following example job uses 1 core and 2GB of memory to convert a FASTQ file to FASTA:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load seqtk
# Convert FASTQ to FASTA (-a forces FASTA output and discards the qualities)
seqtk seq -a fastq_data.fastq.gz > fasta_data.fa
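A quick way to sanity-check a conversion like this is to compare the number of records before and after. The sketch below is only an illustration: the file names `demo.fastq` and `demo.fa` are hypothetical, the record count assumes the common four-lines-per-record FASTQ layout, and the conversion step uses `seqtk seq -a` (the `-a` option forces FASTA output) and is skipped if seqtk is not on the PATH.

```shell
#!/bin/bash
# Build a tiny two-record FASTQ file to test with (hypothetical file name).
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGGCCAA\n+\nIIIIIIII\n' > demo.fastq

# Assuming four lines per FASTQ record, records = lines / 4.
fastq_records=$(( $(wc -l < demo.fastq) / 4 ))
echo "FASTQ records: ${fastq_records}"

# Convert only if seqtk is available (e.g. after 'module load seqtk').
if command -v seqtk > /dev/null 2>&1; then
    seqtk seq -a demo.fastq > demo.fa
    # Each FASTA record starts with a '>' header line.
    fasta_records=$(grep -c '^>' demo.fa)
    echo "FASTA records: ${fasta_records}"
fi
```

If the two counts differ, the input may be truncated or may use multi-line FASTQ records, in which case the line-based count is not reliable.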
The following example job uses 1 core and 2GB of memory to extract the sequence regions listed in a BED file:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load seqtk
# Extract sequences in the regions listed in the file reg.bed
# (results are written to standard output; redirect with '>' to save them to a file)
seqtk subseq fastq_data.fastq.gz reg.bed
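To make the region file more concrete, here is a minimal sketch of what a `reg.bed` might contain. It assumes standard BED conventions (tab-separated sequence name, 0-based start, end); the sequence names `chr1` and `chr2` and the file names `demo_ref.fa` and `demo_regions.fa` are hypothetical and chosen only for illustration. The seqtk step is skipped if seqtk is not on the PATH.

```shell
#!/bin/bash
# A small two-sequence FASTA file to extract from (hypothetical names).
printf '>chr1\nACGTACGTACGTACGTACGT\n>chr2\nTTTTGGGGCCCCAAAA\n' > demo_ref.fa

# A BED file with one region per line: name <TAB> start <TAB> end.
printf 'chr1\t0\t8\nchr2\t4\t12\n' > reg.bed

# Extract the regions only if seqtk is available (e.g. after 'module load seqtk').
if command -v seqtk > /dev/null 2>&1; then
    seqtk subseq demo_ref.fa reg.bed > demo_regions.fa
    cat demo_regions.fa
fi
```

According to the seqtk README, `seqtk subseq` also accepts a plain list of sequence names, one per line, in place of the BED file, which extracts whole sequences rather than regions.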