GNU Parallel
GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each line of the input. Typical input is a list of files, hosts, users, URLs, or tables. A job can also be a command that reads from a pipe, in which case GNU Parallel splits the input and pipes it into several instances of the command in parallel.
GNU Parallel is available as a module on Apocrita.
Usage
To run the default installed version of GNU Parallel, simply load the parallel
module:
module load parallel
Then run a command over several inputs, for example gzip on multiple files in parallel:
parallel gzip ::: file1 file2 file3
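As a minimal sketch of how the argument list after `:::` is expanded, the following uses `--dry-run`, which prints each command instead of running it, together with the `{}` replacement string, which marks where each argument is substituted. The `-9` gzip option is an illustrative assumption, and `-k` (`--keep-order`) only keeps the output in the same order as the input. The file names are the placeholders from the example above and do not need to exist for a dry run.

```shell
# Print the commands GNU Parallel would run, without executing them.
# {} is replaced in turn by each argument given after ":::".
parallel --dry-run -k gzip -9 {} ::: file1 file2 file3
# Prints:
#   gzip -9 file1
#   gzip -9 file2
#   gzip -9 file3
```

Checking the generated commands this way before a real run is a cheap way to catch quoting or globbing mistakes.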
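Alternatively, the list of commands to run can be read from an input file, one command per line. When no command is given on its own command line, GNU Parallel executes each line of standard input as a command:
parallel < list.sh
Within a script, this can be done using a here document, e.g.:
parallel <<ENDOFLIST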
command1 -option1 file1
command2 -option2 file2
ENDOFLIST
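The command-list approach can be sketched end to end as follows. The `echo` commands and the sample names are hypothetical stand-ins for real work, and `list.sh` is written to the current directory. When GNU Parallel is given no command of its own, it runs each line read from standard input as a separate job; `-k` keeps the output in input order.

```shell
# Write a command list, one complete command per line.
# The echo commands are placeholders for real commands.
cat > list.sh <<'ENDOFLIST'
echo "processing sample1"
echo "processing sample2"
echo "processing sample3"
ENDOFLIST

# Run every line of list.sh as its own job.
parallel -k < list.sh
```

Quoting the here-document delimiter (`'ENDOFLIST'`) stops the shell from expanding variables while the list is written, so each line reaches GNU Parallel exactly as typed.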
Example jobs
Serial job
Here is an example job that requests 4 cores in the smp parallel environment and compresses four files at the same time, one per core.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
module load parallel
parallel gzip ::: file1 file2 file3 file4
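When there are more inputs than requested cores, it may be worth capping the number of simultaneous jobs. By default GNU Parallel starts one job per CPU core it detects on the node, which can exceed the cores granted to the job. The variant below is a sketch under two assumptions: that the scheduler exports `NSLOTS` (Grid Engine sets it to the slot count requested with `-pe smp`), and that the inputs match a hypothetical `*.log` pattern.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
module load parallel
# Run at most NSLOTS gzip processes at a time (4 for this request).
# *.log is a hypothetical pattern; substitute the real input files.
parallel -j "${NSLOTS}" gzip ::: *.log
```

Tying `-j` to `NSLOTS` means the job script keeps working unchanged if the core request on the `-pe smp` line is later adjusted.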