GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
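As a rough illustration of the pipe mode described above, the sketch below splits a stream of numbered lines into blocks and counts the lines in each block in parallel. The input (`seq 1 10000`), the block size and the variable name `counts` are arbitrary choices for the example, and it assumes `parallel` is already on the PATH.

```shell
#!/bin/sh
counts=""

# Skip quietly when parallel is not on PATH.
if command -v parallel >/dev/null 2>&1; then
    # --pipe splits stdin into blocks of roughly --block bytes, on line
    # boundaries, and feeds each block to its own "wc -l" process.
    counts=$(seq 1 10000 | parallel --pipe --block 10k wc -l)
    # Add up the per-block line counts.
    printf '%s\n' "$counts" | awk '{ total += $1 } END { print total }'
fi
```

Each `wc -l` process sees only its own block, so the per-block counts have to be added up to recover the total.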
GNU Parallel is available as a module on Apocrita.
To run the latest installed version of GNU Parallel, simply load the module:
module load parallel
then run, for example, a parallel gzip on multiple files:
parallel gzip ::: file1 file2 file3
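Before running a real job it can help to preview the commands that would be generated. The sketch below uses the `--dry-run` option and the `{}` replacement string with the placeholder file names from the example above. The `-9` option and the variable names `files` and `preview` are illustrative assumptions, not part of the original example.

```shell
#!/bin/sh
# file1..file3 are the placeholder names used above.
files="file1 file2 file3"
preview=""

# Skip quietly when parallel is not on PATH (module not loaded).
if command -v parallel >/dev/null 2>&1; then
    # --dry-run prints each generated command instead of running it;
    # {} marks where each input argument is substituted.
    preview=$(parallel --dry-run gzip -9 {} ::: $files)
    printf '%s\n' "$preview"
fi
```

Dropping `--dry-run` runs the jobs for real, one gzip process per file.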
Alternatively, the list of commands to run can be specified using an input file:
parallel < list.sh
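A command list like this is often generated by a short loop rather than written by hand. The following is a minimal sketch under that assumption: the `sample*.txt` file names and the `gzip -9` command are placeholders, and `--dry-run` is used so that nothing is actually compressed.

```shell
#!/bin/sh
# Write one complete command per line; sample1.txt etc. are placeholders.
: > list.sh
for f in sample1.txt sample2.txt sample3.txt; do
    printf 'gzip -9 %s\n' "$f" >> list.sh
done

# Skip quietly when parallel is not on PATH (module not loaded).
if command -v parallel >/dev/null 2>&1; then
    # With no command on the parallel command line, each input line is run as a job.
    # --dry-run only prints the jobs; remove it to execute them.
    parallel --dry-run < list.sh
fi
```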
Within a script, this can be done using a here document, e.g.:
parallel <<ENDOFLIST
command1 -option1 file1
command2 -option2 file2
ENDOFLIST
Here is an example job running on 4 cores.
#!/bin/sh
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=4:0:0
#$ -l h_vmem=1G
module load parallel
parallel gzip ::: file1 file2 file3 file4
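By default GNU Parallel starts one job per CPU core it detects on the node. When there are more input files than requested cores, that can exceed the cores granted to the job. One way to avoid this, sketched below, is to pass the scheduler's `NSLOTS` variable to `--jobs`. The scratch directory, the placeholder files and the fallback value of 1 are assumptions added so the sketch can also be tried outside the queue.

```shell
#!/bin/sh
# Scratch directory with four small placeholder files (illustration only).
workdir=$(mktemp -d)
for i in 1 2 3 4; do
    printf 'sample data %s\n' "$i" > "$workdir/file$i"
done

# NSLOTS is set by the scheduler inside a job; the fallback of 1
# is an assumption so the sketch also runs outside the queue.
jobs="${NSLOTS:-1}"

# Skip quietly when parallel is not on PATH (module not loaded).
if command -v parallel >/dev/null 2>&1; then
    # Cap the number of simultaneous gzip processes at the allocated cores.
    parallel --jobs "$jobs" gzip ::: "$workdir"/file*
fi
```

In a real job script the same `--jobs "$NSLOTS"` option would go on the `parallel` line after `module load parallel`.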