GNU Parallel

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.

GNU Parallel is available as a module on Apocrita.

Usage

To run the latest installed version of GNU Parallel, simply load the parallel module:

module load parallel

Then run a command over several inputs, for example gzip on multiple files in parallel:

parallel gzip ::: file1 file2 file3
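
To check what will be executed before running it for real, the commands can be previewed. The following is a minimal sketch using the `--dry-run` and `-k` (keep order) options of GNU Parallel; `file1` to `file3` are the placeholder names from the example and do not need to exist for the preview.

```shell
# --dry-run prints each command instead of running it.
# -k prints the commands in the same order as the input arguments.
preview=$(parallel -k --dry-run gzip ::: file1 file2 file3)
printf '%s\n' "$preview"
```

Each printed line is one job: one `gzip` invocation per file named after `:::`.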

Alternatively, the list of commands to run can be specified using an input file:

parallel < list.sh
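
As an illustration, an input file such as `list.sh` holds one complete command per line, and each line becomes one job. The sketch below generates such a file with a loop; the `gzip` commands and file names are placeholders, not part of any real workflow.

```shell
# Write one complete command per line; the file names are placeholders.
for f in file1 file2 file3; do
    printf 'gzip %s\n' "$f"
done > list.sh

# Inspect the generated command list before handing it to GNU Parallel.
cat list.sh
```

Generating the list this way is convenient when the commands differ only in their arguments.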

Within a script, the same can be done with a here document, for example:

parallel <<ENDOFLIST
command1 -option1 file1
command2 -option2 file2
ENDOFLIST

Example jobs

Serial job

Here is an example job that requests 4 cores and compresses four files at the same time.

#!/bin/sh
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=4:0:0
#$ -l h_vmem=1G

module load parallel

parallel gzip ::: file1 file2 file3 file4
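
By default, GNU Parallel chooses the number of simultaneous jobs from the CPUs it detects on the node, which can be more than the job requested. A minimal sketch of capping this with the `-j` option follows. It assumes the scheduler sets the `NSLOTS` variable inside the job, as Grid Engine normally does for a parallel environment request; the fallback value of 1 is an assumption added so the lines also work outside a job.

```shell
# NSLOTS is set by the scheduler to the number of slots granted
# (4 for "-pe smp 4"). Fall back to 1 outside a job (assumption).
njobs="${NSLOTS:-1}"

# --dry-run only prints the commands; remove it to compress the files.
parallel -j "$njobs" --dry-run gzip ::: file1 file2 file3 file4
```

Tying `-j` to the requested slots keeps the job within its allocation on a shared node.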
