
Pigz

pigz is a parallel implementation of gzip for modern multi-processor, multi-core machines.

pigz is available as a module on Apocrita.

Usage

$ module load pigz
$ pigz --help
Usage: pigz [options] [files ...]
  will compress files in place, adding the suffix '.gz'. If no files are
  specified, stdin will be compressed to stdout. pigz does what gzip does,
  but spreads the work over multiple processors and cores when compressing...

Example of compressing data with pigz:

pigz -p ${NSLOTS} data.tar

Core Utilisation

Use the -p ${NSLOTS} flag, as in the example above, so that pigz uses exactly the number of cores allocated to your job.

Example of decompressing data with unpigz (equivalent to pigz -d):

unpigz -p ${NSLOTS} data.tar.gz

Decompression is largely single-threaded

Decompression cannot easily be parallelised, so pigz uses a single thread for the decompression itself, plus three other threads for reading, writing and checksum calculation. These helper threads can speed up decompression in some circumstances, but any cores allocated beyond these four threads will go unused.

Example job

Serial job

Here is an example job running on 4 cores:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load pigz
# Compress data.tar to data.tar.gz using all cores allocated to the job
pigz -p ${NSLOTS} data.tar
