zstd

zstd is a fast multi-threaded compression tool.

zstd (short for Zstandard) is available as a module on Apocrita.

Usage

$ module load zstd
$ zstd --help
Usage: zstd [args] [FILE(s)] [-o file]
FILE    : a filename
          with no FILE, or when FILE is - , read standard input
Arguments :
 -#     : # compression level (1-19, default: 3)
 -d     : decompression
 -D file: use `file` as Dictionary
 -o file: result stored into `file` (only if 1 input file)
 -f     : overwrite output without prompting and (de)compress links
--rm    : remove source file(s) after successful de/compression
 -T#    : spawns # compression threads (default: 1, 0==# cores)
--format=zstd : compress files to the .zst format (default)
--format=gzip : compress files to the .gz format
--format=xz : compress files to the .xz format
(output truncated, please check --help options for full list)

Example of compressing data with zstd using the --fast option (faster than level 1, at the cost of a lower compression ratio), with multiple cores:

zstd -T${NSLOTS} --fast data.tar -o data.tar.zstd

Core Utilisation

Use the -T${NSLOTS} flag, as in the example above, so that zstd uses exactly the number of cores allocated to your job. Requesting multiple cores significantly speeds up compression.
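
When the same command is also run outside a job, NSLOTS may be unset. The following is a minimal sketch, assuming a fallback of one thread is acceptable in that case; the threads variable is a hypothetical name used only for illustration, and the guard simply skips compression when data.tar is not present.

```shell
# Use the scheduler's slot count when available, otherwise one thread.
# The fallback value of 1 is an assumption, not a site requirement.
threads="${NSLOTS:-1}"
echo "Compressing with ${threads} thread(s)"

# Only run zstd when the input file from the example above exists.
if [ -f data.tar ]; then
    zstd -T"${threads}" --fast data.tar -o data.tar.zstd
fi
```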

Example of decompressing data with zstd:

zstd -d data.tar.zstd -o data.tar

Decompression is single-threaded only

Decompression cannot be parallelised easily, so request only a single core for decompression jobs.
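
Since the examples here compress tar archives, it can be convenient to check the compressed file and unpack it in one step. The sketch below assumes the archive was produced as in the compression example above; extract_archive is a hypothetical helper name, not part of zstd. It uses the -t (test integrity) and -c (write to standard output) options of zstd.

```shell
# Hypothetical helper: verify a zstd-compressed tar archive, then unpack it
# into the current directory without writing an intermediate data.tar.
extract_archive() {
    archive="$1"
    # -t checks the integrity of the compressed file without producing output.
    zstd -t "$archive" || return 1
    # -d decompresses, -c streams the result to standard output for tar.
    zstd -dc "$archive" | tar -xf -
}
```

For example, extract_archive data.tar.zstd unpacks the archive contents in place.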

Example job

Serial job

Here is an example compression job running on 4 cores at compression level 10, removing the original data after successful compression:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load zstd
zstd -T${NSLOTS} -10 --rm data.tar -o data.tar.zstd
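
The --rm option already removes the source only after successful compression. For extra caution, the following sketch tests the compressed file with zstd -t before deleting the original. It is an alternative under stated assumptions rather than a recommended replacement: compress_and_verify is a hypothetical helper name, and the fallback to one thread when NSLOTS is unset is an assumption.

```shell
# Hypothetical helper: compress a file at level 10, test the result,
# and remove the original only if both steps succeed.
compress_and_verify() {
    src="$1"
    dst="${src}.zstd"
    # Use the job's allocated cores, or one thread outside a job (assumption).
    zstd -T"${NSLOTS:-1}" -10 "$src" -o "$dst" || return 1
    # -t checks the integrity of the compressed file.
    zstd -t "$dst" || return 1
    rm -- "$src"
}
```

In a job script, this would be called as compress_and_verify data.tar after module load zstd.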

Here is an example single-core decompression job:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load zstd
zstd -d data.tar.zstd -o data.tar

Since most compression tasks take less than 1 hour, we recommend keeping h_rt=1:0:0, which allows the job to run on additional nodes via the short queue.
