zstd
zstd is a fast multi-threaded compression tool.
zstd (pronounced zstandard) is available as a module on Apocrita.
Usage
$ module load zstd
$ zstd --help
Usage: zstd [args] [FILE(s)] [-o file]
FILE : a filename
with no FILE, or when FILE is - , read standard input
Arguments :
-# : # compression level (1-19, default: 3)
-d : decompression
-D file: use `file` as Dictionary
-o file: result stored into `file` (only if 1 input file)
-f : overwrite output without prompting and (de)compress links
--rm : remove source file(s) after successful de/compression
-T# : spawns # compression threads (default: 1, 0==# cores)
--format=zstd : compress files to the .zst format (default)
--format=gzip : compress files to the .gz format
--format=xz : compress files to the .xz format
(output truncated, please check --help options for full list)
Example of compressing data with zstd on its fastest setting, with multiple cores:
zstd -T${NSLOTS} --fast data.tar -o data.tar.zstd
Core Utilisation
Use the -T${NSLOTS} flag as described above to ensure zstd uses the number of
cores allocated to your job. Requesting multiple cores for compression can
increase performance significantly.
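The NSLOTS variable is set by the scheduler inside a job, so the same command run outside a job would pass an empty thread count. A minimal sketch of one way to guard against that, assuming a fallback of a single thread is acceptable; zstd_threads is a hypothetical helper name, not part of zstd or the scheduler:

```shell
# Pick a thread count for zstd: NSLOTS inside a scheduler job, otherwise 1.
# zstd_threads is a hypothetical helper, not part of zstd or the scheduler.
zstd_threads() {
    echo "${NSLOTS:-1}"
}

# Build the command line; it is printed rather than run so it can be checked first.
zstd_cmd="zstd -T$(zstd_threads) --fast data.tar -o data.tar.zstd"
echo "${zstd_cmd}"
```

Inside a job requesting 4 cores this yields -T4; in an interactive shell where NSLOTS is unset it yields -T1.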
Example of decompressing data with zstd:
zstd -d data.tar.zstd -o data.tar
Decompression is single-threaded only
Decompression cannot be parallelised easily, so decompression jobs only need to request a single core.
Example job
Serial job
Here is an example compression job running on 4 cores at compression level 10, removing the original data after successful compression:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load zstd
zstd -T${NSLOTS} -10 --rm data.tar -o data.tar.zstd
Here is an example single-core decompression job:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load zstd
zstd -d data.tar.zstd -o data.tar
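If the archive name changes between jobs, the output name can be derived from the input instead of being hard-coded. A minimal sketch, assuming archives carry the .zstd suffix used in the compression examples; strip_zstd_suffix is a hypothetical helper name:

```shell
# Derive the output name by stripping a trailing .zstd suffix.
# strip_zstd_suffix is a hypothetical helper, not part of zstd.
strip_zstd_suffix() {
    echo "${1%.zstd}"
}

input="data.tar.zstd"
output="$(strip_zstd_suffix "${input}")"

# Print the decompression command for inspection before running it.
echo "zstd -d ${input} -o ${output}"
```

A name without the .zstd suffix is returned unchanged, so it is worth checking that input and output differ before running the command.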
Most compression tasks take less than 1 hour, so it is recommended to keep
h_rt=1:0:0, which allows the job to access additional nodes on the short queue.