Quotas¶
Quotas keep track of the storage space being used on a per-user or per-group basis. Quotas are measured both in terms of the number of files used and the total size of those files. File sizes are measured in terms of the number of blocks used.
Finding out current usage¶
When you are logged in to Apocrita, you can see which storage you have access to, and how much you have used, using the qmquota utility. This will list your home directory, scratch storage and any shared project storage you have access to.
$ qmquota -s
Disk quotas for abc123
                                           Space                                    Files
Location                   Use%    Used    Free    Quota    Limit   Use%   Used     Free    Quota    Limit
/data/Shared-Lab-Storage  0.00%     32K   1.00T    1.00T    1.10T      -      3        -        -        -
/data/home/abc123         9.20%   9.20G  90.80G  100.00G  110.00G      -  20667        -        -        -
/data/scratch/abc123      2.08%  64.00G   2.94T    3.00T    6.00T  0.00%     97  5499903  5000000  5500000
Several options are available to modify the output, similar to those for the quota command:
$ qmquota -h
Usage: qmquota [OPTIONS]
-s, --human-readable : display numbers in human friendly units (MB, GB...)
-c, --colour : Highlights storage which is over the quota
-h, --help : This small usage guide
-A, --nfs-all : display quota for all NFS mountpoints
-v, --verbose : Include where no storage is allocated
-l, --local-only : Do not query network filesystems
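For example (a sketch based only on the options listed above; check the help output on the system for the authoritative behaviour), you could display human-readable figures for every NFS mount point, including locations where no storage is allocated:
$ qmquota -s -A -v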
User Quotas and Fileset Quotas
Depending on where files are stored, they are subject to one of two types of quota: User Quotas or Fileset Quotas. Files in scratch and home are subject to User Quotas, which means the ownership of a file determines whose quota it counts against. Files in Lab / Project spaces are subject to a Fileset Quota and count against the quota for that share, regardless of owner.
You may wish to use a tool such as Dua to establish exactly what is using up space in your allocated storage.
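As a minimal sketch (assuming Dua is installed and available on your PATH; the path below is the example scratch directory from above), you can either print an aggregate usage summary or browse interactively:
$ dua /data/scratch/abc123
$ dua i /data/scratch/abc123
The first form prints the size of each entry under the given directory; the second, interactive form lets you navigate the tree to find the largest directories.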
Reporting¶
You will receive emails when you exceed scratch or home directory quotas. You will also receive emails if you are the designated group contact or data owner for shared storage.
Group Contacts
Group contacts are recorded in the README file in the top directory. If this needs to be updated, please contact us.
Good practice for managing your storage¶
Storage space is allocated in fragments of 16KB, the smallest unit of allocation in GPFS. As a result, space usage may appear larger than the sum of your file sizes: for example, a file of only a few kilobytes may still occupy a full 16KB fragment, so 1,000 such files can consume around 16MB of space.
- We provide a database server running PostgreSQL and MariaDB for accessing/writing small amounts of data. This is more efficient than writing lots of small files.
- Try to avoid having multiple processes writing to the same file, e.g. when collecting logs from an application that is running on many nodes. Instead, store each log under a different name, e.g. labelled by its process id (PID) or $SGE_TASK_ID (see the sketch below).
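As a minimal sketch of the second point (exampleapp and the logs directory are placeholders, not real software on the cluster), an array job can give each task its own log file by including $SGE_TASK_ID in the file name:
#!/bin/bash
#$ -cwd
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
#$ -t 1-100
# Each task writes its own log file instead of sharing one
mkdir -p logs
exampleapp --input input_${SGE_TASK_ID}.dat > logs/run_${SGE_TASK_ID}.log 2>&1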
Compressing files¶
Large directories of files can be compressed to reduce inode usage and the associated performance impacts. Large files can be compressed to reduce space usage.
If your data is stored in a compressed format, you can decompress it within your job and then re-compress it after use.
tar supports multiple compression algorithms, which can be selected with one of the following options:
gzip is the fastest and is most suitable for compressing large directories containing many files. This example compresses a directory with 1000 files into a single tar file using gzip compression. The --remove-files option removes the source files after they have been compressed.
$ ls example/ | wc -l
1000
$ tar --remove-files -czf example.tgz example/
$ ls
example.tgz
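To unpack the archive again, for example at the start of a job, pass -x with the matching -z option (this uses the example.tgz file created above):
$ tar -xzf example.tgz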
bzip2 is slower but provides better compression, so it is best used to compress large files. This example compresses a data file with bzip2:
$ du -h example
11G example
$ tar -cjf example.tar.bz2 example
$ du -h example.tar.bz2
8.0K example.tar.bz2
Example job using compression¶
Here is an example job using bzip2 compression on the data files, which assumes the job modifies the data files. If the files are not modified, the final step of re-compressing the files is unnecessary.
The source tar file remains in its original location and does not need to be copied to the working directory.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
# Change into TMPDIR
cd ${TMPDIR}
# Extract data files
tar -xjf /data/home/abc123/example.tar.bz2
# Run job
examplebin --input-dir example/
# Compress data files (only needed if data is changed by job)
# and overwrite source tar file
tar -cjf /data/home/abc123/example.tar.bz2 example/
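Assuming the script above is saved as compress_job.sh (a name chosen only for this example), it can be submitted in the usual way:
$ qsub compress_job.sh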