Quotas¶
Quotas keep track of the storage space being used on a per-user or per-group basis. Quotas are measured both in terms of the number of files used and the total size of those files. File sizes are measured in terms of the number of blocks used.
Finding out current usage¶
When you are logged in to Apocrita, you can see which storage you have access to, and how much you have used, using the qmquota utility. This will list your home directory, scratch storage and any shared project storage you have access to.
$ qmquota -s
Disk quotas for abc123
                                           Space                                    Files
Location                   Use%    Used    Free    Quota    Limit   Use%   Used     Free    Quota    Limit
/data/Shared-Lab-Storage  0.00%     32K   1.00T    1.00T    1.10T      -      3        -        -        -
/data/home/abc123         9.20%   9.20G  90.80G  100.00G  110.00G      -  20667        -        -        -
/data/scratch/abc123      2.08%  64.00G   2.94T    3.00T    6.00T  0.00%     97  5499903  5000000  5500000
Several options are available to modify the output, similar to those for the quota command:
$ qmquota -h
Usage: qmquota [OPTIONS]
-s, --human-readable : display numbers in human friendly units (MB, GB...)
-c, --colour : Highlights storage which is over the quota
-h, --help : This small usage guide
-A, --nfs-all : display quota for all NFS mountpoints
-v, --verbose : Include where no storage is allocated
-l, --local-only : Do not query network filesystems
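For example (a sketch based only on the options listed above; check the help output on the system for the authoritative behaviour), you could display human-readable figures for every NFS mount point, including locations where no storage is allocated:
$ qmquota -s -A -v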
User Quotas and Fileset Quotas
Depending on where files are stored, they are subject to one of two types of quota: User Quotas or Fileset Quotas. Files in scratch and home are subject to User Quotas, which means the ownership of a file determines whose quota it counts against. Files in Lab / Project spaces are subject to a Fileset Quota and count against the quota for that share, regardless of owner.
You may wish to use a tool such as Dua to establish exactly what is using up space in your allocated storage.
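As a minimal sketch (assuming Dua is installed and available on your PATH; the path below is the example scratch directory from above), you can either print an aggregate usage summary or browse interactively:
$ dua /data/scratch/abc123
$ dua i /data/scratch/abc123
The first form prints the size of each entry under the given directory; the second, interactive form lets you navigate the tree to find the largest directories.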
Reporting¶
You will receive emails when you exceed scratch or home directory quotas. You will also receive emails if you are the designated group contact or data owner for shared storage.
Group Contacts
Group contacts are recorded in the README file in the top directory. If this needs to be updated, please contact us.
Good practice for managing your storage¶
Storage space is allocated in fragments of 16KB, the smallest unit of allocation in GPFS. As a result, space usage may appear larger than the sum of your file sizes: for example, a file of only a few kilobytes may still occupy a full 16KB fragment, so 1,000 such files can consume around 16MB of space.
- We provide a database server running PostgreSQL and MariaDB for accessing/writing small amounts of data. This is more efficient than writing lots of small files.
- Try to avoid having multiple processes writing to the same file, e.g. when collecting logs from an application that is running on many nodes. Instead, store each log under a different name, e.g. labelled by its process id (PID) or $SGE_TASK_ID (see the sketch below).
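As a minimal sketch of the second point (exampleapp and the logs directory are placeholders, not real software on the cluster), an array job can give each task its own log file by including $SGE_TASK_ID in the file name:
#!/bin/bash
#$ -cwd
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
#$ -t 1-100
# Each task writes its own log file instead of sharing one
mkdir -p logs
exampleapp --input input_${SGE_TASK_ID}.dat > logs/run_${SGE_TASK_ID}.log 2>&1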
Compressing files¶
Large directories of files can be compressed to reduce inode usage and the associated performance impacts. Large files can be compressed to reduce space usage.
If your data is stored in a compressed format, you can decompress it within your job and then re-compress it after use.
tar supports multiple compression algorithms, which can be selected with one of the following options:
gzip is the fastest and is most suitable for compressing large directories containing many files. This example compresses a directory with 1000 files into a single tar file using gzip compression. The --remove-files option removes the source files after they have been compressed.
$ ls example/ | wc -l
1000
$ tar --remove-files -czf example.tgz example/
$ ls
example.tgz
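To unpack the archive again, for example at the start of a job, pass -x with the matching -z option (this uses the example.tgz file created above):
$ tar -xzf example.tgz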
bzip2 is slower but provides better compression, so it is best used to compress large files. This example compresses a data file with bzip2:
$ du -h example
11G example
$ tar -cjf example.tar.bz2 example
$ du -h example.tar.bz2
8.0K example.tar.bz2
Example job using compression¶
Here is an example job using bzip2 compression on the data files, which assumes the job modifies the data files. If the files are not modified, the final step of re-compressing the files is unnecessary.
The source tar file remains in its original location and does not need to be copied to the working directory.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
# Change into TMPDIR
cd ${TMPDIR}
# Extract data files
tar -xjf /data/home/abc123/example.tar.bz2
# Run job
examplebin --input-dir example/
# Compress data files (only needed if data is changed by job)
# and overwrite source tar file
tar -cjf /data/home/abc123/example.tar.bz2 example/
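Assuming the script above is saved as compress_job.sh (a name chosen only for this example), it can be submitted in the usual way:
$ qsub compress_job.sh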