The Andrena cluster is a set of compute and GPU nodes which were purchased with a Research Capital Investment Fund to support the University's Digital Environment Research Institute.
The cluster comprises 16 GPU nodes - each with 4 GPUs, providing a total of 64 Nvidia A100 GPUs - plus 36 compute nodes with the same specification as the Apocrita ddy nodes. The Andrena nodes are joined to Apocrita and make use of the same job scheduler and high performance networking/storage.
DERI research groups may additionally make use of a portion of the 50TB DERI storage entitlement, while commonly used read-only datasets (e.g. training datasets for machine learning) can be hosted on high performance SSD storage.
To request access to the Andrena computational resources or storage, please contact firstname.lastname@example.org to discuss requirements.
Logging in to Andrena
We provide dedicated login nodes for Andrena users. The connection procedure
is the same as the Apocrita login procedure, except that
login.hpc.qmul.ac.uk should be replaced with the address of
the Andrena login nodes.
Running jobs on Andrena
Workloads are submitted using the job scheduler, which works exactly the same way as on Apocrita and is documented thoroughly on this site. If you have been approved to use Andrena, you can submit jobs from either the Andrena or Apocrita login nodes by adding the following request to the resource request section of the job script:
#$ -l cluster=andrena
For example, the whole job script might look like:
    #!/bin/bash
    #$ -cwd                 # Run the job in the current directory
    #$ -pe smp 1            # Request 1 core
    #$ -l h_rt=240:0:0      # Request 10 days maximum runtime
    #$ -l h_vmem=1G         # Request 1GB RAM per core
    #$ -l cluster=andrena   # Ensure that the job runs on Andrena nodes

    module load python
    python mycode.py
Without this setting, the scheduler will try to run the job on either Apocrita or Andrena nodes, depending on availability.
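Because a job without the cluster request may land on either cluster, it can be useful to check a job script before submitting it. The following is a minimal sketch of such a check; `has_andrena_request` is a hypothetical helper written for illustration and is not a tool provided on the cluster. It assumes the request appears on its own directive line in the form shown above.

```shell
#!/bin/bash
# Hypothetical helper (not a site-provided tool): succeed if the given job
# script contains a directive line of the form "#$ -l cluster=andrena".
# Assumption: the request sits on its own "#$ -l" line, as in the examples.
has_andrena_request() {
    grep -Eq '^#\$[[:space:]]+-l[[:space:]]+cluster=andrena([[:space:]]|$)' "$1"
}

# Example use (commented out so that nothing is submitted by accident):
# if has_andrena_request job.sh; then
#     qsub job.sh
# else
#     echo "job.sh does not request cluster=andrena" >&2
# fi
```

The pattern is deliberately strict: it does not recognise the request when it is combined with other resources on one `-l` line, so adjust it if your scripts are written that way.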
GPU jobs follow a similar template to Apocrita GPU jobs, and should request 8 cores per GPU and 11G of RAM per core, even if fewer cores are actually used by the code. By mandating these rules within the job scheduler logic, we avoid situations where GPUs cannot be requested because another job is using all of the cores on the node.
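The arithmetic behind these rules can be sketched as a small shell function. `gpu_request` is a hypothetical helper for illustration only. It assumes a single-node job, so it accepts between 1 and 4 GPUs (each Andrena GPU node has 4 GPUs), and it prints the directive lines that follow from the 8 cores per GPU and 11G per core rules.

```shell
#!/bin/bash
# Hypothetical helper: print scheduler directives for a single-node GPU job.
# Rules taken from the text: 8 cores per GPU, 11G RAM per core.
# Assumption: the job stays on one node, so at most 4 GPUs can be requested.
gpu_request() {
    local gpus="$1"
    local cores_per_gpu=8
    local mem_per_core_gb=11

    case "$gpus" in
        1|2|3|4) ;;
        *)
            echo "gpus must be between 1 and 4 (a node has 4 GPUs)" >&2
            return 1
            ;;
    esac

    local cores=$(( gpus * cores_per_gpu ))
    local total_mem_gb=$(( cores * mem_per_core_gb ))

    printf '#$ -pe smp %d\n' "$cores"
    printf '#$ -l gpu=%d\n' "$gpus"
    printf '#$ -l h_vmem=%dG\n' "$mem_per_core_gb"
    # Informational comment only: total RAM implied by the request.
    printf '# total RAM across all cores: %dG\n' "$total_mem_gb"
}
```

For example, `gpu_request 2` prints a request for 16 cores and 2 GPUs at 11G per core, which corresponds to 176G of RAM in total.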
An example GPU job script might look like:
    #!/bin/bash
    #$ -cwd
    #$ -j y
    #$ -pe smp 8            # 8 cores per GPU
    #$ -l h_rt=240:0:0      # 240 hours runtime
    #$ -l h_vmem=11G        # 11G RAM per core
    #$ -l gpu=1             # request 1 GPU
    #$ -l cluster=andrena   # use the Andrena nodes

    module load anaconda3
    conda activate tensorflow-env
    python train.py