Andrena cluster¶

The Andrena cluster is a set of GPU nodes which were purchased with a Research Capital Investment Fund to support the University's Digital Environment Research Institute.

Hardware¶

The cluster comprises 16 GPU nodes - each with 4 GPUs, providing a total of 64 Nvidia A100 GPUs. The Andrena nodes are joined to Apocrita and make use of the same job scheduler and high performance networking/storage.

DERI research groups may additionally make use of a portion of the 50TB DERI storage entitlement, while commonly used read-only datasets (e.g. training datasets for machine learning) can be hosted on high performance SSD storage.

Requesting access¶

To request access to the Andrena computational resources or storage, please contact us to discuss requirements.

Logging in to Andrena¶

The connection procedure is the same as Apocrita's login procedure. Please refer to the documentation below for information about how to submit jobs specifically to the Andrena cluster nodes.

Running jobs on Andrena¶

Workloads are submitted using the job scheduler and work exactly the same way as Apocrita, which is documented thoroughly on this site. If you have been approved to use Andrena, jobs can be submitted using the following additional request in the resource request section of the job script:

#$ -l cluster=andrena

Without this setting, the scheduler will try to run the job either on Apocrita or Andrena nodes, depending on availability.

Requesting 12 cores per GPU

By default, Andrena will only accept jobs requesting 8 cores per GPU. To request 12 cores per GPU, ensure the -l cluster=andrena parameter is requested, otherwise your job will be rejected upon submission.

Submitted jobs should follow a similar template to Apocrita GPU jobs, if requesting 8 cores per GPU (default). If using the -l cluster=andrena option to request 12 cores per GPU, the memory request must be reduced to 7.5G RAM per GPU. otherwise your job will be rejected upon submission.

An example GPU job script using a conda environment might look like:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 12         # 12 cores per GPU
#$ -l h_rt=240:0:0    # 240 hours runtime
#$ -l h_vmem=7.5G     # 7.5G RAM per core
#$ -l gpu=1           # request 1 GPU
#$ -l cluster=andrena # use the Andrena nodes and enable 12 cores per GPU

module load miniforge
mamba activate tensorflow_env
python train.py

A typical GPU job script using virtualenv will look similar. Some applications such as PyTorch are packaged with necessary GPU libraries built-in, therefore it is not required to load any additional modules for GPU support.

However, CUDA libraries are not always installed as part of a pip install, so it may be necessary to load the relevant cudnn module to make the CUDNN and CUDA libraries available in your virtual environment. Note that loading the cudnn module also loads a compatible cuda module.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 12         # 12 cores per GPU
#$ -l h_rt=240:0:0    # 240 hours runtime
#$ -l h_vmem=7.5G     # 7.5G RAM per core
#$ -l gpu=1           # request 1 GPU
#$ -l cluster=andrena # use the Andrena nodes and enable 12 cores per GPU

module load python cudnn
source venv/bin/activate
python train.py