Using GPUs

The nxg nodes contain GPU cards that can provide substantial acceleration for certain types of parallel computing task, via the CUDA and OpenCL frameworks.

Access to GPU nodes

Access to the GPU nodes is free of charge for QM researchers. Please contact us if you would like to use these nodes, so that we can add you to the permitted user list and help you get started with your first GPU job submission.

Applications with GPU support

A considerable number of scientific and analytical applications have GPU support. Some, such as MATLAB and Ansys, support GPUs out of the box, while others require specific GPU-ready builds; these appear in the module avail list with a -gpu suffix. If you require GPU support to be added to a specific application, please submit a request for a GPU build and provide some test data.
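
A quick way to spot GPU-ready builds is to filter the module list. This is a minimal sketch; module avail typically prints to stderr, hence the redirection, and the exact module names available on the cluster will differ:

# list modules whose names carry the -gpu suffix
$ module avail 2>&1 | grep -- '-gpu'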

Be aware that a GPU-capable application will not necessarily run your particular workload faster on a GPU. For example, CP2K only has a GPU port of the DBCSR sparse matrix library; if your code does not use this library, you will not see a performance boost.

Submitting jobs to GPU nodes

To request GPUs, add the -l gpu=<count> option to your job submission, and the scheduler will automatically select a GPU node. Note that requests are handled per node, so a request for 64 cores and 2 GPUs will result in 4 GPUs across two nodes (each nxg node provides 32 cores and 2 GPUs). Examples are shown below.
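
Resource requests can be passed either as #$ directives in the job script, as in the examples below, or directly on the qsub command line. A minimal sketch, assuming a job script called run_job.sh (the name is illustrative):

$ qsub -pe smp 16 -l gpu=1 -l h_rt=10:0:0 run_job.sh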

GPU Card Allocation

Ensure you set card allocation

Failure to set the card allocation may cause contention with other users' jobs and result in your job being killed.

Requesting cards with parallel PE

Jobs using the parallel parallel environment are scheduled exclusively on whole nodes, so please ensure that you request enough slots and GPUs to fill each node.

Once a job starts, the assigned GPU cards are listed in the SGE_HGR_gpu environment variable as a space-separated list. To make correct use of your allocation, you must limit your computation to run only on the allocated cards.

For CUDA, this can be done by exporting the CUDA_VISIBLE_DEVICES environment variable, which should be a comma-separated list:

$ echo $SGE_HGR_gpu
0 1
# Set CUDA_VISIBLE_DEVICES, converting the space-separated
# list into a comma-separated list
$ export CUDA_VISIBLE_DEVICES=${SGE_HGR_gpu// /,}

For OpenCL, this can be done via the GPU_DEVICE_ORDINAL environment variable, which should be a comma-separated list:

$ echo $SGE_HGR_gpu
0 1
# Set GPU_DEVICE_ORDINAL, converting the space-separated
# list into a comma-separated list
$ export GPU_DEVICE_ORDINAL=${SGE_HGR_gpu// /,}
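
In either case, if a job script could ever run without a GPU grant (for example, after editing the resource requests), an explicit check avoids a confusing failure mid-run. This is an optional sketch, not part of the standard submission scripts:

# Optional guard (illustrative): abort early if the scheduler
# granted no GPU cards to this job
if [ -z "$SGE_HGR_gpu" ]; then
    echo "Error: no GPU cards allocated" >&2
    exit 1
fi
# then export CUDA_VISIBLE_DEVICES or GPU_DEVICE_ORDINAL as above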

Example job submissions

Request one GPU (CUDA)

#$ -pe smp 16       # 16 cores (32 per nxg node)
#$ -l gpu=1         # request 1 GPU per host (2 per nxg node)
#$ -l h_rt=10:0:0   # 10 hour runtime

export CUDA_VISIBLE_DEVICES=${SGE_HGR_gpu// /,}
./run_code.sh

Request two GPUs on the same node (OpenCL)

#$ -pe smp 32       # 32 cores (32 per nxg node)
#$ -l gpu=2         # request 2 GPUs per host (2 per nxg node)
#$ -l h_rt=10:0:0   # 10 hour runtime

export GPU_DEVICE_ORDINAL=${SGE_HGR_gpu// /,}
./run_code.sh

Request four GPUs across multiple nodes (CUDA)

#$ -pe parallel 64  # 64 cores (32 per nxg node)
#$ -l gpu=2         # request 2 GPUs per host (2 per nxg node)
#$ -l h_rt=10:0:0   # 10 hour runtime

export CUDA_VISIBLE_DEVICES=${SGE_HGR_gpu// /,}
./run_code.sh