Using GPUs

The nxg nodes contain GPU cards that can provide substantial acceleration for certain types of parallel computing tasks, via the CUDA and OpenCL frameworks.

Access to GPU nodes

Access to GPU nodes is available free of charge to QM researchers. Please contact us if you would like to use these nodes, so that we can add you to the allowed user list and help you get started with your initial GPU job submission.

Applications with GPU support

A considerable number of scientific and analytical applications have GPU support. While some, such as Matlab and Ansys, support GPUs out of the box, others may require specific GPU-ready builds; these appear in the module avail list with a -gpu suffix. If you require GPU support to be added to a specific application, please submit a request for a GPU build and provide some test data.
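
For example, GPU-ready builds can be picked out of the module list by filtering for the -gpu suffix (the module name shown is purely illustrative):

$ module avail -t 2>&1 | grep -- '-gpu'
gromacs/2016.4-gpu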

Be aware that a GPU-capable application will not necessarily run your particular code faster on a GPU. For example, CP2K only has a GPU port of the DBCSR sparse matrix library; if your code does not use this library, you will not see a performance boost.

Submitting jobs to GPU nodes

To request GPUs, use the -l gpu=<count> option in your job submission, and the scheduler will automatically select a GPU node. Note that requests are handled per node, so a request for 64 cores and 2 GPUs will result in 4 GPUs across two nodes. Examples are shown below.
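
As a minimal sketch, a single-GPU job could be submitted directly from the command line like this (job.sh is a placeholder for your own script):

$ qsub -pe smp 16 -l h_rt=10:0:0 -l h_vmem=7.5G -l gpu=1 job.sh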

GPU Card Allocation

Ensure you set card allocation

Failure to set card allocation may cause contention with other users' jobs and result in your job being killed.

Requesting cards with parallel PE

When using the parallel parallel environment, requests are exclusive; please ensure that you set slots and gpu correctly to fill each node (see the final example below).

Once a job starts, the assigned GPU cards are listed in the SGE_HGR_gpu environment variable as a space-separated list. To ensure correct use of the allocated cards, you need to limit your computation to run only on them.

For CUDA, this can be done by exporting the CUDA_VISIBLE_DEVICES environment variable, which should be a comma-separated list:

$ echo $SGE_HGR_gpu
0 1
# Set CUDA_VISIBLE_DEVICES;
# this converts the space-separated list into a comma-separated list
$ export CUDA_VISIBLE_DEVICES=${SGE_HGR_gpu// /,}
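
In a job script, a defensive variant of this export (the guard below is our suggestion, not part of the standard examples) can abort early if no cards were granted, rather than silently running on cards belonging to another job:

# Abort if the scheduler granted no GPU cards
if [ -z "$SGE_HGR_gpu" ]; then
    echo "Error: no GPUs allocated (SGE_HGR_gpu is empty)" >&2
    exit 1
fi
export CUDA_VISIBLE_DEVICES=${SGE_HGR_gpu// /,}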

For OpenCL, this can be done via the GPU_DEVICE_ORDINAL environment variable, which should be a comma-separated list:

$ echo $SGE_HGR_gpu
0 1
# Set GPU_DEVICE_ORDINAL;
# this converts the space-separated list into a comma-separated list
$ export GPU_DEVICE_ORDINAL=${SGE_HGR_gpu// /,}

Checking GPU usage

GPU usage can be checked with the nvidia-smi command, e.g.:

$ nvidia-smi -l 1
Tue May  1 13:30:11 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:83:00.0     Off |                    0 |
| N/A   32C    P0   147W / 149W |   1211MiB / 11439MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:84:00.0     Off |                    0 |
| N/A   31C    P8    30W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     28939    C   ...pose/build/examples/openpose/openpose.bin  1207MiB |
+-----------------------------------------------------------------------------+

In this example we can see that the process is using GPU 0. The -l 1 option tells nvidia-smi to refresh the output every second.
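
If you only want a compact utilisation summary, nvidia-smi can also report selected fields; the sketch below refreshes every 5 seconds (the chosen fields are just one possible selection):

$ nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used --format=csv -l 5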

Example job submissions

Request one GPU (CUDA)

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 16       # 16 cores (32 per nxg node)
#$ -l h_rt=10:0:0   # 10 hour runtime
#$ -l h_vmem=7.5G     # 7.5 * 16 = 120G
#$ -l gpu=1         # request 1 gpu per host (2 per nxg node)

export CUDA_VISIBLE_DEVICES=${SGE_HGR_gpu// /,}
./run_code.sh

Half node requests

When using a single graphics card, and therefore half of the node, you will need to request the appropriate slots and memory on the node.

Whilst the nodes have 256G of memory, a small amount of this is taken up by the operating system and filesystem caches. In order to fit two jobs onto a node, the memory requests must total less than 256G. A suggested value for a half-node job using 16 cores is 7.5G per core: 7.5G × 16 = 120G, so two such jobs total 240G, just under the node's capacity.

Request two GPUs on the same node (OpenCL)

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 32       # 32 cores (32 per nxg node)
#$ -l h_rt=10:0:0   # 10 hour runtime
#$ -l gpu=2         # request 2 gpu per host (2 per nxg node)
#$ -l exclusive     # request exclusive access to the node

export GPU_DEVICE_ORDINAL=${SGE_HGR_gpu// /,}
./run_code.sh

Request four GPUs across multiple nodes (CUDA)

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe parallel 64  # 64 cores (32 per nxg node)
#$ -l h_rt=10:0:0   # 10 hour runtime
#$ -l gpu=2         # request 2 gpu per host (2 per nxg node)

export CUDA_VISIBLE_DEVICES=${SGE_HGR_gpu// /,}
./run_code.sh