
Using GPUs

The nxg and sbg nodes contain GPU cards that can provide huge acceleration for certain types of parallel computing tasks, via the CUDA and OpenCL frameworks.

Access to GPU nodes

Access to GPU nodes is available for free to QM Researchers. Please contact us if you would like to use these nodes so we can add you to the allowed user list and help you get started with your initial GPU job submission. Note that access to GPU nodes is not permitted for Undergraduate and MSc students, or external users.

Short queue GPU testing

QM Researchers may wish to test their GPU code before requesting GPU access. We provide a short queue testing node, "dn150", which accepts GPU jobs of up to 1 hour - see below for more information and examples.

Applications with GPU support


A considerable number of scientific and analytical applications have GPU support. While some, such as Matlab and Ansys, support GPUs out of the box, others may require specific GPU-ready builds; these may appear in the module avail list with a -gpu suffix. If you require GPU support to be added to a specific application, please submit a request for a GPU build and provide some test data.
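For example, one quick way to spot GPU-ready builds is to filter the module list for the -gpu suffix (a minimal sketch; module avail typically writes to stderr, hence the redirection):

# List available modules and keep only those with a -gpu suffix
module avail 2>&1 | grep -i -- "-gpu"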

Be aware that not every GPU-capable application will run faster on a GPU for your code. For example, CP2K only has a GPU port of the DBCSR sparse matrix library. If you are not using this library in your code then you will not experience a performance boost.

Submitting jobs to GPU nodes

To request a GPU, add the -l gpu=<count> option to your job submission. The scheduler will automatically select a GPU node based on availability and any other resource parameters specified.
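For instance, these resources can also be passed directly on the qsub command line; a minimal sketch (the script name gpu_job.sh and the core, memory and runtime values are illustrative):

$ qsub -pe smp 8 -l h_vmem=7.5G -l h_rt=1:0:0 -l gpu=1 gpu_job.sh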

We also provide a node (dn150) with 2 Kepler K40c GPU cards, which may be used before acquiring access to the other GPU nodes. This node is intended to avoid long queuing times during GPU code testing and therefore only accepts jobs submitted to the short queue with a runtime of up to 1 hour.

Selecting a specific GPU type

For compatibility reasons, you may optionally request a specific GPU type. For example, CUDA version 8 predates the V100 GPU and is not supported on it, so -l gpu_type=kepler would select nodes using the K80 GPU instead. Conversely, nodes with the Volta V100 GPU may be selected with -l gpu_type=volta, and Ampere A100 nodes may be selected with -l gpu_type=ampere.
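As a sketch, the relevant resource request lines in a job script might look like this (values are illustrative):

#$ -l gpu=1              # request 1 GPU
#$ -l gpu_type=volta     # restrict the job to V100 nodes (use kepler or ampere as required)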

GPU card allocation

Do not set the CUDA_VISIBLE_DEVICES variable

For reasons documented below, please do not set the CUDA_VISIBLE_DEVICES variable in your job scripts.

We have enabled GPU device cgroups (Linux Control Groups) across all GPU nodes on Apocrita. This means your job is only presented with the GPU cards that have been allocated to it by the scheduler, which prevents applications from attaching to GPUs that have not been allocated to the job.

Previously, it was necessary to set the CUDA_VISIBLE_DEVICES variable in job scripts to ensure the correct GPU was used by the job. However, this was only a workaround until the GPU device cgroups were applied. You should no longer set this variable in your job scripts.

Inside your job, the GPU cards presented to your job will always appear as device 0 to device N-1, where N is the number of GPU cards you have requested. The table below shows the devices presented to a job for each GPU resource request:

GPUs Requested    Devices Presented
1                 0
2                 0, 1
3                 0 - 2
4                 0 - 3
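You can confirm this numbering from inside a job by listing the devices presented to it; only the allocated cards, numbered from 0, should appear:

# Run inside a job script to list the GPU devices presented to the job
nvidia-smi -L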

Checking GPU usage

GPU usage can be checked with the nvidia-smi command, e.g.:

$ nvidia-smi -l 1
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   70C    P0   223W / 250W |  12921MiB / 16160MiB |     97%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  On   | 00000000:2F:00.0 Off |                    0 |
| N/A   30C    P0    23W / 250W |      4MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   30C    P0    23W / 250W |      6MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-PCIE...  On   | 00000000:D8:00.0 Off |                    0 |
| N/A   31C    P0    23W / 250W |      6MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1557      C   python                          12915MiB |
+-----------------------------------------------------------------------------+

In this example we can see that the process is using GPU 0. The -l 1 option tells nvidia-smi to refresh its output every second. If this is run inside a job, GPU 0 is the card you have been allocated, which might not be system device 0.

If you SSH into a GPU node and run nvidia-smi, you will see all system GPU devices by their real ID, rather than the allocated device number. Similarly, the SGE_HGR_gpu environment variable inside jobs and the qstat -j JOB_ID command will also show the actual GPU device granted.
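As an illustration, the following lines could be added to a job script to record the physical device(s) granted (the grep filter is a convenience and the exact qstat output format may vary):

# Show the actual GPU device(s) granted to this job by the scheduler
echo "Granted GPU(s): $SGE_HGR_gpu"
qstat -j "$JOB_ID" | grep -i gpu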

Example job submissions

The following examples show the basic outline of job scripts for GPU nodes. Note that while the general rule for compute nodes is to strictly request only the cores and RAM you will be using, our GPU jobs are GPU-centric: request only the GPUs you will be using, then select 8 cores per GPU and 7.5GB per core. You can increase this to 11GB per core if you are sure that your job will not use an nxg node (e.g. by using -l gpu_type='volta|ampere'), as nxg nodes have less RAM than sbg nodes. More detailed examples can also be found on the application-specific pages on this site (e.g. TensorFlow).

h_vmem does not need to account for GPU RAM

The h_vmem request only refers to system RAM and does not need to account for GPU RAM used. The full GPU RAM is automatically granted when you request a GPU.
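For example, when restricting a job to Volta or Ampere nodes as described above, the per-core memory request can be raised to 11GB; a sketch of the relevant resource lines:

#$ -pe smp 8                     # 8 cores per GPU
#$ -l gpu=1                      # request 1 GPU
#$ -l gpu_type='volta|ampere'    # avoid nxg nodes, which have less system RAM
#$ -l h_vmem=11G                 # 11 * 8 = 88G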

Testing GPU Node (dn150)

Memory requests during testing

The GPU testing node "dn150" only has a small amount of RAM, so jobs must request at most 1G per core, otherwise they will not be eligible to run on dn150. Jobs without the correct memory setting may be subject to increased job queuing times and/or produce undesirable results.

Request one dn150 GPU

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8        # 8 cores (16 available on dn150)
#$ -l h_rt=1:0:0    # 1 hour runtime (required to run on dn150)
#$ -l h_vmem=1G     # see above notice
#$ -l gpu=1         # request 1 GPU

./run_code.sh

Request two dn150 GPUs (whole node)

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 16       # 16 cores (16 available on dn150)
#$ -l h_rt=1:0:0    # 1 hour runtime (required to run on dn150)
#$ -l gpu=2         # request 2 GPUs
#$ -l exclusive     # request exclusive access

./run_code.sh

Production GPU Nodes

Request one GPU

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8        # 8 cores (32 available per GPU node)
#$ -l h_rt=240:0:0  # 240 hours runtime
#$ -l h_vmem=7.5G   # 7.5 * 8 = 60G
#$ -l gpu=1         # request 1 GPU

./run_code.sh

Request two GPUs

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 16       # 16 cores (32 available per GPU node)
#$ -l h_rt=240:0:0  # 240 hours runtime
#$ -l h_vmem=7.5G   # 7.5 * 16 = 120G
#$ -l gpu=2         # request 2 GPUs

./run_code.sh

Request four GPUs (whole node)

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 32       # 32 cores (32 available per GPU node)
#$ -l h_rt=240:0:0  # 240 hours runtime
#$ -l gpu=4         # request 4 GPUs
#$ -l exclusive     # request exclusive access

./run_code.sh

If you are requesting all GPUs on a node, then choosing exclusive mode will give you access to all of the node's resources. Note that requesting a whole node will likely result in a long queuing time (except on dn150), unless you have access to an "owned" GPU node that your research group has purchased.
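Once written, a job script can be submitted and its status monitored in the usual way (the filename is illustrative):

$ qsub gpu_job.sh
$ qstat -u $USER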

Submitting jobs to an owned node

If your research group has purchased a GPU node, the scheduler will by default first check for available slots on your owned node(s), and then on the public GPU nodes (if applicable). If you want to restrict your job to your owned nodes only (e.g. for performance, or to ensure consistency), then adding:

#$ -l owned

to the resource request section at the top of your job script will restrict the job to running on owned nodes only.

Getting help

If you are unsure about how to configure your GPU jobs, please contact us for assistance.