PyTorch¶
PyTorch is an open source deep learning platform.
Versions¶
CPU and GPU versions of the PyTorch Python library are available, and each is installed differently.
GPU version is recommended
PyTorch typically runs much faster on a GPU. Researchers need to request permission to be added to the list of GPU node users.
It's worth visiting the PyTorch "Get Started" page, where you can find an interactive installation command generator.
GPU version¶
Installing with pip¶
PyTorch may be installed using pip in a virtualenv, which uses packages from the Python Package Index. The PyTorch binaries are packaged with the necessary libraries built in, so you do not need to load CUDA/cuDNN modules.
Initial setup:
module load python
virtualenv pytorchenv
source pytorchenv/bin/activate
pip install torch torchvision torchaudio
Installing specific versions of PyTorch
To select a specific version, use the standard pip method: for example, to install version 1.0.0, run pip install torch==1.0.0. Omitting the version number installs the latest release version.
Install any additional Python package dependencies into your virtualenv with further pip install commands or, preferably, in bulk using a requirements file.
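After installing, it can be useful to confirm which versions actually ended up in the active virtualenv. The following is a minimal sketch using only the Python standard library; the helper name installed_versions is hypothetical, and the package list simply mirrors the pip install command shown above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "torchvision", "torchaudio")):
    """Return a dict mapping each package name to its installed version.

    A package that is not installed in the current environment maps to None.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print one line per package, e.g. for pasting into a support request.
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the virtualenv activated; a package reported as not installed usually means the wrong environment is active.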
Subsequent activation as part of a GPU job:
module load python
source pytorchenv/bin/activate
Installing with conda¶
If you prefer to use conda environments, instructions are provided below. However, for simplicity the examples on this page will use pip.
Conda package availability and disk space
Conda tends to pull in a lot of packages, consuming more space than pip virtualenvs. Additionally, pip tends to have a wider range of third-party packages than conda.
Initial setup:
module load anaconda3
mamba create -n pytorchenv
mamba activate pytorchenv
mamba install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
Subsequent activation as part of a GPU job:
module load anaconda3
mamba activate pytorchenv
CPU-only version¶
The CPU version will be slower, but can be useful for quick prototyping, and it creates a much smaller virtual environment. CPU-only code should not be run on the GPU nodes.
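One way to prototype on the CPU and later move the same script to a GPU node is to choose the device at run time. The sketch below assumes nothing beyond torch.cuda.is_available(); the helper name pick_device is hypothetical, and the ImportError guard is there only so the snippet also runs outside the virtualenv.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment, so no GPU can be used.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
# In a real script, move the model and each batch to the chosen device,
# for example model.to(device) and inputs.to(device).
```

If a job on a GPU node prints "cpu" here, the CPU-only packages were probably installed into the environment being used.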
Pip instructions¶
To install the CPU-only version, create the virtualenv as shown in the GPU version above (the CPU-only example job on this page assumes it is named pytorchcpu), then run the following command:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Conda instructions¶
To install the CPU-only version, create the conda environment as shown in the GPU version above, then run the following command:
mamba install pytorch torchvision torchaudio cpuonly -c pytorch
Example jobs¶
GPU basic example¶
The job script assumes a virtual environment pytorchenv in your home directory containing the PyTorch GPU packages, set up as shown above.
#!/bin/bash
#$ -cwd
#$ -pe smp 12
#$ -l h_vmem=7.5G
#$ -l h_rt=240:0:0
#$ -l gpu=1
module load python
source ~/pytorchenv/bin/activate
python two_layer_net_tensor_gpu.py
A copy of the example PyTorch script can be obtained by running
wget https://raw.githubusercontent.com/sbutcher/pytorch-examples/master/tensor/two_layer_net_tensor_gpu.py
Submit the script to the job scheduler.
GPU training example¶
This example uses the PyTorch transfer learning tutorial, which utilises a single GPU. The following steps download the tutorial script and its data for use with an existing virtual environment named pytorchenv, with the PyTorch and matplotlib packages installed:
wget https://pytorch.org/tutorials/_downloads/07d5af1ef41e43c07f848afaf5a1c3cc/transfer_learning_tutorial.py
wget https://download.pytorch.org/tutorial/hymenoptera_data.zip
mkdir data
unzip hymenoptera_data.zip -d data
Create a job script using this GPU job template and submit it with the qsub command:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 12
#$ -l h_vmem=7.5G
#$ -l h_rt=240:0:0
#$ -l gpu=1
module load python
source ~/pytorchenv/bin/activate
python transfer_learning_tutorial.py
Checking that the GPU is being used correctly
Running ssh <nodename> nvidia-smi will query the GPU status on that node. You can also use the nvtools module to check that the GPU is being used correctly. If the job is running, the qstat command will show which node is being used.
It is possible to write PyTorch code for multiple GPUs, and also for hybrid CPU/GPU tasks, but do not request more than one GPU unless you can verify that your code correctly utilises multiple GPUs.
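When checking utilisation across several GPUs, the machine-readable output of nvidia-smi is easier to inspect than the default table. The sketch below parses the output of nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits. The function names and the 5% idle threshold are hypothetical choices, and the sample text is made up for illustration rather than captured from a real node.

```python
import csv
import io


def parse_gpu_status(csv_text):
    """Parse CSV rows of: index, utilization.gpu, memory.used, memory.total.

    Assumes every field is numeric; nvidia-smi may print [N/A] for fields
    a GPU does not report, which this sketch does not handle.
    """
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        index, util, used, total = (field.strip() for field in record)
        rows.append(
            {
                "index": int(index),
                "util_percent": int(util),
                "mem_used_mib": int(used),
                "mem_total_mib": int(total),
            }
        )
    return rows


def idle_gpus(rows, threshold_percent=5):
    """Return the indices of GPUs whose utilisation is below the threshold."""
    return [row["index"] for row in rows if row["util_percent"] < threshold_percent]


# Illustrative sample only: GPU 0 is busy, GPU 1 is idle.
sample = "0, 97, 10240, 40960\n1, 0, 3, 40960\n"
status = parse_gpu_status(sample)
print("Idle GPUs:", idle_gpus(status))
```

A GPU that was requested by a job but stays in the idle list while the job runs suggests the code is not using it.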
CPU-only example¶
The job script assumes a virtual environment pytorchcpu containing the CPU-only PyTorch packages, set up as shown above.
#!/bin/bash
#$ -cwd
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
module load python
source ~/pytorchcpu/bin/activate
python two_layer_net_tensor_cpu.py
A copy of the example PyTorch script can be obtained by running
wget https://raw.githubusercontent.com/sbutcher/pytorch-examples/master/tensor/two_layer_net_tensor_cpu.py
Submit the script to the job scheduler.
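The CPU-only job above requests a single core with -pe smp 1, so it can help to limit PyTorch to the same number of threads. The sketch below assumes the scheduler exports the granted core count in the NSLOTS environment variable, as Grid Engine schedulers conventionally do; this page does not confirm it for this cluster. The helper name requested_cores is hypothetical, and the ImportError guard only lets the snippet run where PyTorch is absent.

```python
import os


def requested_cores(default=1):
    """Return the core count from NSLOTS, or `default` when it is unset or invalid."""
    value = os.environ.get("NSLOTS", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    return default


cores = requested_cores()

try:
    import torch

    # Limit intra-op CPU parallelism to the cores granted to the job.
    torch.set_num_threads(cores)
except ImportError:
    # PyTorch is not installed here; the core count is still reported below.
    pass

print(f"Using {cores} CPU thread(s)")
```

Placing this near the top of a training script keeps the job within its requested cores when the -pe smp value changes.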