TensorFlow¶
TensorFlow is an open source library for machine learning.
Versions¶
CPU and GPU versions of the TensorFlow python library are available on Apocrita, and require different methods to install.
GPU version is recommended
TensorFlow programs typically run much faster on a GPU. Researchers need to request permission to be added to the list of GPU node users.
A more in-depth tutorial on installing and using TensorFlow on Apocrita is also available on our blog.
GPU version¶
The GPU version of TensorFlow can be installed as a python package, if the package was built against a CUDA/CUDNN library version that is supported on Apocrita.
If you notice several missing NVIDIA TensorRT
library errors in your TensorFlow output and you would like to benefit from
optimised inference within deep learning frameworks, load the TensorRT
module
before running TensorFlow to import the SDK.
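As a sketch of this workflow (the exact module versions available on Apocrita may differ, and `my_inference_script.py` is a placeholder for your own code; check `module avail` for what is installed):

```shell
module avail tensorrt       # list the TensorRT module versions installed
module load tensorrt        # also pulls in its CUDNN/CUDA prerequisites
python my_inference_script.py
```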
Installing with pip¶
The tensorflow-gpu package may be installed using pip in a virtualenv, which uses packages from the Python Package Index. Versions up to 1.12 were built with CUDA 9.0, while versions 1.13 and later are built with CUDA 10.0.
Loading a CUDNN module will also load the corresponding CUDA module as a prerequisite. These libraries must be loaded for the tensorflow-gpu package to work. Make sure to check for any errors in the output from TensorFlow, as an incorrect CUDA or CUDNN module version will usually result in the GPU not being used.
Loading a TensorRT module will also load the corresponding CUDNN module (and therefore CUDA) as a prerequisite.
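One quick way to confirm that TensorFlow can actually see the GPU is to list the visible devices from inside your job. The call below is the TensorFlow 2 API; on TensorFlow 1.x, `tf.test.is_gpu_available()` serves the same purpose:

```python
import tensorflow as tf

# Prints a non-empty list when the CUDA/CUDNN modules are loaded correctly;
# an empty list means TensorFlow has fallen back to running on the CPU.
print(tf.config.list_physical_devices('GPU'))
```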
Initial setup:
module load python
virtualenv tensorgpu
source tensorgpu/bin/activate
pip install tensorflow-gpu
Installing specific versions of TensorFlow
To select a specific version, use the standard pip method, noting that other versions may have been built with different CUDA libraries. To install version 2.0.0, run pip install tensorflow-gpu==2.0.0. Removing the version number installs the latest release.
If you have any additional python package dependencies, these should be installed into your virtualenv with further pip install commands, or in bulk, using a requirements file.
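As a sketch, with your virtualenv activated (the package names and versions below are only examples, not requirements of TensorFlow itself):

```shell
# requirements.txt lists one package per line, optionally pinned:
cat > requirements.txt << 'EOF'
numpy==1.16.4
pandas==0.24.2
EOF

pip install -r requirements.txt   # installs everything in one command
```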
Subsequent activation as part of a GPU job:
module load python
module load cudnn/7.6.5-cuda-10.0
source tensorgpu/bin/activate
Installing with conda¶
If you prefer to use conda environments, the approach is slightly different as conda supports a variety of CUDA versions and will install requirements as conda packages within your virtual environment. Note that while the pip packages are officially supported by TensorFlow, the conda packages are built and supported by Anaconda.
Conda package availability and disk space
Conda tends to pull in a lot of packages, consuming more space than pip virtualenvs. Additionally, pip tends to have a wider range of third-party packages than conda.
Initial setup:
module load anaconda3
conda create -n tensorgpu
conda activate tensorgpu
conda install tensorflow-gpu
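If you need a particular TensorFlow version, you can pin it at install time and conda will select matching CUDA packages; the version number below is illustrative, so check what is actually available first:

```shell
conda search tensorflow-gpu           # list the versions conda can install
conda install tensorflow-gpu=2.0.0    # pin a specific version
```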
Subsequent activation as part of a GPU job:
module load anaconda3
conda activate tensorgpu
CPU version¶
The CPU-only version of TensorFlow can be installed using pip in a virtualenv.
module load python
virtualenv tensorcpu
source tensorcpu/bin/activate
pip install tensorflow
If you require a specific TensorFlow version for compatibility reasons, you may
specify the version to install - for example:
pip install tensorflow==1.12
Using containers¶
If you have certain requirements that are not satisfiable by pip or conda (e.g.
extra operating system packages not available on Apocrita), then it may be
possible to solve this with a Singularity container. For
most requirements, the pip
method is recommended, since it is easier to
maintain and add packages to a user-controlled virtualenv.
A list of existing TensorFlow containers can be found in the
/data/containers/tensorflow
directory on Apocrita, which can be customised to
add the required packages.
Example jobs¶
Simple GPU job using virtualenv¶
This assumes an existing virtualenv named tensorgpu
created as shown above.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8
#$ -l h_rt=240:0:0
#$ -l gpu=1
module load python
module load cudnn/7.6.5-cuda-10.0
source ~/tensorgpu/bin/activate
python -c 'import tensorflow as tf; print(tf.__version__)'
Simple GPU job using conda¶
This assumes an existing conda env named tensorgpu
created as shown above.
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8
#$ -l h_rt=240:0:0
#$ -l gpu=1
module load anaconda3
conda activate tensorgpu
python -c 'import tensorflow as tf; print(tf.__version__)'
CPU-only example¶
The installation can be verified with a simple job:
#!/bin/bash
#$ -cwd
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
module load python
source ~/tensorcpu/bin/activate
python -c 'import tensorflow as tf; print(tf.__version__)'
Submit the script to the job scheduler and the TensorFlow version number will be recorded in the job output file.
Simple GPU job using a container¶
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8
#$ -l h_rt=240:0:0
#$ -l gpu=1
module load singularity
singularity exec --nv \
/data/containers/tensorflow/tensorflow-1.8-python3-ubuntu-16.04.img \
python -c 'import tensorflow as tf; print(tf.__version__)'
Singularity GPU support
The --nv
flag is required for GPU support and passes through the
appropriate GPU drivers and libraries from the host to the container.
GPU machine learning example¶
This example makes use of the TensorFlow tutorial code, obtained by running git clone https://github.com/tensorflow/models, and uses 1 GPU on a node.
Using Multiple GPUs
Specifying 2 GPUs does not automatically mean that both will be used by the application. See the TensorFlow documentation for more information about constructing multi-GPU models.
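For reference, TensorFlow 2 provides tf.distribute.MirroredStrategy for data-parallel training across the GPUs visible to a job. A minimal sketch follows; the model and its layers are placeholders for illustration, not part of the tutorial above:

```python
import tensorflow as tf

# MirroredStrategy replicates the model onto every visible GPU and
# averages the gradients across replicas after each batch.
strategy = tf.distribute.MirroredStrategy()
print('Replicas in sync:', strategy.num_replicas_in_sync)

with strategy.scope():
    # Build and compile the model inside the strategy scope so that its
    # variables are created as mirrored variables on all GPUs.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer='adam', loss='mse')

# A subsequent model.fit(...) call splits each batch across the replicas.
```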
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8
#$ -l h_rt=240:0:0
#$ -l gpu=1
module load python
module load cudnn/7.6.5-cuda-10.0
source ~/tensorgpu/bin/activate
python ~/models/tutorials/image/mnist/convolutional.py
Checking that the GPU is being used correctly
Running ssh <nodename> nvidia-smi
will query the GPU status on a
node. You can find out the node your job is using with the qstat
command.
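For example, assuming your job is running (the node name below is a placeholder to be replaced with the one qstat reports for your job):

```shell
qstat                        # note the node name shown for your running job
ssh <nodename> nvidia-smi    # reports GPU utilisation and memory usage
```

If nvidia-smi shows 0% GPU utilisation while your job is running, revisit the CUDA/CUDNN module versions loaded in your job script.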