TensorFlow

TensorFlow is an open source library for machine learning.

Versions

CPU and GPU versions of the TensorFlow Python library are available on Apocrita, and each requires a different installation method.

GPU version is recommended

TensorFlow programs typically run much faster on a GPU. Researchers need to request permission to be added to the list of GPU node users.

A more in-depth tutorial on installing and using TensorFlow on Apocrita is also available on our blog.

GPU version

The GPU version of TensorFlow can be installed as a Python package, provided the package was built against a CUDA/cuDNN library version that is supported on Apocrita.

Installing with pip

The tensorflow-gpu package may be installed using pip in a virtualenv, which uses packages from the Python Package Index. Versions up to and including 1.12 are built with CUDA 9.0.

Support for TensorFlow 1.13

TensorFlow 1.13 and newer pip packages are built with CUDA 10, which is not currently available on Apocrita because it requires a GPU driver update. Until CUDA 10 is ready, please install version 1.12 or earlier.

Loading a cuDNN module will also load the corresponding CUDA module as a prerequisite. Both libraries must be loaded for the tensorflow-gpu package to work.
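For example, using the module versions referenced elsewhere on this page:

module load cudnn/7.4-cuda-9.0
module list    # should now show both the cudnn module and its cuda prerequisite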

Initial setup:

module load python
virtualenv --include-lib tensorgpu
source tensorgpu/bin/activate
pip install tensorflow-gpu==1.12

Installing specific versions of TensorFlow

To select a specific version, use the standard pip method, noting that other versions may have been built against different CUDA libraries. For example, to install version 1.11, run pip install tensorflow-gpu==1.11. Omitting the version number installs the latest release.

If you have any additional Python package dependencies, install them into your virtualenv with further pip install commands, or in bulk using a requirements file, as sketched below.
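For example, a minimal sketch using a hypothetical requirements.txt (the package names and versions are illustrative only):

# contents of requirements.txt:
#   numpy==1.16.2
#   Pillow==5.4.1
pip install -r requirements.txt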

Subsequent activation as part of a GPU job:

module load python
module load cudnn/7.4-cuda-9.0
source tensorgpu/bin/activate

Installing with conda

If you prefer to use conda environments, the approach is slightly different: conda supports CUDA versions 8.0, 9.0 and 9.2, and will install these requirements as conda packages within your virtual environment. Note that while the pip packages are officially supported by TensorFlow, the conda packages are built and supported by Anaconda.

Conda package availability and disk space

Conda tends to pull in a lot of packages, consuming more space than pip virtualenvs. Additionally, pip tends to have a wider range of third-party packages than conda.
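To check how much space an environment is consuming (assuming the default ~/.conda/envs location for the tensorgpu environment created below):

du -sh ~/.conda/envs/tensorgpu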

Initial setup:

module load anaconda3
conda create -n tensorgpu
source activate tensorgpu
conda install tensorflow-gpu
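As with pip, a specific release can be requested; note that conda uses a single = for version pinning, for example:

conda install tensorflow-gpu=1.12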

Subsequent activation as part of a GPU job:

module load anaconda3
source activate tensorgpu

CPU version

The CPU-only version of TensorFlow can be installed using pip in a virtualenv.

module load python
virtualenv --include-lib tensorcpu
source tensorcpu/bin/activate
pip install tensorflow

You may optionally install a specific TensorFlow version, for example: pip install tensorflow==1.4.1
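To confirm which version was installed into the active virtualenv:

pip show tensorflow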

Using containers

If you have certain requirements that are not satisfiable by pip or conda (e.g. extra operating system packages not available on Apocrita), then it may be possible to solve this with a Singularity container. For most requirements, the pip method is recommended, since it is easier to maintain and add packages to a user-controlled virtualenv.

A list of existing TensorFlow containers can be found in the /data/containers/tensorflow directory on Apocrita; these containers can be customised to add any required packages.
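To see which container images are currently available:

ls /data/containers/tensorflow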

Example jobs

Simple GPU job using virtualenv

This assumes an existing virtualenv named tensorgpu created as shown above.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8
#$ -l h_rt=1:0:0
#$ -l gpu=1

# Assign the correct GPU card (see https://docs.hpc.qmul.ac.uk/using/usingGPU )
export CUDA_VISIBLE_DEVICES=${SGE_HGR_gpu// /,}
module load python
module load cudnn/7.4-cuda-9.0
source ~/tensorgpu/bin/activate
python -c 'import tensorflow as tf; print(tf.__version__)'

Simple GPU job using conda

This assumes an existing conda env named tensorgpu created as shown above.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8
#$ -l h_rt=1:0:0
#$ -l gpu=1

# Assign the correct GPU card (see https://docs.hpc.qmul.ac.uk/using/usingGPU )
export CUDA_VISIBLE_DEVICES=${SGE_HGR_gpu// /,}
module load anaconda3
source activate tensorgpu
python -c 'import tensorflow as tf; print(tf.__version__)'

CPU-only example

The installation can be verified with a simple job:

#!/bin/bash
#$ -cwd
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G

module load python
source ~/tensorcpu/bin/activate
python -c 'import tensorflow as tf; print(tf.__version__)'

Submit the script to the job scheduler and the TensorFlow version number will be recorded in the job output file.
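For example, assuming the script has been saved as tensorflow-cpu.sh (the filename is arbitrary):

qsub tensorflow-cpu.sh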

Simple GPU job using a container

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8
#$ -l h_rt=1:0:0
#$ -l gpu=1

# Assign the correct GPU card (see https://docs.hpc.qmul.ac.uk/using/usingGPU )
export CUDA_VISIBLE_DEVICES=${SGE_HGR_gpu// /,}
module load singularity
singularity exec --nv \
/data/containers/tensorflow/tensorflow-1.8-python3-ubuntu-16.04.img \
python -c 'import tensorflow as tf; print(tf.__version__)'

Singularity GPU support

The --nv flag is required for GPU support and passes through the appropriate GPU drivers and libraries from the host to the container.
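Containers can also be explored interactively on a GPU node; for example:

singularity shell --nv /data/containers/tensorflow/tensorflow-1.8-python3-ubuntu-16.04.img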

GPU Machine learning example

This example makes use of the TensorFlow models repository, obtained by running git clone https://github.com/tensorflow/models, and uses 1 GPU on a node.

Using Multiple GPUs

Specifying 2 GPUs does not automatically mean that both will be used by the application. See the TensorFlow documentation for more information about constructing multi-GPU models. A two-GPU resource request is shown after the example script below.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8
#$ -l h_rt=1:0:0
#$ -l gpu=1

export CUDA_VISIBLE_DEVICES=${SGE_HGR_gpu// /,}
module load python
module load cudnn/7.4-cuda-9.0
source ~/tensorgpu/bin/activate
python ~/models/tutorials/image/mnist/convolutional.py
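For reference, running the same job on two GPUs only requires changing the resource request line in the script above; the CUDA_VISIBLE_DEVICES line already exposes every granted card to TensorFlow:

#$ -l gpu=2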

Checking that the GPU is being used correctly

Running ssh <nodename> nvidia-smi will query the GPU status on a node. You can find out which node your job is using with the qstat command.
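For example (the node name is illustrative):

qstat    # note the node shown in the queue column, e.g. all.q@node123
ssh node123 nvidia-smi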
