Python

Python is an interpreted, interactive, object-oriented programming language. It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. Python combines remarkable power with very clear syntax. It has interfaces to many system calls and libraries, as well as to various window systems, and is extensible in C or C++.

Python 2 is not supported on Apocrita

Python 2 has not been supported by the Python Software Foundation since January 2020. We are therefore unable to provide support for any project requiring Python 2.

Python is available as a module on Apocrita.

Usage

Python distribution module file conflicts

Python distribution modules conflict with one another. To prevent errors when running the Python interpreter, attempting to load an additional Python distribution module after a Python module has already been loaded produces a module load conflict error.

To run the default installed version of Python, simply load the Python module:

module load python

Then run Python either interactively by executing python or, more commonly, with a script file (e.g. example.py) containing Python code:

python example.py

Installing Python packages

Do not install Pip packages locally

To avoid package conflicts and dependency issues when running Python, do not install Pip packages locally (i.e. by using the --user flag to pip install). The correct approach is to use a virtualenv for each project, containing only the required Python packages for that project.

You can use virtualenv to create a virtual Python environment containing all the required packages for your project. Once the virtual environment has been created and activated, you can use the familiar pip command to install packages into it.

The full worked example below shows how to create and activate a virtualenv, and install NumPy using pip.

# virtualenv is installed as part of the python module
$ module load python

# Set up a virtual environment called numpy_env
$ virtualenv numpy_env

# Activate the "numpy_env" virtual environment
$ source numpy_env/bin/activate

# Install the Numpy library using pip
(numpy_env)$ pip install numpy

Once all the packages have been installed inside the virtual environment (in this case, only NumPy), you can run code which requires the package(s) installed within the virtualenv.

The example below shows how to import the NumPy library installed within the previously created virtualenv:

# Load the same Python module used to create the virtual environment
$ module load python

# Activate the "numpy_env" virtual environment
$ source numpy_env/bin/activate

$ python
>>> import numpy
>>> numpy.__version__
'X.Y.Z'

If you forget to activate your virtualenv, expect to see the following error when importing the NumPy package:

ModuleNotFoundError: No module named 'numpy'
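A script can also check for this condition itself before importing its dependencies. The following is a minimal sketch, assuming the environment was created with venv or a recent virtualenv release, where sys.prefix and sys.base_prefix differ while an environment is active. The function name in_virtualenv is illustrative and not part of any library.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the interpreter is running inside a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix points at the underlying Python installation.
    # Assumption: venv or a recent virtualenv release created the environment.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Using virtual environment: {sys.prefix}")
    else:
        print("No virtual environment is active", file=sys.stderr)
```

Running this before and after sourcing the activate script should show the difference between the two states.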

Displaying live print output

The output of a print() call is written to sys.stdout. In a batch job, this output is buffered and is typically written to the job output file only when the buffer fills or the Python script completes. Jobs which are killed or do not run to completion may not flush the buffer to the job output file, so any output still held in the buffer may be lost.

To force writing to the file after each print() statement, you will need to pass the flush=True argument to each print() statement. For example:

print("something", flush=True)

See the Python print() Documentation for further information.

Output file buffering increases the performance of your job by replacing lots of small write operations with a single larger write. Flushing after every write should be used only when necessary. You may notice that your jobs take longer to run when using flushing, especially when writing lots of small messages.
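One possible compromise between the two approaches is to flush at most once per time interval rather than after every message. The sketch below is only an illustration of that idea. The class name ThrottledPrinter and the 30 second interval are assumptions chosen for the example, not something provided by Python or by Apocrita.

```python
import sys
import time


class ThrottledPrinter:
    """Print messages, flushing the stream at most once every `interval` seconds."""

    def __init__(self, stream=None, interval=30.0):
        self.stream = stream if stream is not None else sys.stdout
        self.interval = interval
        self._last_flush = time.monotonic()

    def __call__(self, *args):
        # Write the message into the stream's buffer as usual.
        print(*args, file=self.stream)
        # Flush only if enough time has passed since the previous flush.
        now = time.monotonic()
        if now - self._last_flush >= self.interval:
            self.stream.flush()
            self._last_flush = now


# Hypothetical usage: messages appear in the job output file
# at most roughly 30 seconds after they are printed.
log = ThrottledPrinter(interval=30.0)
for step in range(5):
    log(f"completed step {step}")
```

With this pattern, a killed job loses at most the messages printed since the last flush, while the number of small writes stays low.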

Example jobs

Serial job

Threading with OpenMP

Some Python packages use OpenMP for threading. For serial jobs, all Python modules set the variable OMP_NUM_THREADS to be the number of requested slots in the current job, if the variable is unset when loading the module. This avoids an issue where some packages incorrectly use too many threads for OpenMP work.
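To confirm the thread count that a script will see, the variable can be read from the environment at the start of the script. This is a minimal sketch; the helper name thread_count and the fallback value of 1 are assumptions made for illustration.

```python
import os


def thread_count(default=1):
    """Return the OpenMP thread count from OMP_NUM_THREADS, or `default`."""
    value = os.environ.get("OMP_NUM_THREADS", "")
    # OMP_NUM_THREADS may be a comma-separated list (one value per nesting
    # level); only the first entry is used here.
    first = value.split(",")[0].strip()
    if first.isdigit() and int(first) > 0:
        return int(first)
    # Fall back when the variable is unset or not a positive integer.
    return default


print(f"OpenMP threads available to this job: {thread_count()}")
```

Printing this value at the top of a job's output makes it easier to spot a mismatch between the requested slots and the threads actually used.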

Here is an example job running on 1 core:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load python
python example.py

Serial job (virtualenv)

To use a previously created Python virtualenv in a job script, you need to activate the virtualenv as shown below (replace <envname> with the actual name of your virtualenv):

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

# Load Python module
module load python

# Activate virtualenv
source <envname>/bin/activate

# Run Python script
python example.py