Python is an interpreted, interactive, object-oriented programming language. It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. Python combines remarkable power with very clear syntax. It has interfaces to many system calls and libraries, as well as to various window systems, and is extensible in C or C++.
Python is available as a module on Apocrita.
There are multiple versions of Python available as modules, including Python 2 and Python 3. Python 2 is still very common but is now in legacy mode, whereas Python 3 is under active development and has a large number of improvements.
While some code will work in both versions, there are a number of incompatibilities, so the version of Python you need may depend on the code you are running. For new code bases, Python 3 is strongly recommended.
Python distribution module file conflicts
To prevent errors when running the Python interpreter, attempting to load an additional Python distribution module after loading a Python module will produce a module load conflict error.
To run the default installed version of Python (the latest Python 3), simply load the Python module:
    module load python
then run Python with a script file:
    python example.py
We recommend using the Python command for the specific Python version you need (for example, python2.7) rather than the default python command, as the default may change.
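Because the default python command may change, a script can also guard against being started by an unexpected interpreter. The following is a minimal sketch rather than part of the Apocrita tooling; the helper name require_python and the minimum version shown are illustrative assumptions.

```python
import sys


def require_python(minimum=(3, 0), version=None):
    """Raise RuntimeError if the interpreter is older than `minimum`.

    `require_python` is a hypothetical helper. `version` defaults to the
    running interpreter and exists only so the check is easy to test.
    """
    current = tuple(sys.version_info[:2]) if version is None else tuple(version)
    if current < tuple(minimum):
        raise RuntimeError(
            "Python {}.{} or newer is required, but this is Python {}.{}".format(
                minimum[0], minimum[1], current[0], current[1]
            )
        )
    return current


if __name__ == "__main__":
    # Report which interpreter actually ran the script.
    major, minor = require_python((3, 0))
    print("Running under Python {}.{} ({})".format(major, minor, sys.executable))
```

Calling the check at the top of a script makes a version mismatch fail immediately with a clear message, rather than partway through a job.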
Displaying live print output
In Python 3, the output of a print() call is written to sys.stdout. In a batch job which redirects this to the job output file, the output is buffered and may not be written until execution has completed. Jobs which are killed or do not run to completion may not flush the buffer to the job output file, potentially resulting in lost output.
To force writing to the file after each print() call, you will need to add the flush=True argument to each print() call. For example:
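A minimal sketch of this, assuming a small wrapper function (log_progress is a hypothetical name, not something provided on Apocrita) so that flush=True does not have to be repeated at every call site:

```python
def log_progress(message, stream=None):
    """Print `message` and flush it immediately.

    When `stream` is None, print() writes to sys.stdout.
    """
    print(message, file=stream, flush=True)


if __name__ == "__main__":
    for step in range(1, 4):
        # Each line reaches the job output file as soon as it is printed.
        log_progress("Completed step {} of 3".format(step))
```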
See the Python 3 print() documentation for further information.
Output file buffering increases the performance of your job by replacing lots of small write operations with a single larger write. Flushing after every write should be used only when necessary. You may notice that your jobs take longer to run when using flushing, especially when writing lots of small messages.
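One possible middle ground, sketched here under the assumption that losing a few recent lines is acceptable, is to flush only after every batch of messages. The function name process_items and the flush_every parameter are illustrative, and flush_every is assumed to be a positive integer.

```python
import sys


def process_items(items, flush_every=100, stream=None):
    """Print one line per item, flushing only every `flush_every` lines.

    Returns the number of explicit flushes performed.
    """
    out = sys.stdout if stream is None else stream
    flushes = 0
    for count, item in enumerate(items, start=1):
        print("Processed item {}".format(item), file=out)
        if count % flush_every == 0:
            out.flush()
            flushes += 1
    # Flush once more so the final partial batch is not left in the buffer.
    out.flush()
    return flushes + 1


if __name__ == "__main__":
    process_items(range(250), flush_every=100)
```

A larger flush_every value keeps more of the benefit of buffering, at the cost of losing more recent output if the job is killed.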
Threading with OpenMP
Some Python packages use OpenMP for threading. For serial jobs, some modules set the variable OMP_NUM_THREADS to the number of slots requested by the current job, if the variable is unset when the module is loaded. This avoids an issue where some packages incorrectly use too many threads for OpenMP work. However, you should always check this value, and we recommend setting it manually so that your job is also compatible with older modules which do not set it.
If you load a Python module as part of a job script, the OMP_NUM_THREADS variable should be set automatically.
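The value can also be checked from within Python. The sketch below is illustrative only: resolve_omp_threads is a hypothetical helper, and it assumes the scheduler exposes the number of requested slots in an NSLOTS environment variable, which the text above does not state.

```python
import os


def resolve_omp_threads(environ=None):
    """Return the OpenMP thread count a job should use, as an int.

    Prefers an existing OMP_NUM_THREADS, then NSLOTS (an assumption about
    how the scheduler reports requested slots), then falls back to 1.
    """
    env = os.environ if environ is None else environ
    for name in ("OMP_NUM_THREADS", "NSLOTS"):
        value = env.get(name, "").strip()
        if value.isdigit() and int(value) > 0:
            return int(value)
    return 1


if __name__ == "__main__":
    threads = resolve_omp_threads()
    # Set the variable before importing packages that use OpenMP,
    # because they typically read it once when they are first loaded.
    os.environ["OMP_NUM_THREADS"] = str(threads)
    print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```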
Here is an example job running on 1 core.
    #!/bin/bash
    #$ -cwd
    #$ -j y
    #$ -pe smp 1
    #$ -l h_rt=1:0:0
    #$ -l h_vmem=2G
    module load python
    python example.py
Serial job - virtualenv
To use a Python virtualenv in a job script you need to activate the virtualenv:
    #!/bin/bash
    #$ -cwd
    #$ -j y
    #$ -pe smp 1
    #$ -l h_rt=1:0:0
    #$ -l h_vmem=2G
    # Load Python module
    module load python
    # Activate virtualenv
    source <envname>/bin/activate
    # Run Python script
    python example.py
Serial job - Cutadapt
This is an example of using cutadapt installed in a virtualenv.
    #!/bin/bash
    #$ -cwd
    #$ -j y
    #$ -pe smp 1
    #$ -l h_rt=1:0:0
    #$ -l h_vmem=1G
    # Load Python module
    module load python
    # Activate virtualenv
    source cutadapt/bin/activate
    # Run cutadapt
    cutadapt raw_data.fq.gz
Installing Python packages
Whilst packages can be installed locally by other means, we recommend using virtualenvs to ensure clean environments.
You can use virtualenv to set up your own virtual Python environment over which you have full control. This allows you to use a specific Python version and its own set of packages. Once the virtual environment is set up, you need to activate it each time you log in, and then you can use python, pip and easy_install.
    # virtualenv is installed as part of the python module
    $ module load python
    # Set up an environment called <envname>
    $ virtualenv <envname>
    # Activate the environment
    $ source <envname>/bin/activate
    # Use Python / pip etc. in the environment
    (<envname>)$ pip install <module>
    # Run Code
    (<envname>)$ python example.py
    # Stop using the environment
    $ deactivate
To install a specific version of a module run:
    $ module load python
    $ source <envname>/bin/activate
    (<envname>)$ pip install <module>==<version>
To upgrade an installed version of a module run:
    $ module load python
    $ source <envname>/bin/activate
    (<envname>)$ pip install <module> --upgrade
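After installing or upgrading, it can be useful to confirm which version the active environment actually provides. A minimal sketch, assuming Python 3.8 or newer (where importlib.metadata is in the standard library); installed_version is a hypothetical helper, and the package names are only examples.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pip" and "numpy" are used only as example package names.
    for name in ("pip", "numpy"):
        print(name, installed_version(name) or "not installed")
```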
Running virtual environments
To benefit from thread optimisation and shared library support, load the same python module that was used during environment creation before activating your virtual environment.
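To confirm which interpreter and environment a job is actually using, a short check like the following may help. It is a sketch rather than an Apocrita-provided tool, and in_virtual_environment is a hypothetical helper name.

```python
import sys


def in_virtual_environment():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Environment prefix:", sys.prefix)
    print("Virtual environment active:", in_virtual_environment())
```

The printed interpreter path shows whether the python command resolves to the virtualenv or to the loaded module.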
Setting up numpy
Using virtualenv, it is straightforward to install a personal copy of numpy:
    $ module load python
    $ virtualenv numpy
    $ source numpy/bin/activate
    (numpy)$ pip install numpy
Setting up cutadapt
To install Cutadapt using a virtualenv, run the following commands:
    $ module load python
    $ virtualenv cutadapt
    $ source cutadapt/bin/activate
    (cutadapt)$ pip install cutadapt
    (cutadapt)$ cutadapt raw_data.fq.gz
Setting up matplotlib
To install Matplotlib using a virtualenv, run the following commands:
    $ module load python
    $ virtualenv matplotlib
    $ source matplotlib/bin/activate
    (matplotlib)$ pip install matplotlib
Selecting the matplotlib backend
As python modules on Apocrita do not include Tkinter support, matplotlib requires a non-interactive backend to write output to a file rather than rendering results in a graphical window.
The following commands instruct matplotlib to write output to a file using the
Agg backend (after creating the virtualenv as instructed above):
    (matplotlib)$ python
    >>> import matplotlib
    >>> matplotlib.use('Agg')
    >>> import matplotlib.pyplot as plt
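The same backend selection can be used in a script that writes a figure to disk. This is a minimal sketch assuming matplotlib is installed in the active virtualenv; the function name save_example_plot, the output file name and the plotted data are all illustrative.

```python
import matplotlib

# Select the non-interactive Agg backend before pyplot is imported.
matplotlib.use("Agg")

import matplotlib.pyplot as plt


def save_example_plot(path="example_plot.png"):
    """Draw a simple line plot and write it to `path` (a hypothetical file name)."""
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o")
    ax.set_xlabel("x")
    ax.set_ylabel("x squared")
    fig.savefig(path)
    # Close the figure to release its memory in long-running jobs.
    plt.close(fig)
    return path


if __name__ == "__main__":
    print("Wrote", save_example_plot())
```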
For more information regarding matplotlib backends, please see the matplotlib documentation.