Choosing a Python distribution
Python 2 is not supported on Apocrita
Python 2 has not been supported by the Python Software Foundation since January 2020; therefore, we are unable to provide support for any project that requires Python 2.
Glossary
- Python is a language. It has rules about what is and what is not valid syntax (see the language reference).
- Python code is more tangible than the language; it is text that meets the requirements of the language. For example, print("hello world").
- An implementation is required to run code. In the case of Python, this means an interpreter. CPython is the most commonly used Python implementation. Several others are listed at https://www.python.org/download/alternatives/.
- A distribution is an implementation that has been made available for other people to install (for example, by downloading it directly or by using a package manager). Distributions typically include other utilities such as a profiler, a debugger, libraries and documentation.
- Languages, code, implementations and distributions all go through different versions. Versions are usually identified using a version number, often in the "major.minor.patch" format.
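As a small illustration of the "major.minor.patch" convention, the sketch below splits a version string into integers so that versions compare numerically rather than alphabetically, and reads the running interpreter's own version from sys.version_info. The helper name parse_version is hypothetical, and it assumes a plain numeric string (pre-release suffixes such as "rc1" are not handled).

```python
import sys


def parse_version(text: str) -> tuple[int, ...]:
    """Split a plain "major.minor.patch" string into a tuple of ints.

    Hypothetical helper: assumes purely numeric components (no "rc1" etc.).
    """
    return tuple(int(part) for part in text.split("."))


# Integer tuples compare component by component, so 3.9.18 < 3.12.1 as expected,
# whereas the strings "3.9" and "3.12" would compare the wrong way round.
assert parse_version("3.9.18") < parse_version("3.12.1")

# The running interpreter exposes its own version in the same structure.
major, minor, micro = sys.version_info[:3]
print(f"Running Python {major}.{minor}.{micro}")
```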
Identifying a distribution
Load an appropriate module before using Python
Please load an appropriate module before using Python, e.g. module load python or module load miniforge.
Operating systems such as Rocky Linux often come with a distribution of Python pre-installed so that system-level processes can function. Without loading a module, running python -V -V will show the system-level Python installed on Apocrita, which is currently version 3.9.18, compiled with GCC 11.4.1.
You should not use this Python for your projects, because it may change without notice when we apply system-level updates across the cluster. Instead, load an appropriate module for your desired Python distribution.
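If you are unsure which interpreter a script is actually running under, printing sys.executable from inside Python is a quick check. The sketch below assumes, as is typical on Rocky Linux, that the system-level interpreter lives in /usr/bin, whereas module-provided interpreters live elsewhere. The function name and the /usr/bin location are assumptions for illustration, not a guarantee about Apocrita's layout.

```python
import sys
from pathlib import PurePosixPath

# Assumption: the OS-provided interpreter is installed directly in /usr/bin.
SYSTEM_BIN = PurePosixPath("/usr/bin")


def is_system_python(executable: str) -> bool:
    """Hypothetical heuristic: True if the interpreter sits directly in SYSTEM_BIN."""
    return PurePosixPath(executable).parent == SYSTEM_BIN


print("Interpreter:", sys.executable)
if is_system_python(sys.executable):
    print("Warning: this looks like the system Python; load a module first.")
```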
Python distribution module file conflicts
To prevent errors when running the Python interpreter, we have designed the Python distribution modules to produce an error when two or more are loaded into the same environment.
Running the Python interpreter with python -V -V from the command line reports its version and compiler information. As an example, we first load the python/3.12.1-gcc-12.2.0 module so that we do not launch the system-level Python interpreter:
$ module load python/3.12.1-gcc-12.2.0
$ python -V -V
Python 3.12.1 (main, Jun 6 2024, 15:45:45) [GCC 12.2.0]
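The same information is available from inside a running script, which can be handy for logging which interpreter produced a set of results. This sketch uses the standard library's platform and sys modules; the exact strings depend on which module you have loaded.

```python
import platform
import sys

# Version number only, e.g. "3.12.1"
print("Version: ", platform.python_version())
# Compiler used to build the interpreter, e.g. "GCC 12.2.0"
print("Compiler:", platform.python_compiler())
# Full version string, similar to the output of `python -V -V`
print("Full:    ", sys.version)
```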
To find out the implementation details, we can use the platform module from the standard library:
$ python -c "import platform; print(platform.python_implementation())"
CPython
In this case, we would refer to both the implementation and the distribution as "CPython". As we'll show below, the CPython implementation is included in several different distributions.
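The sys module offers a related check through sys.implementation, which also reports the implementation's own version. For CPython this matches the language version; for an alternative implementation such as PyPy it is PyPy's own release number instead. A minimal sketch:

```python
import platform
import sys

# Lower-case identifier, e.g. "cpython" or "pypy"
print(sys.implementation.name)
# Display name, e.g. "CPython" or "PyPy"
print(platform.python_implementation())
# Version of the implementation itself (not necessarily the language version)
print(sys.implementation.version)
```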
Distributions on Apocrita
There are currently two Python distributions available on Apocrita, both of which use the CPython implementation:
- The official Python distribution, available at python.org
- Miniforge
You can view all available Python distribution modules with the following command:
module avail python miniforge
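Because both of these distributions report "CPython" as their implementation, the platform check shown earlier cannot tell them apart. A common heuristic, sketched below, is to look for the conda-meta directory that conda-based installations (including Miniforge) keep inside their prefix. Treat it as a rough check rather than a definitive test; the function name is hypothetical.

```python
import sys
from pathlib import Path


def looks_like_conda(prefix: str) -> bool:
    """Hypothetical heuristic: conda-based installs keep a conda-meta/ directory in their prefix."""
    return (Path(prefix) / "conda-meta").is_dir()


# sys.prefix is the root of the active installation or environment.
if looks_like_conda(sys.prefix):
    print("Conda-based distribution (e.g. Miniforge):", sys.prefix)
else:
    print("Probably not a conda environment:", sys.prefix)
```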
Official Python distribution
The official distribution, "CPython", is both an implementation of Python and a distribution. It is the reference implementation, is written in C (hence the name "CPython") and comes with the standard library, a large library of commonly used functions and classes written in Python and C.
On Apocrita, this implementation is accessed through python/ modules. For example:
module load python
On another computer, it can be installed either by downloading it from python.org or by using your system's package manager, where it is probably called python or python3.
Why use CPython?
- It is the "official" and most widely used implementation
- You can view and modify the source code
- It is available on a wide array of platforms, from Apocrita to the Raspberry Pi
Why not use CPython?
- There may be other distributions that are more widely used in your field
- It may not be the quickest implementation
- It does not come bundled with as many tools and libraries as other distributions
You can find out more about CPython on our Python page.
Miniforge
Anaconda and Miniconda are no longer available on Apocrita due to licensing issues.
Miniforge is a distribution aimed at data science. It bundles the CPython implementation with the conda and mamba package managers, which can install hundreds of commonly used data science and machine learning libraries (as well as R) from Conda channels such as conda-forge.
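Because the packages available differ between distributions and environments, it can help to check up front that the ones your project needs are present. The sketch below uses importlib.util.find_spec, which locates a top-level module without importing it; the REQUIRED list and the helper name are hypothetical examples.

```python
from importlib.util import find_spec

# Hypothetical list of top-level modules a project depends on.
REQUIRED = ["json", "numpy", "pandas"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Not available here:", ", ".join(missing))
else:
    print("All required modules found")
```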
On Apocrita, this distribution is accessed through miniforge/ modules. For example:
module load miniforge
On another computer, it can be installed by downloading it from the Miniforge GitHub releases page.
Why use Miniforge?
- It is widely used in the data science community
- It includes useful software and packages
- It may be somewhat faster than the official CPython distribution
Why not use Miniforge?
- Anything you can do with Miniforge, you can do with CPython (with a bit more work)
- The performance improvement may be insignificant for your project
You can find out more on our Miniforge page.
Choosing a distribution
Finally, we offer some thoughts on choosing a distribution for your project.
Speed
The speed difference between distributions is often only slight.
Given that the performance gained by changing distribution is mostly modest, your time may be better spent on other optimisation techniques such as vectorisation and parallelisation. These techniques can be used with all the Python distributions mentioned here.
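Before changing distribution for speed, it is worth measuring whether the change helps your code at all. Below is a minimal sketch using the standard library's timeit module: run the same script once under each module you want to compare, in separate sessions since the distribution modules cannot be loaded together, and compare the reported times. The workload function is a hypothetical placeholder for your own code.

```python
import sys
import timeit


def workload() -> int:
    """Hypothetical stand-in for the code you actually want to speed up."""
    return sum(i * i for i in range(100_000))


# Repeat the measurement and keep the best run to reduce noise from other processes.
times = timeit.repeat(workload, number=10, repeat=5)
best = min(times) / 10
print(f"Python {sys.version.split()[0]}: {best * 1e3:.2f} ms per call")
```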
Compatibility and reproducibility
When you write code on your machine, you want to know that you can run it on Apocrita or share it with a colleague quickly and reliably. We recommend making a new environment for each project you work on. The details of this environment can be saved to a file and kept under version control.
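With CPython alone, the standard library's venv module can create a per-project environment; on the command line this is usually python -m venv followed by a directory name. The sketch below does the same from Python. The temporary project directory and the .venv name are assumptions for illustration, and with_pip=False is used only to keep the example quick; pass with_pip=True if you want pip available in the new environment.

```python
import tempfile
import venv
from pathlib import Path

# Hypothetical project folder; a temporary directory keeps this example tidy.
project_dir = Path(tempfile.mkdtemp())
env_dir = project_dir / ".venv"

# Create an isolated environment; clear=True recreates it if it already exists.
venv.create(env_dir, clear=True, with_pip=False)

# pyvenv.cfg records the base interpreter the environment was created from.
print((env_dir / "pyvenv.cfg").read_text())
```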
The conda and mamba commands that come with Miniforge offer a slightly more thorough way to record an environment, by saving it to a .yml file with conda env export (see our Miniforge docs for more information). However, the same end result can be accomplished using pip, virtualenv and CPython. The most important thing is not which tool you use for this but that you use one at all.
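With pip, the usual counterpart to conda env export is pip freeze > requirements.txt, which can later be restored with pip install -r requirements.txt. If you want to record similar information from inside Python, for example alongside your results, a minimal sketch using the standard library's importlib.metadata is below. The helper name is hypothetical, and unlike pip freeze it does not treat packages installed from version control or local paths in any special way.

```python
from importlib.metadata import distributions


def freeze_lines() -> list[str]:
    """Return sorted "name==version" lines for installed packages (hypothetical helper)."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata is missing or broken
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


# Print in "name==version" format; redirect to a file to keep a snapshot.
print("\n".join(freeze_lines()))
```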