Compiling Code

Available compilers

A number of compilers are available on the systems:

  • Intel compiler (part of Intel Parallel Studio)
  • GCC

GCC will give reliable results. However, depending on your code and libraries, the Intel compiler may provide considerable performance improvements.

Compilation should be performed as job submissions or interactively via qlogin in order not to impact the frontend nodes for other users.

This page focuses on the C, C++ and Fortran languages, as these are the most common compiled languages in use on the cluster.

See the Submitting jobs page for more information on how to launch an interactive session.

Loading a compiler module

It is generally a good idea to be specific with your compiler version. Check which modules you have loaded to be sure you have the right compiler and that there are no conflicts. The available compiler versions can be viewed in the devtools section of the output of the module avail command.

Check the available versions for the gcc compiler:

$ module avail gcc

Intel compiler version 2017.3 can be loaded with the command

module load intel/2017.3

You can test this by typing the command:

icc -V

This should return a short message reporting the compiler version.

Often, you will require other libraries and headers that can be found in specific modules. One example is Intel's MPI library. Such a module can be loaded with:

module load intelmpi/2017.3

Again, check your loaded modules with

module list

If you don't specify a particular version, the version marked as default in the output of the module avail command will be loaded.

Compilation for Specific Nodes

Different processors support different instruction sets, which may provide a performance boost. During compilation, the instruction sets to target can be selected via CPU architecture flags.

Supported Instruction Sets

Some instruction sets are not supported on all nodes, so you may need to compile different binaries for each node type.

Please see the table here for details on supported instruction sets.

Checking available CPU flags on a node

The instruction sets supported by a node, including the available CPU flags, are listed in the /proc/cpuinfo file on each Apocrita node, on the line labelled flags.

For example, checking for all available CPU flags and instruction sets on a sm node:

[sm0.apocrita ~]$ cat /proc/cpuinfo | grep flags | head -1

Checking for SSE2 instruction set on a sm node:

[sm0.apocrita ~]$ cat /proc/cpuinfo | grep flags | head -1 | tr ' ' '\n' | grep sse2

Checking for AVX2 instruction set on a sm node:

[sm0.apocrita ~]$ cat /proc/cpuinfo | grep flags | head -1 | tr ' ' '\n' | grep avx2
[sm0.apocrita ~]$

In this example, the CPU does not support the AVX2 instruction set, so applications compiled with the -mavx2 flag will not run on the sm Apocrita nodes.

Selecting instruction sets during compilation

Instruction sets can be enabled via the -m<cpu_flag> flag and disabled via the -mno-<cpu_flag> flag when compiling code.

Intel compiler Host flag

The Intel compiler provides a -xHost flag to include all instruction sets supported by the build host processor.

Build Systems

Typically, software for Linux comes with a build system of one of two flavours: GNU Makefiles or CMake.

GNU Makefiles are more common. The general steps are as follows: first, run the configure command, which creates a Makefile; then run the make command, which reads the Makefile and calls the necessary compilers, linkers and so on.
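The steps above can be sketched as follows. A real project ships its own configure script; the toy one written here is only a stand-in so the sequence can be run end to end, and the install prefix is a hypothetical example:

```shell
# Toy stand-in for a real project's configure script (a real one
# inspects your compilers and libraries; this one just writes a
# minimal Makefile).
cat > configure <<'EOF'
#!/bin/sh
printf 'all:\n\t@echo built\ninstall:\n\t@echo installed\n' > Makefile
EOF
chmod +x configure

./configure --prefix=$HOME/apps/myapp   # step 1: generate the Makefile
make                                    # step 2: compile (prints "built" here)
make install                            # step 3: install under the prefix
```

With real software, the same three commands apply; only the configure options differ from project to project.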

CMake is similar but can generate Makefiles, Visual Studio projects, macOS Xcode projects and more. Such projects can be identified by the presence of a CMakeLists.txt file. For those interested in building their own software, CMake is recommended over GNU Makefiles.

One major advantage of CMake is it allows out-of-source builds. Put another way, one can create a binary and all its associated support files in a directory that is not the same as the one with the source files. This can be quite advantageous when working with a source management tool like Git or SVN.

To work with CMake, start by creating a directory at the same level as the source code. For example:

$ pwd
$ mkdir MySourceCode_build
$ cd MySourceCode_build
$ cmake ../MySourceCode

Essentially, you enter the build directory and call cmake with the path to the directory containing your CMakeLists.txt file.

If you wish to configure your build interactively, you can use the ccmake program.

The end result, on Linux, is another Makefile. So to complete your build you type:

make

just as you would with the GNU Makefile setup. Under another OS such as Windows or macOS, CMake would create a corresponding build file, such as a Visual Studio project.

To learn more about GNU Makefiles and CMake, follow the links below.

Optional Libraries for HPC


The Message Passing Interface (MPI) is often used in HPC applications. We recommend the use of Intel MPI. This can be loaded using the module command before you compile your code, and again when you execute your binaries on the cluster.

module load intelmpi

Other MPI libraries are available, such as OpenMPI.

module load openmpi

Be aware that several versions may be available. The default version is usually set to the latest release; please supply the version number when loading the module if you require a different one.
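As a sketch of a typical MPI build on the cluster (hello.c stands in for your own MPI source file; mpiicc and mpirun are Intel MPI's compiler wrapper and launcher):

```shell
module load intelmpi
mpiicc -o hello_mpi hello.c     # compile and link against Intel MPI
mpirun -np 4 ./hello_mpi        # run with 4 MPI processes
```

With OpenMPI, the corresponding compiler wrapper is mpicc.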

Compiling and testing

If make succeeds, you should see the compiler you chose being invoked repeatedly on your screen. If compilation completes successfully you should see a success message of some kind, and an executable will appear in your source or build directory.

Quite often, software comes with test programs you can also build. Typically, the command to do this looks like the following:

make test


Software optimisation comes in many forms, such as compiler optimisation, using alternate libraries, removing bottlenecks from code, algorithmic improvements, and using parallelisation. Using processor-specific compiler options may reduce universal compatibility of your compiled code, but could yield substantial improvements.

The Intel and gcc compilers may give different performance depending on the libraries used and the processor optimisation applied. Benchmarking code compiled with both compilers is recommended.

gcc compiler options

The gcc compiler has the -march=native and -mtune=native options, which automatically determine optimal flags for the machine you are compiling on. You will need to build the code on the same type of node you will be executing on (via a qsub or qlogin session) to benefit from the relevant processor optimisation.

To find out what the compiler will do with the -march=native option you can use:

gcc -march=native -Q --help=target

Processor incompatibilities with optimised code

Optimised code should provide a performance boost, but note that code optimised for a certain architecture may not run on other nodes, due to AMD/Intel differences, or lack of a certain feature in older processors.

Profiling Tools

Once you have a running program that has been tested, there are several tools you can use to check the performance of your code. Some of these you can use on the cluster and some you can use on your own desktop machine.


perf is a tool that records where your program spends its time. Use it as a guide to see where to focus your effort when optimising code. It is available on Linux machines. perf does not require any special compilation options, but compiling your program with the -g option (supported by both GCC and the Intel compiler) allows perf to resolve source-level symbols in its reports.

Once you have compiled your program with this switch, you need to run perf

perf record -g <my program>

Replace <my program> with the program you want to profile. Once the program has completed, run the following command:

perf report --sort comm,dso

This will display the recorded samples grouped by command and shared object, ordered by where the most time was spent.

More information on perf can be found in its documentation and in this extensive tutorial.


strace is another tool; it lists the system calls a program makes:

strace <my program>

valgrind is a suite of tools that allow you to improve the speed and reduce the memory usage of your programs. An example command would be:

valgrind --tool=memcheck <myprogram>

Valgrind is well suited for multi-threaded applications, but may not be suitable for longer running applications due to the slowdown incurred by the profiled application. In addition, there is a graphical tool which is not offered on the cluster but will work on Linux desktops. There is also an extensive manual.


The above tools work best for compiled binaries. If you are writing code in Python, cProfile is one useful option:
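For example, a minimal session (the script name and its contents are illustrative):

```shell
# A small script to profile (illustrative only).
cat > myscript.py <<'EOF'
def work():
    return sum(i * i for i in range(100000))

print(work())
EOF

# Run it under cProfile, sorting the report by cumulative time.
python3 -m cProfile -s cumtime myscript.py
```

The report lists each function with its call count and the time spent in it, which helps identify hotspots before optimising.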