Compiling Code

Available compilers

A number of compilers are available on the systems:

  • Intel compiler (part of Intel Parallel Studio)
  • GCC

GCC will give reliable results. However, depending on your code and libraries, the Intel compiler may provide considerable performance improvements.

Compilation should be performed as job submissions or interactively via qlogin in order not to impact the frontend nodes for other users.

This page focuses on the C, C++ and Fortran languages, as these are the most common compiled languages in use on the cluster.

See the Submitting jobs page for more information on how to launch an interactive session.

Loading a compiler module

It is generally a good idea to be specific with your compiler version. Check which modules you have loaded to be sure you have the right compiler and that there are no conflicts. The available compiler versions can be viewed in the devtools section of the output of the module avail command.

Check the available versions for the gcc compiler:

$ module avail gcc

Intel compiler version 2017.3 can be loaded with the command

module load intel/2017.3

You can test this by typing the command:

icc -V

This should return a short message reporting the compiler version.

Often, you will require other libraries and headers that can be found in specific modules. One example is Intel's MPI library. Such a module can be loaded with:

module load intelmpi/2017.3

Again, check your loaded modules with

module list

If you don't specify a particular version, the version marked as default in the output of the module avail command will be loaded.

Compilation for Specific Nodes

Different processors support different instruction sets, which may provide a performance boost. During compilation, the instruction sets to target can be selected via CPU architecture flags.

Supported Instruction Sets

Some instruction sets are not supported on all nodes, so you may need to compile different binaries for each node type.

Please see the table here for details on supported instruction sets.

Checking available CPU flags on a node

The instruction sets supported by a node, including the available CPU flags, are listed in the /proc/cpuinfo file on each Apocrita node, on the line labelled flags.

For example, checking for all available CPU flags and instruction sets on a sm node:

[sm0.apocrita ~]$ cat /proc/cpuinfo | grep flags | head -1

Checking for SSE2 instruction set on a sm node:

[sm0.apocrita ~]$ cat /proc/cpuinfo | grep flags | head -1 | tr ' ' '\n' | grep sse2

Checking for AVX2 instruction set on a sm node:

[sm0.apocrita ~]$ cat /proc/cpuinfo | grep flags | head -1 | tr ' ' '\n' | grep avx2
[sm0.apocrita ~]$

In this example, the CPU does not support the AVX2 instruction set, so applications compiled with the -mavx2 flag will not run on the sm Apocrita nodes.

Selecting instruction sets during compilation

Instruction sets can be enabled via the -m<cpu_flag> flag and disabled via the -mno-<cpu_flag> flag when compiling code.

Intel compiler Host flag

The Intel compiler provides a -xHost flag to include all instruction sets supported by the build host processor.

Build Systems

Typically, software for Linux comes with a build system of one of two flavours: GNU Makefiles or CMake.

GNU Makefiles are more common. The general steps are as follows: first, run the configure command, which creates a Makefile; then run the make command, which reads the Makefile and calls the necessary compilers, linkers and so on.
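The steps above can be sketched as follows. A real project ships its own configure script; the toy one written here is only a stand-in so the sequence can be run end to end, and the install prefix is a hypothetical example:

```shell
# Toy stand-in for a real project's configure script (a real one
# inspects your compilers and libraries; this one just writes a
# minimal Makefile).
cat > configure <<'EOF'
#!/bin/sh
printf 'all:\n\t@echo built\ninstall:\n\t@echo installed\n' > Makefile
EOF
chmod +x configure

./configure --prefix=$HOME/apps/myapp   # step 1: generate the Makefile
make                                    # step 2: compile (prints "built" here)
make install                            # step 3: install under the prefix
```

With real software, the same three commands apply; only the configure options differ from project to project.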

CMake is similar but can generate Makefiles, Visual Studio projects, macOS Xcode projects and more. Such projects can be identified by the presence of a CMakeLists.txt file. For those interested in building their own software, CMake is recommended over GNU Makefiles.

One major advantage of CMake is it allows out-of-source builds. Put another way, one can create a binary and all its associated support files in a directory that is not the same as the one with the source files. This can be quite advantageous when working with a source management tool like Git or SVN.

To work with CMake, start by creating a directory at the same level as the source code. For example:

$ pwd
$ mkdir MySourceCode_build
$ cd MySourceCode_build
$ cmake ../MySourceCode

Essentially, you enter the build directory and call cmake with the path to the directory containing your CMakeLists.txt file.

If you wish to configure your build interactively, you can use the ccmake program.

The end result, on Linux, is another Makefile. So to complete your build you type:

make

just as you would with the GNU Makefile setup. Under another OS such as Windows or macOS, CMake would create a corresponding build file, such as a Visual Studio project.

To learn more about GNU Makefiles and CMake, follow the links below.

Optional Libraries for HPC


The Message Passing Interface (MPI) is often used in HPC applications. We recommend the use of Intel MPI. This can be loaded using the module command before you compile your code, and again when you execute your binaries on the cluster.

module load intelmpi

Other MPI libraries are available, such as OpenMPI.

module load openmpi

Be aware that several versions may be available. The default version is usually set to the latest release; please supply the version number when loading the module if you require a different one.
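As a sketch of a typical MPI build on the cluster (hello.c stands in for your own MPI source file; mpiicc and mpirun are Intel MPI's compiler wrapper and launcher):

```shell
module load intelmpi
mpiicc -o hello_mpi hello.c     # compile and link against Intel MPI
mpirun -np 4 ./hello_mpi        # run with 4 MPI processes
```

With OpenMPI, the corresponding compiler wrapper is mpicc.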

Compiling and testing

If make succeeds, you should see the compiler you chose being invoked repeatedly on your screen. If compilation completes successfully you should see a success message of some kind, and an executable will appear in your source or build directory.

Quite often, software comes with test programs you can also build. Typically, the command to do this looks like the following:

make test


Software optimisation comes in many forms, such as compiler optimisation, using alternate libraries, removing bottlenecks from code, algorithmic improvements, and using parallelisation. Using processor-specific compiler options may reduce universal compatibility of your compiled code, but could yield substantial improvements.

The Intel and gcc compilers may give different performance depending on the libraries used and the processor optimisation applied. Benchmarking code compiled with both compilers is recommended.

gcc compiler options

The gcc compiler has the -march=native and -mtune=native options, which automatically determine optimal flags for the machine you are compiling on. You will need to build the code on the same type of node you will be executing on (via a qsub or qlogin session) to benefit from the relevant processor optimisation.

To find out what the compiler will do with the -march=native option you can use:

gcc -march=native -Q --help=target

Processor incompatibilities with optimised code

Optimised code should provide a performance boost, but note that code optimised for a certain architecture may not run on other nodes, due to AMD/Intel differences, or lack of a certain feature in older processors.

Profiling Tools

Once you have a running program that has been tested, there are several tools you can use to check the performance of your code. Some of these you can use on the cluster and some you can use on your own desktop machine.


perf is a tool that records where your program spends its time. Use it as a guide to see where to focus your effort when optimising code. It is available on Linux machines. perf does not require any special compilation options, but compiling your program with the -g option (supported by both GCC and the Intel compiler) allows perf to resolve source-level symbols in its reports.

Once you have compiled your program with this switch, you need to run perf

perf record -g <my program>

Replace <my program> with the program you want to profile. Once the program has completed, run the following command:

perf report --sort comm,dso

This will display the recorded samples grouped by command and shared object, ordered by where the most time was spent.

More information on perf can be found in its documentation and in this extensive tutorial.


strace is another tool; it lists the system calls a program makes:

strace <my program>

valgrind is a suite of tools that allow you to improve the speed and reduce the memory usage of your programs. An example command would be:

valgrind --tool=memcheck <myprogram>

Valgrind is well suited for multi-threaded applications, but may not be suitable for longer running applications due to the slowdown incurred by the profiled application. In addition, there is a graphical tool which is not offered on the cluster but will work on Linux desktops. There is also an extensive manual.


The above tools work best for compiled binaries. If you are writing code in Python, cProfile is one useful option:
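For example, a minimal session (the script name and its contents are illustrative):

```shell
# A small script to profile (illustrative only).
cat > myscript.py <<'EOF'
def work():
    return sum(i * i for i in range(100000))

print(work())
EOF

# Run it under cProfile, sorting the report by cumulative time.
python3 -m cProfile -s cumtime myscript.py
```

The report lists each function with its call count and the time spent in it, which helps identify hotspots before optimising.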