A number of compilers are available on the systems:
- GCC
- Intel compiler (part of Intel Parallel Studio)
GCC will give reliable results. However, depending on your code and libraries, the Intel compiler may provide considerable performance improvements.
Compilation should be performed as job submissions or interactively via qlogin in order not to impact the frontend nodes for other users.
This page focuses on C, C++ and Fortran, the most common compiled languages in use on the cluster.
See the Submitting jobs page for more information on how to launch an interactive session.
Loading a compiler module
It is generally a good idea to be specific with your compiler version. Check
which modules you have loaded to be sure you have the right compiler and that
there are no conflicts. The available compiler versions can be viewed in the
devtools section of the output of the module avail command.
Check the available versions for the gcc compiler:
$ module avail gcc
gcc/6.3.0(default)
Intel compiler version 2017.3 can be loaded with the command
module load intel/2017.3
You can test this by asking the compiler to report its version, which should return a short message reporting the compiler version.
Often, you will require other libraries and headers that can be found in specific modules. One example is Intel's MPI library. Such a module can be loaded with:
module load intelmpi/2017.3
Again, check which modules are loaded once you have done this.
If you don't specify a particular version, the version marked as (default) in
the output of the module avail command will be loaded.
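As a rough illustration of the default-version rule, the following shell sketch picks out whichever entry carries the (default) marker in a module listing. The function name default_module is hypothetical, and the sketch assumes the listing has the same shape as the module avail gcc output shown above, with (default) appended directly to the module name.

```shell
# default_module: read a module listing on stdin and print the entry
# marked "(default)". Hypothetical helper, not part of the module system.
default_module() {
    tr -s ' \t' '\n\n' |        # put each module name on its own line
        grep '(default)$' |     # keep only the entry carrying the marker
        sed 's/(default)$//'    # strip the marker, leaving name/version
}
```

It could be used as module avail gcc 2>&1 | default_module. The 2>&1 is there on the assumption that module avail writes its listing to standard error, which is the case on many installations but should be checked on yours.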
Compilation for Specific Nodes
Different processors support different instruction sets, which may provide a performance boost. During compilation, these instruction sets can be selected via CPU architecture flags.
Supported Instruction Sets
Some instruction sets are not supported on all nodes, so you may need to compile different binaries for each node type.
Please see the table here for details on supported instruction sets.
Checking available CPU flags on a node
Supported instruction sets, including the available CPU flags, are listed in the
/proc/cpuinfo file on each Apocrita node, on the line labelled flags.
For example, checking for all available CPU flags and instruction sets on an sm node:
[sm0.apocrita ~]$ cat /proc/cpuinfo | grep flags | head -1
Checking for the SSE2 instruction set on an sm node:
[sm0.apocrita ~]$ cat /proc/cpuinfo | grep flags | head -1 | tr ' ' '\n' | grep sse2
sse2
Checking for the AVX2 instruction set on an sm node:
[sm0.apocrita ~]$ cat /proc/cpuinfo | grep flags | head -1 | tr ' ' '\n' | grep avx2
[sm0.apocrita ~]$
In this example, the CPU does not support the AVX2 instruction set, so
applications compiled with the -mavx2 flag may not run on the sm Apocrita nodes.
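The checks above can be wrapped in a small shell function for use in build scripts. This is a minimal sketch: the name has_cpu_flag is hypothetical, and the optional second argument (a file to read instead of /proc/cpuinfo) exists only to make the function easy to try out.

```shell
# has_cpu_flag FLAG [FILE]: succeed if FLAG appears on the first "flags"
# line of FILE (default /proc/cpuinfo). Hypothetical helper name.
has_cpu_flag() {
    grep '^flags' "${2:-/proc/cpuinfo}" | head -n 1 |
        tr -s ' \t' '\n\n' |        # one flag per line
        grep -q -x -F -- "$1"       # whole-line match, so avx != avx2
}
```

For example, has_cpu_flag avx2 && echo "AVX2 supported" prints the message only on nodes whose CPU reports avx2. The whole-line match matters because a plain grep for avx would also match avx2.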
Selecting instruction sets during compilation
Instruction sets can be enabled via the -m<cpu_flag> option
and disabled via the -mno-<cpu_flag> option when compiling code.
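One way to tie the enable and disable options to the node you are building on is sketched below. The function name isa_option is hypothetical. The sketch assumes the flag name in /proc/cpuinfo is spelled the same way as the compiler option suffix, which holds for avx2 but not for every flag (for instance, the CPU flag sse4_1 corresponds to the GCC option -msse4.1).

```shell
# isa_option FLAG [FILE]: print -mFLAG if the CPU described by FILE
# (default /proc/cpuinfo) lists FLAG, otherwise print -mno-FLAG.
# Hypothetical helper; assumes CPU flag and compiler option names match.
isa_option() {
    if grep '^flags' "${2:-/proc/cpuinfo}" | head -n 1 |
        tr -s ' \t' '\n\n' | grep -q -x -F -- "$1"; then
        echo "-m$1"       # instruction set is available: enable it
    else
        echo "-mno-$1"    # not available: explicitly disable it
    fi
}
```

A compile line might then read gcc -O2 $(isa_option avx2) -o myprog myprog.c, where myprog.c is a placeholder for your own source file.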
Intel compiler Host flag
The Intel compiler provides a
-xHost flag to include all instruction
sets supported by the build host processor.
Typically, software for Linux comes with a build system in one of two flavours: GNU Makefiles or CMake.
GNU Makefiles are more common. The general steps are as follows:
First one runs the configure command. This creates a Makefile. One then runs the make command that reads the Makefile and calls the necessary compilers, linkers and such.
CMake is similar but can create Makefiles, Visual Studio projects, OSX Xcode projects and more. Such projects can be identified by the presence of a CMakeLists.txt file. For those interested in building their own software, CMake is recommended over GNU Makefiles.
One major advantage of CMake is it allows out-of-source builds. Put another way, one can create a binary and all its associated support files in a directory that is not the same as the one with the source files. This can be quite advantageous when working with a source management tool like Git or SVN.
To work with CMake, start by creating a directory at the same level as the source code. For example:
$ pwd
/data/home/abc123/MySourceCode
$ cd ..
$ mkdir MySourceCode_build
$ cd MySourceCode_build
$ cmake ../MySourceCode
Essentially, you enter the build directory and call cmake with the path to
the directory containing your CMakeLists.txt file.
If you wish to configure your build, you can use the program
The end result, on Linux, is another Makefile. So to complete your build you type:
$ make
just as you would with the GNU Makefile setup. Under another OS such as Windows or OSX, CMake would create a corresponding build file, such as a Visual Studio project.
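The out-of-source workflow described above can be sketched as a single shell function. The name out_of_source_build is hypothetical, as is the convention of letting the CMAKE and MAKE environment variables override the commands that are run; both are assumptions made for this sketch rather than anything CMake itself requires.

```shell
# out_of_source_build SRC_DIR BUILD_DIR: configure and build SRC_DIR in a
# separate BUILD_DIR. CMAKE and MAKE may be overridden in the environment
# (a convention assumed for this sketch only).
out_of_source_build() {
    src=$(cd "$1" && pwd) || return 1   # absolute path to the source tree
    mkdir -p "$2" || return 1
    (
        cd "$2" || exit 1
        "${CMAKE:-cmake}" "$src" &&     # writes a Makefile into BUILD_DIR
            "${MAKE:-make}"             # then builds using that Makefile
    )
}
```

With the directories from the example, this would be called as out_of_source_build MySourceCode MySourceCode_build from /data/home/abc123. The subshell keeps the cd from affecting your current working directory.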
To learn more about GNU Makefiles and CMake, follow the links below.
Optional Libraries for HPC
The Message Passing Interface is often used in HPC applications. We recommend the use of IntelMPI. This can be loaded using the module command before you compile your code, and when you execute your binaries on the cluster.
module load intelmpi
Other MPI libraries are available, such as OpenMPI.
module load openmpi
Be aware several versions may be available. The default version is usually set to the latest release - please supply the version number when loading the module if you require a different version.
Compiling and testing
If make succeeds, you should see various calls being printed on your screen with the name of the compiler you chose. If compilation completed successfully you should see a success message of some kind, and an executable appear in your source or build directory.
Quite often, software comes with test programs you can also build. Often, the command to do this looks like the following:
Software optimisation comes in many forms, such as compiler optimisation, using alternate libraries, removing bottlenecks from code, algorithmic improvements, and using parallelisation. Using processor-specific compiler options may reduce universal compatibility of your compiled code, but could yield substantial improvements.
The Intel and gcc compilers may give different performance depending on the libraries used and the processor optimisations applied. Benchmarking code compiled with both compilers is recommended.
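A very simple way to compare two builds is to time repeated runs of each binary. The sketch below is one possible approach, not a full benchmarking method: the function name bench is hypothetical, and it measures wall-clock time in whole seconds only, so it is suited to programs that run for at least several seconds.

```shell
# bench RUNS COMMAND [ARGS...]: run COMMAND RUNS times and print the total
# elapsed wall-clock time in whole seconds. Hypothetical helper name.
bench() {
    runs=$1
    shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || return 1   # stop if the program fails
        i=$((i + 1))
    done
    end=$(date +%s)
    echo "$((end - start))"
}
```

For example, bench 5 ./myprog_gcc and bench 5 ./myprog_intel would give totals for two builds of the same program, where both binary names are placeholders. Run the comparison inside a job or qlogin session on the node type you intend to use.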
gcc compiler options
The gcc compiler has the -march=native and -mtune=native options, which will
automatically determine optimal flags for the machine you are compiling on.
You will need to build the code on the same type of node you will be executing
on (via a qsub or qlogin session) to use the relevant processor optimisation.
To find out what the compiler will do with the
-march=native option you can use:
gcc -march=native -Q --help=target
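The output of that command is long, so it can help to filter it down to the options that are switched on. The sketch below assumes the listing has two whitespace-separated columns, the option name followed by a state such as [enabled] or [disabled]; this layout may differ between GCC versions. The function name enabled_target_options is hypothetical.

```shell
# enabled_target_options: read `gcc -Q --help=target` output on stdin and
# print only the options reported as [enabled]. Hypothetical helper name;
# assumes a two-column "option  [state]" layout.
enabled_target_options() {
    awk '$2 == "[enabled]" { print $1 }'
}
```

It would be used as gcc -march=native -Q --help=target | enabled_target_options. Comparing the result on two node types shows which instruction sets a native build would use on each.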
Processor incompatibilities with optimised code
Optimised code should provide a performance boost, but note that code optimised for a certain architecture may not run on other nodes, due to AMD/Intel differences, or lack of a certain feature in older processors.
Once you have a running program that has been tested, there are several tools you can use to check the performance of your code. Some of these you can use on the cluster and some you can use on your own desktop machine.
perf is a tool that records where your program spends its time. Use it as a guide to see where you need to focus your effort when optimising code. It is available on Linux machines. perf samples a running program and does not require special instrumentation, but compiling with debugging symbols (the -g option) makes the report easier to read. The -pg option for GCC and the -p option for the Intel compiler are intended for the separate gprof profiler and are not needed by perf.
Once you have compiled your program, run it under perf and then view the report:
perf record -a -g <my program>
perf report --sort comm,dso
This will display the recorded samples grouped by command and shared object, ordered by the share of time spent in each.
strace is another tool that lists the system calls a program makes: https://en.wikipedia.org/wiki/Strace
valgrind is a suite of tools that allow you to improve the speed and reduce the memory usage of your programs. An example command would be:
valgrind --tool=memcheck <myprogram>
Valgrind is well suited for multi-threaded applications, but may not be suitable for longer running applications due to the slowdown incurred by the profiled application. In addition, there is a graphical tool which is not offered on the cluster but will work on Linux desktops. There is also an extensive manual.
The above tools work best for compiled binaries. If you are writing code in Python, cProfile is one useful option: https://julien.danjou.info/blog/2015/guide-to-python-profiling-cprofile-concrete-case-carbonara