Infiniband is a high-throughput, low-latency node interconnect that supports Remote Direct Memory Access (RDMA), which can significantly speed up large multi-node parallel jobs.

Infiniband islands

Apocrita has a number of Infiniband-enabled nodes, grouped into separate islands by node type:

Nodes     Island name   Infiniband type
ccn0-16   ccn           QDR
nxn0-31   nxn           FDR
nxv1-20   nxv           EDR
sdv1-4    sdv           EDR

The appropriate island can be selected for parallel jobs by adding the -l infiniband=<island> parameter to your submission script.

An example of this setting is:

#$ -l infiniband=nxv
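
For context, a complete job script requesting the nxv island might look like the sketch below. Only the infiniband request comes from this page. The parallel environment name (parallel), the slot count, the runtime limit and the application name (./my_mpi_app) are assumptions for illustration and should be replaced with values appropriate to your job.

```shell
#!/bin/bash
# Run the job from the directory it was submitted from.
#$ -cwd
# Merge standard output and standard error into one file.
#$ -j y
# ASSUMPTION: parallel environment name and slot count are placeholders.
#$ -pe parallel 64
# ASSUMPTION: one-hour runtime limit, chosen only for illustration.
#$ -l h_rt=1:0:0
# Keep every slot of the job inside the nxv island.
#$ -l infiniband=nxv

# NSLOTS is set by Grid Engine at run time to the number of slots granted.
# ./my_mpi_app is a hypothetical MPI executable.
mpirun -np "${NSLOTS}" ./my_mpi_app
```

Each directive sits on its own line with its comment above it, so that the comment text is not read as part of the option.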

Jobs across islands

Jobs are not scheduled across multiple islands, as this would severely degrade performance.

Direct infiniband nodes

Some nodes are not connected to an Infiniband island but are instead connected directly to another node. This allows smaller parallel jobs to run on specialist hardware.

The appropriate direct connection can be selected for two-node parallel jobs with -l infiniband_direct=<name>.

Currently, the following nodes are directly connected:

Nodes      Direct name   Infiniband type
nxg3-4     nxg3-4        EDR
nxv35-36   nxv35-36      EDR

An example of this setting is:

#$ -l infiniband_direct=nxg3-4
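
Assuming a Grid Engine scheduler (as the #$ directive prefix suggests), the same resource request could also be passed to qsub on the command line instead of being written into the script. In the sketch below, job.sh is a hypothetical job script, and the parallel environment name and slot count are assumptions that are not taken from this page.

```shell
# ASSUMPTION: "parallel" and 64 are placeholder values for the
# parallel environment and the total slots across the two nodes.
# job.sh is a hypothetical submission script.
qsub -pe parallel 64 -l infiniband_direct=nxv35-36 job.sh
```

This can be convenient when the same script is reused against different node pairs, since only the submission command changes.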