R

R is an interpreted language for statistical computing and graphics. Its capabilities include linear and generalised linear models, non-linear regression models, time series analysis, and classical parametric and non-parametric tests.

For efficiency, procedures written in C, C++ or Fortran can be called from R alongside the user-visible R functions.

R is available as a module on Apocrita.

R can also be used via Conda; please see this blog post for more information.

For interactive graphical R usage in RStudio, please see our separate documentation for RStudio via Open OnDemand.

Usage

Additional libraries via Spack

The Apocrita R module will also activate a Spack environment (link to follow), which contains many of the most common additional libraries (such as HDF5, GDAL, GEOS, PROJ and many more) required to install popular R packages into your personal R library. See here (link to follow) for a full list of packages and libraries contained in each Spack environment.

To run the default installed version of R, simply load the R module:

$ module load R
Activating R shared library Spack environment, please wait...

Loading R/X.Y.Z
  Loading requirement: spack/x.y.z
[r-libraries-x.y.z] $ R --help

Usage: R [options] [< infile] [> outfile]

[r-libraries-x.y.z] is the name of the Spack environment that has been activated alongside the R module load.

For further usage documentation, see the full output of R --help or Rscript --help.

R can be run interactively using the R command, or non-interactively using Rscript. The Rscript binary provides an alternative front end to the legacy R CMD BATCH method for running R commands in a non-interactive shell.

Example jobs

Do not use detectCores() for makeCluster()

detectCores() is not cluster aware: it returns every core on the machine, including cores that have not been allocated to your job.

The documentation for R covers this:

"This is not suitable for use directly for ... specifying the number of cores in makeCluster.
First because it may return NA, second because it does not give the number of allowed cores"

We advise specifying the number of cores directly, using the NSLOTS UGE variable:

library(parallel); cluster <- makeCluster(as.integer(Sys.getenv('NSLOTS', '1')))
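As a minimal sketch of how the cluster might then be used, the example below reads NSLOTS, runs a small workload with parLapply() and releases the workers with stopCluster(). The squaring workload and the variable names are illustrative assumptions, not part of the Apocrita setup.

```r
library(parallel)

# Core count granted by the scheduler; falls back to 1 outside a job.
n_cores <- as.integer(Sys.getenv("NSLOTS", "1"))

cluster <- makeCluster(n_cores)

# Hypothetical workload: square each number on the worker processes.
squares <- parLapply(cluster, 1:8, function(x) x^2)

# Release the worker processes once the work is done.
stopCluster(cluster)

print(unlist(squares))
```

Calling stopCluster() before the script exits avoids leaving worker processes behind at the end of the job.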

Serial job

Here is an example job running on 2 cores with 2GB of memory per core (4GB in total):

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 2
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load R
Rscript myscript.R

Using Rscript within Apocrita jobs

When running R jobs on Apocrita, use Rscript to run a file containing R commands; R output will be written to the stdout and stderr streams.
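The sketch below illustrates which common R calls write to which stream when a script is run with Rscript. The script contents are hypothetical and only meant to show the behaviour.

```r
# cat() and print() write to stdout.
cat("Starting analysis\n")

# message() and warning() write to stderr.
message("Using ", Sys.getenv("NSLOTS", "1"), " core(s)")

# Hypothetical computation whose result is printed to stdout.
result <- mean(c(1, 2, 3, 4))
print(result)
```

With the -j y option used in the example job above, the standard error stream is merged into the job's standard output file, so both kinds of message end up in the same place.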

Additional R Packages

Think twice before using renv!

The renv project manager has become popular, but we strongly advise users to think twice before using it on Apocrita. It aggressively caches existing binaries, but sometimes the cached binaries for personal installs aren't compatible with all environments and can cause issues.

renv also enables the "use the Posit Public Package Manager by default?" setting when R projects are initialised by renv::init(). Whilst you may think that the "Rocky 9" packages from PPM would make life simpler, you may run into compatibility issues when running R on Apocrita, which is why we do not recommend this method.

We use Spack environments and modules to provide additional software on Apocrita, and tend to compile almost everything from source. The PPM provides binaries built against dependencies from Rocky 9's software repositories, which we often do not use; we use versions compiled from source instead. This is likely to lead to version mismatches and errors.

If you still choose to use renv on Apocrita, the ITSR team cannot offer support for it. Details on using alternative locations for your libraries on Apocrita are given below.

Additional packages can be installed to extend the capabilities of R using the install.packages() command. Personally installed packages are stored in libraries.

Personal R libraries

By default, personal packages will be installed into:

/data/home/abc123/R/x86_64-pc-linux-gnu-library/<VERSION>

If this doesn't exist the first time you run an install.packages() command, the interactive R shell will offer to create it and ask for confirmation:

Warning in install.packages("<package_name>") :
  'lib = "/opt/R/<VERSION>/lib/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
'/data/home/abc123/R/x86_64-pc-linux-gnu-library/<VERSION>'
to install packages into? (yes/No/cancel) yes

The non-interactive Rscript command won't offer to create a personal library for you. If you run an R script via Rscript and it can't find a writable personal library location, the script will fail. You can avoid this either by installing a simple package in an interactive R shell before running Rscript, or by creating the personal library location before submitting any script, e.g.:

mkdir -p ${HOME}/R/x86_64-pc-linux-gnu-library/<VERSION>
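The same directory can also be created from within R. The following is a sketch that assumes the R_LIBS_USER environment variable points at the default personal library shown above; this is standard R behaviour, but it has not been verified against the Apocrita module.

```r
# Path of the personal library as R itself reports it (assumed to be the
# default location; it may differ if R_LIBS_USER has been overridden).
user_lib <- Sys.getenv("R_LIBS_USER")

# Create the directory, including any missing parent directories.
if (nzchar(user_lib) && !dir.exists(user_lib)) {
  dir.create(user_lib, recursive = TRUE)
}

# Put the personal library first on the search path; .libPaths() silently
# drops any directory that does not exist.
.libPaths(c(user_lib, .libPaths()))
print(.libPaths())
```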

To display the list of active libraries, execute:

> .libPaths()
[1] "/data/home/abc123/R/x86_64-pc-linux-gnu-library/<VERSION>"
[2] "/opt/R/<VERSION>/lib/R/library"

Customising personal library location

Set your custom library path in each new session or job script

R will always default to /data/home/abc123/R/x86_64-pc-linux-gnu-library/<VERSION> every time it is launched. Make sure you manually set .libPaths() at the start of every job script or R session if you wish to use an alternative personal library location.

You may wish to store your library in an alternative location such as in your Research Group storage space to allow collaboration with others. You may also wish to have multiple personal libraries.

You can change your library location using the .libPaths() command. Note that the directory you wish to use must already exist:

.libPaths("/custom/path/to/library")

You can check the new path has been set:

> .libPaths()
[1] "/custom/path/to/library"   "/opt/R/4.4.1/lib/R/library"

All subsequent package installations in that session will then be written to this new location.
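Because the directory must exist before .libPaths() is called, a job script may want to create it first. A minimal sketch, in which MY_R_LIB is a hypothetical environment variable and the temporary directory is only a stand-in for a real path such as your Research Group storage:

```r
# Hypothetical: take the library location from MY_R_LIB if it is set,
# otherwise use a throwaway directory so that the sketch runs anywhere.
custom_lib <- Sys.getenv("MY_R_LIB", file.path(tempdir(), "r-library"))

# Create the directory if needed; an existing directory is left untouched.
dir.create(custom_lib, recursive = TRUE, showWarnings = FALSE)

# Make it the first library on the search path for this session.
.libPaths(custom_lib)
print(.libPaths())
```

Placing these lines at the top of an R script means every later install.packages() or library() call in that session sees the custom library first.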

Displaying loaded packages

To display the list of currently loaded packages (including default packages), execute:

> (.packages())
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"

To display all packages available in the currently loaded libraries, execute:

> .packages(all.available = TRUE)
 [1] "base"       "boot"       "class"      "cluster"    "codetools"
 [6] "compiler"   "datasets"   "foreign"    "graphics"   "grDevices"
...

To install additional R packages, execute:

> install.packages("<package_name>")

Packages which are not included in the R main library will be added to the personal library.
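In a non-interactive script, install.packages() cannot prompt for a CRAN mirror, so the repository has to be given explicitly. The sketch below wraps this in a helper; install_if_missing is a hypothetical name, and the mirror URL is just one possible choice.

```r
# Hypothetical helper: install a package only when it is not yet available,
# then report whether it can be loaded.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # An explicit mirror is needed because Rscript cannot ask for one.
    install.packages(pkg, repos = repos)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call returns without installing anything.
ok <- install_if_missing("stats")
print(ok)
```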

To load an R package, execute:

> library("<package_name>")

To find the location where a package is installed, execute:

> find.package("<package_name>")
[1] "/path/to/package"
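When several libraries are active, it can help to confirm which library a package was loaded from and which version it is. A small sketch, using the parallel package only because it ships with every R installation:

```r
pkg <- "parallel"  # example package that ships with R

# Full path to the installed package; its parent directory is the library.
pkg_path <- find.package(pkg)

cat("Package:", pkg, "\n")
cat("Library:", dirname(pkg_path), "\n")
cat("Version:", as.character(utils::packageVersion(pkg)), "\n")
```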

R packages with external dependencies

Installing an R package will automatically install other R packages on which that package depends. However, R packages often depend on software packages provided outside R, such as C libraries. These packages may be provided by the automatically activated Spack environment as detailed above.

For example, the library gdal is required to build the rgdal R package and the library gmp is required for several numerical R packages. Both are included as part of the Spack environment.

Should you find anything is missing from the Spack environment, please raise a ticket with us - we can either look into adding it into a future version of the Spack environment, or give you instructions for using an additional module to provide it.

R packages using code in other languages

The R Spack environment will provide an appropriate version of the GCC compiler for manual compilation.

R packages which compile code written in other languages, such as C or Fortran, often use a Makevars options file to determine how that code is compiled. When you launch an R shell after loading an Apocrita R module, an opinionated, heavily tested Makevars file is loaded that provides suitable compilation flags for the majority of users.

Should you feel you need to change your Makevars arguments, please raise a ticket so we can help you.

Bioinformatics packages for R

Bioinformatics packages for R can be installed from Bioconductor using the BiocManager package, and then loaded like a normal R package. You can install a package using:

if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("<package_name>")
