R

R is an interpreted language for statistical computing and graphics. Its capabilities include linear and generalised linear models, non-linear regression models, time series analysis, and classical parametric and non-parametric tests.

For efficiency, procedures written in C, C++, or Fortran can be called from R alongside the user-visible R functions.

R is available as a module on Apocrita.

Usage

To run the default installed version of R, simply load the R module:

$ module load R
$ R --help

Usage: R [options] [< infile] [> outfile]
   or: R CMD command [arguments]

$ Rscript --help
Usage: /path/to/Rscript [--options] [-e expr [-e expr2 ...] | file] [args]

For further usage documentation, see the full output of R --help or the manual pages: man R and man Rscript.

R can be run interactively by launching R, or non-interactively with Rscript. Rscript is an alternative front end to the legacy R CMD BATCH method for running R commands without an interactive session.

Example jobs

Do not use detectCores() for makeCluster()

detectCores() is not scheduler-aware: it reports every core on the machine, including cores that are not allocated to your job.

The documentation for R covers this:

"This is not suitable for use directly for ... specifying the number of cores in makeCluster.
First because it may return NA, second because it does not give the number of allowed cores"

We advise specifying the number of cores directly, using the NSLOTS environment variable set by the UGE scheduler:

library(parallel)
cluster <- makeCluster(as.integer(Sys.getenv('NSLOTS', '1')))
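As a minimal sketch of how a cluster sized from NSLOTS might be used and then released, consider the following. The squaring task is a placeholder chosen for illustration, and outside a job NSLOTS is unset, so the sketch falls back to a single worker.

```r
library(parallel)

# Size the cluster from the scheduler's allocation; fall back to 1 core
# when NSLOTS is unset (for example, outside a job).
n_cores <- as.integer(Sys.getenv("NSLOTS", "1"))
cluster <- makeCluster(n_cores)

# Placeholder workload (an assumption for this sketch): square each input.
squares <- parLapply(cluster, 1:4, function(x) x^2)

# Release the workers when finished.
stopCluster(cluster)

print(unlist(squares))
```

Calling stopCluster() when the work is done avoids leaving worker processes running until the job ends.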

Serial job

Here is an example job running on 2 cores with 4GB of memory in total (2GB per core):

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 2
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G

module load R
Rscript myscript.R

Using Rscript within Apocrita jobs

When running R jobs on Apocrita, use Rscript to run a file containing R commands; R output will be written to the stdout and stderr streams.
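For illustration, a hypothetical myscript.R might look like the sketch below. The optional command-line argument and the computation are assumptions made for this example; cat() writes to stdout and message() writes to stderr.

```r
# myscript.R: hypothetical example script, run with `Rscript myscript.R [n]`

# Read an optional positive integer from the command line; default to 10.
args <- commandArgs(trailingOnly = TRUE)
n <- if (length(args) >= 1) suppressWarnings(as.integer(args[1])) else 10L
if (is.na(n) || n < 1) n <- 10L

# Placeholder computation.
result <- mean(seq_len(n))

# cat() writes to stdout; message() writes to stderr.
cat("Mean of 1 to", n, "is", result, "\n")
message("myscript.R finished")
```

With the -j y option used in the job script above, both streams are merged into the same job output file.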

Additional R Packages

Additional packages can be installed to extend the capabilities of R; packages are stored in libraries.

To display the list of active libraries, execute:

> .libPaths()
[1] "/share/apps/centos7/R/<VERSION>/lib64/R/library"

To display the list of currently loaded packages (including default packages), execute:

> (.packages())
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"

To display all packages available in the currently loaded libraries, execute:

> .packages(all.available = TRUE)
 [1] "base"       "boot"       "class"      "cluster"    "codetools"
 [6] "compiler"   "datasets"   "foreign"    "graphics"   "grDevices"
...

To install additional R packages, execute:

> install.packages("<package_name>")

Packages that are not already in the main R library will be installed into your personal library.

Creating a personal library

If a personal library has not already been created, respond with y when prompted to create and use one. Then select the secure CRAN mirror closest to you.
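In a batch job there is typically no prompt to answer, so one possible approach is to create the personal library up front and install into it explicitly. This is a sketch under assumptions: it relies on R's standard R_LIBS_USER environment variable, the helper name install_to_personal_library is hypothetical, and the CRAN mirror URL is only an example.

```r
# Location R uses for the personal library (set by R at startup).
personal_lib <- path.expand(Sys.getenv("R_LIBS_USER"))

# Create the directory if needed and put it first on the library search path.
dir.create(personal_lib, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(personal_lib, .libPaths()))

# Hypothetical helper: install packages without any interactive prompts.
install_to_personal_library <- function(pkgs,
                                        repos = "https://cloud.r-project.org") {
  install.packages(pkgs, lib = personal_lib, repos = repos)
}

# Example call (needs network access):
# install_to_personal_library("<package_name>")
```

Passing lib and repos explicitly means install.packages() has no need to ask where to install or which mirror to use.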

To load an R package, execute:

> library("<package_name>")

To find the location where a package is installed, execute:

> find.package("<package_name>")
[1] "/path/to/package"
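When a script should check for a package without stopping on failure, find.package() accepts quiet = TRUE and then returns an empty character vector for a missing package. A short sketch, using the base package stats and a deliberately made-up package name as illustrations:

```r
# Returns the install path, or character(0) if the package is not found.
stats_path <- find.package("stats", quiet = TRUE)
missing_path <- find.package("notARealPackageName", quiet = TRUE)

# Hypothetical helper for this sketch: a non-empty result means "installed".
is_installed <- function(path) length(path) > 0

cat("stats installed:", is_installed(stats_path), "\n")
cat("notARealPackageName installed:", is_installed(missing_path), "\n")
```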

R packages with external dependencies

Installing an R package will automatically install the other R packages on which it depends. However, R packages often also depend on software provided outside R, such as C libraries. This software may be provided by a module on Apocrita rather than being available by default. To make it available, load the relevant Apocrita module before starting your R session.

For example, the module gdal is required to build the rgdal R package and the module gmp is required for several numerical R packages.
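One way to confirm from inside R that a module's tools are visible is to look for one of its executables on the PATH. The sketch below assumes the gdal module provides the gdal-config helper that ships with GDAL; substitute whichever tool is relevant to your package.

```r
# Look up an executable on the PATH inherited from the shell that started R.
tool <- "gdal-config"
tool_path <- Sys.which(tool)

if (nzchar(tool_path)) {
  message(tool, " found at ", tool_path)
} else {
  message(tool, " not found; quit R, load the module, then start R again")
}
```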

R packages using code in other languages

If a package requires a non-default compiler to build, you can load a compiler module such as gcc/10.2.0.

R packages which compile code written in other languages, such as C or Fortran, often use a Makevars options file to determine how that code is compiled. If you wish to tailor the building of packages to suit your needs you can create your own Makevars file. You can use a personal Makevars file in a specific session by setting the environment variable R_MAKEVARS_USER to point to the file.

Error: C++14 standard requested but CXX14 is not defined

Some R packages need a C++ compiler that supports the C++14 standard. If package building fails with an error about "CXX14", add the following line to the Makevars file in use, to compile with g++:

CXX14=g++ -std=c++14 -fPIC

You will also need to load a compiler module, such as gcc/7.1.0, to use a compiler which supports this language revision. You should load this, and any other module, before starting your R session.

Error: C++17 standard requested but CXX17 is not defined

Similarly, some R packages need a C++ compiler that supports the C++17 standard. If package building fails with an error about "CXX17", add the following line to the Makevars file in use, to compile with g++:

CXX17=g++ -std=c++17 -fPIC

You will also need to load a compiler module, such as gcc/10.2.0, to use a compiler which supports this language revision. You should load this, and any other module, before starting your R session.
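As a sketch of how these pieces fit together, a personal Makevars file can also be written and selected from within R before calling install.packages(). This assumes the package build is launched from the same R session, so that it inherits R_MAKEVARS_USER; the temporary file location is an illustrative choice, and the two lines written are the ones given above.

```r
# Write a session-specific Makevars file (location chosen for illustration).
makevars_path <- file.path(tempdir(), "Makevars")
writeLines(
  c("CXX14=g++ -std=c++14 -fPIC",
    "CXX17=g++ -std=c++17 -fPIC"),
  makevars_path
)

# Point R at the file; package builds started from this session inherit it.
Sys.setenv(R_MAKEVARS_USER = makevars_path)

# Show the settings that will be used.
cat(readLines(Sys.getenv("R_MAKEVARS_USER")), sep = "\n")
```

Because tempdir() is removed when the session ends, this file only affects the current session; a permanent file elsewhere can be used in the same way.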

Bioinformatics packages for R

Bioinformatics packages for R can be installed from Bioconductor and then loaded like any other R package. For R 3.6.1 and later, you can install a package using:

# Installing bioinformatics packages for R >=3.6.1
install.packages("BiocManager")
BiocManager::install("<package_name>")

For some earlier releases of R this method does not work and you should instead use:

# Installing bioinformatics packages for R <3.6.1
source("https://bioconductor.org/biocLite.R")
biocLite("<package_name>")
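If the same script has to run under several R versions, the two methods could be chosen at run time. The following is a sketch only: the helper name install_bioc_package is hypothetical, and the 3.6.1 threshold is taken from the text above.

```r
# Hypothetical helper: pick the Bioconductor install method by R version.
install_bioc_package <- function(pkg) {
  if (getRversion() >= "3.6.1") {
    # Install BiocManager from CRAN only if it is missing.
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(pkg)
  } else {
    # Legacy method for earlier releases of R.
    source("https://bioconductor.org/biocLite.R")
    biocLite(pkg)
  }
}

# Example call (needs network access):
# install_bioc_package("<package_name>")
```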
