R¶
R is an interpreted language for statistical computing and graphics. Its capabilities include linear and generalised linear models, non-linear regression models, time series analysis, and classical parametric and non-parametric tests.
For efficiency, procedures written in C, C++, or Fortran can be called from R alongside the user-visible R functions.
R is available as a module on Apocrita.
Usage¶
To run the default installed version of R, simply load the R module:
$ module load R
$ R --help
Usage: R [options] [< infile] [> outfile]
or: R CMD command [arguments]
$ Rscript --help
Usage: /path/to/Rscript [--options] [-e expr [-e expr2 ...] | file] [args]
For further usage documentation, see the full output of R --help or the manual pages: man R and man Rscript.
R can be run interactively using R, or non-interactively using Rscript. The Rscript binary provides an alternative front end to the legacy R CMD BATCH method for running R commands in a non-interactive shell.
Example jobs¶
Do not use detectCores() for makeCluster()
detectCores() is not cluster aware and will incorrectly return all cores on a machine, even if they are not available to your job.
The documentation for R covers this:
"This is not suitable for use directly for ... specifying the number of cores in makeCluster. First because it may return NA, second because it does not give the number of allowed cores"
We advise specifying the number of cores directly, using the NSLOTS environment variable set by UGE:
library(parallel); cluster <- makeCluster(as.integer(Sys.getenv('NSLOTS', '1')))
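As a minimal sketch of the advice above, the following reads NSLOTS (falling back to 1 outside a job), runs a toy computation on the workers, and then shuts the cluster down. The `square` function and the input vector are illustrative only and not part of any Apocrita tooling.

```r
library(parallel)

# Number of cores granted by the scheduler; defaults to 1 outside a job
n_cores <- as.integer(Sys.getenv("NSLOTS", "1"))

cluster <- makeCluster(n_cores)

# Hypothetical workload: square each number on the workers
square <- function(x) x^2
results <- parLapply(cluster, 1:8, square)

# Release the worker processes once the work is done
stopCluster(cluster)

print(unlist(results))
```

Calling stopCluster() when finished avoids leaving idle worker processes behind for the rest of the job.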
Serial job¶
Here is an example job requesting 2 cores and 4GB of total memory (2GB per core):
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 2
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load R
Rscript myscript.R
Using Rscript within Apocrita jobs
When running R jobs on Apocrita, use Rscript to run a file containing R commands; R output will be written to the stdout and stderr streams.
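For illustration, a script such as the hypothetical myscript.R below could be run with `Rscript myscript.R 10`. It is a sketch rather than a required layout: the argument handling and the computation are assumptions. Output from cat() or print() goes to stdout, while message() writes to stderr.

```r
# myscript.R: hypothetical script, run with `Rscript myscript.R [n]`

# Arguments given after the script name on the command line
args <- commandArgs(trailingOnly = TRUE)

# Use the first argument as the sample size, defaulting to 100
n <- if (length(args) >= 1) as.integer(args[1]) else 100L

# message() writes to stderr
message("Sampling ", n, " values")

# Fixed seed so repeated runs give the same sample
set.seed(1)
values <- rnorm(n)

# cat() writes to stdout
cat("Mean of sample:", mean(values), "\n")
```

With the `-j y` option used in the example job, both streams are merged into the same job output file.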
Additional R Packages¶
Additional packages can be installed to extend the capabilities of R; packages are stored in libraries.
To display the list of active libraries, execute:
> .libPaths()
[1] "/share/apps/centos7/R/<VERSION>/lib64/R/library"
To display the list of currently loaded packages (including default packages), execute:
> (.packages())
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7] "base"
To display all packages available in the currently loaded libraries, execute:
> .packages(all.available = TRUE)
[1] "base" "boot" "class" "cluster" "codetools"
[6] "compiler" "datasets" "foreign" "graphics" "grDevices"
...
To install additional R packages, execute:
> install.packages("<package_name>")
Packages which are not included in the R main library will be added to the personal library.
Creating a personal library
If a personal library has not already been created, respond with y when prompted to create and use a personal library. Also select your closest secure CRAN mirror.
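When packages are installed from a batch job there is no prompt to answer, so the personal library and mirror have to be chosen in code. The following is a minimal sketch, assuming the personal library location is the one R reports in the R_LIBS_USER environment variable. The helper name `use_personal_library` and the mirror URL are illustrative choices, not site requirements.

```r
# Hypothetical helper: create the personal library if needed and put it
# at the front of the library search path
use_personal_library <- function(path = Sys.getenv("R_LIBS_USER")) {
  stopifnot(nzchar(path))
  path <- path.expand(path)
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  .libPaths(c(path, .libPaths()))
  invisible(path)
}

# Example usage (commented out because it downloads from a CRAN mirror):
# use_personal_library()
# install.packages("<package_name>", repos = "https://cloud.r-project.org")
```

install.packages() installs into the first entry of .libPaths() by default, which is why the helper places the personal library first.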
To load an R package, execute:
> library("<package_name>")
To find the location where a package is installed, execute:
> find.package("<package_name>")
[1] "/path/to/package"
R packages with external dependencies¶
Installing an R package will automatically install other R packages on which that package depends. However, R packages often depend on software packages provided outside R, such as C libraries. These packages may be provided by a module on Apocrita instead of being available by default. To access these packages be sure to load the Apocrita module before starting your R session.
For example, the module gdal is required to build the rgdal R package and the module gmp is required for several numerical R packages.
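One way to confirm from inside R that an external dependency is visible is to look for one of its command-line tools on the PATH. This sketch assumes that the gdal module provides the `gdal-config` tool and adds it to the PATH, which may not hold for every installation.

```r
# Tool expected to come from the loaded module (an assumption for gdal)
tool <- "gdal-config"

# Sys.which() returns an empty string when the tool is not on the PATH
tool_path <- Sys.which(tool)

if (nzchar(tool_path)) {
  message("Found ", tool, " at ", tool_path)
} else {
  message(tool, " not found; load the module before starting R")
}
```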
R packages using code in other languages¶
If a package requires a non-default compiler to build, you can load a module such as gcc/10.2.0.
R packages which compile code written in other languages, such as C or Fortran, often use a Makevars options file to determine how that code is compiled. If you wish to tailor the building of packages to suit your needs, you can create your own Makevars file. You can use a personal Makevars file in a specific session by setting the environment variable R_MAKEVARS_USER to point to the file.
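As a sketch of the session-specific approach, the variable can also be set from within R before installing. This relies on package builds running as child processes that inherit the session's environment. The file location is a hypothetical choice, and the single setting written to it is the CXX14 line described later on this page.

```r
# Hypothetical location for a session-specific Makevars file
makevars_path <- file.path(tempdir(), "Makevars")

# Example setting only; write whichever compiler options you need
writeLines("CXX14=g++ -std=c++14 -fPIC", makevars_path)

# Package builds started from this session will read this file
Sys.setenv(R_MAKEVARS_USER = makevars_path)

# Example usage (commented out because it downloads and builds a package):
# install.packages("<package_name>")
```

Because the file lives in tempdir(), it is discarded when the R session ends; use a path in your home directory to keep it.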
Error: C++14 standard requested but CXX14 is not defined
Some R packages rely on a C++ compiler which supports the C++14 language revision. If package building fails with an error about "CXX14" then you can add the following line to the Makevars file in use, to compile with g++:
CXX14=g++ -std=c++14 -fPIC
You will also need to load a compiler module, such as gcc/7.1.0, to use a compiler which supports this language revision. You should load this, and any other module, before starting your R session.
Error: C++17 standard requested but CXX17 is not defined
Similar to the issue above, some R packages rely on a C++ compiler which supports the C++17 language revision. If package building fails with an error about "CXX17" then you can add the following line to the Makevars file in use, to compile with g++:
CXX17=g++ -std=c++17 -fPIC
You will also need to load a compiler module, such as gcc/10.2.0, to use a compiler which supports this language revision. You should load this, and any other module, before starting your R session.
Bioinformatics packages for R¶
Bioinformatics packages for R can be installed from Bioconductor and then loaded like any other R package. For versions of R starting with 3.6.1 you can install a package using:
# Installing bioinformatics packages for R >=3.6.1
install.packages("BiocManager")
BiocManager::install("<package_name>")
For some earlier releases of R this method does not work and you should instead use:
# Installing bioinformatics packages for R <3.6.1
source("https://bioconductor.org/biocLite.R")
biocLite("<package_name>")
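If the same script has to work across several R versions, the two methods above can be wrapped in a single function. The following is a sketch only: the names `bioc_method` and `install_bioc` are hypothetical, and the 3.6.1 threshold is taken from the text above.

```r
# Decide which installer applies to a given R version
bioc_method <- function(r_version = getRversion()) {
  if (r_version >= "3.6.1") "BiocManager" else "biocLite"
}

# Hypothetical wrapper that installs a package with the matching method
install_bioc <- function(pkg) {
  if (bioc_method() == "BiocManager") {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(pkg)
  } else {
    source("https://bioconductor.org/biocLite.R")
    biocLite(pkg)
  }
}

# Example usage (commented out because it downloads packages):
# install_bioc("<package_name>")
```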