R¶
R is an interpreted computer language for statistical computing and graphics. Its capabilities include linear and generalised linear models, non-linear regression models, time series analysis, and classical parametric and non-parametric tests.
For efficiency, procedures written in the C, C++, or Fortran languages may be interfaced within R in addition to the user-visible R functions.
R is available as a module on Apocrita.
R can also be used via Conda; please see this blog post for more information.
For interactive graphical R usage in RStudio, please see our separate documentation for RStudio via Open OnDemand.
Usage¶
Additional libraries via Spack
The Apocrita R module will also activate a Spack environment (link to follow), which contains many of the most common additional libraries (such as HDF5, GDAL, GEOS, PROJ and many more) required to install popular R packages into your personal R library. See here (link to follow) for a full list of packages and libraries contained in each Spack environment.
To run the default installed version of R, simply load the R module:
$ module load R
Activating R shared library Spack environment, please wait...
Loading R/X.Y.Z
Loading requirement: spack/x.y.z
[r-libraries-x.y.z] $ R --help
Usage: R [options] [< infile] [> outfile]
[r-libraries-x.y.z] is the name of the Spack environment that is activated when the R module is loaded.
For further usage documentation, see the full output of R --help or Rscript --help.
R can be run interactively using the R command, or non-interactively using Rscript. The Rscript binary provides an alternative front end to the legacy R CMD BATCH method for running R commands in a non-interactive shell.
Example jobs¶
Do not use detectCores() for makeCluster()
detectCores() is not cluster aware and will incorrectly return all cores on a machine even if they are not actually available.
The documentation for R covers this:
"This is not suitable for use directly for ... specifying the number of cores in makeCluster. First because it may return NA, second because it does not give the number of allowed cores"
We advise specifying the number of cores directly, using the NSLOTS UGE variable:
library(parallel); cluster <- makeCluster(as.integer(Sys.getenv('NSLOTS', '1')))
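The same fallback can be applied on the shell side of a job script. The following is a minimal sketch, assuming only that UGE sets NSLOTS inside a job; the function name r_worker_count is hypothetical and not part of R or the Apocrita modules.

```shell
#!/bin/bash
# Hypothetical helper: print the number of cores R should use.
# NSLOTS is set by UGE inside a job; fall back to 1 when it is unset or
# empty (for example, when testing a script outside the scheduler).
r_worker_count() {
    echo "${NSLOTS:-1}"
}

echo "R should start $(r_worker_count) worker(s)"
```

This mirrors the default of '1' passed to Sys.getenv() in the R line above, so both sides agree when the script is run outside a job.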
Serial job¶
Here is an example job running on 2 cores and 4GB total memory:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 2
#$ -l h_rt=1:0:0
#$ -l h_vmem=2G
module load R
Rscript myscript.R
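The total memory figure follows from the request lines, assuming (as the numbers in this example imply) that h_vmem is requested per core rather than per job. A quick shell check of the arithmetic, with illustrative variable names:

```shell
#!/bin/bash
slots=2        # value passed to -pe smp
h_vmem_gb=2    # value passed to -l h_vmem, in GB, assumed to be per core
total_gb=$(( slots * h_vmem_gb ))
echo "Total memory requested: ${total_gb}G"   # prints: Total memory requested: 4G
```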
Using Rscript within Apocrita jobs
When running R jobs on Apocrita, use Rscript to run a file containing R commands; R output will be written to the stdout and stderr streams.
Additional R Packages¶
Think twice before using renv!
The renv project manager has become popular, but we strongly advise users to think twice before using it on Apocrita. It aggressively caches existing binaries, but sometimes the cached binaries for personal installs aren't compatible with all environments and can cause issues.
Additionally, renv enables the "use the Posit Public Package Manager by default?" setting when R projects are initialised by renv::init(). Whilst you may think that the "Rocky 9" packages from PPM would make life simpler, you may run into compatibility issues when running R on Apocrita, which is why we do not recommend this method.
We use Spack environments and modules to provide additional software on Apocrita, and tend to compile almost everything from source. The PPM provides binaries based on dependencies from Rocky 9's software repositories, which we often do not use; we use manually compiled versions instead. This is likely to lead to version mismatches and errors.
If you still choose to use renv on Apocrita despite this, the ITSR team can't really offer any support in doing so. There are details for using alternative locations for your libraries on Apocrita below.
Additional packages can be installed to extend the capabilities of R using the install.packages() command. Personally installed packages are stored in libraries.
Personal R libraries¶
By default, personal packages will be installed into:
/data/home/abc123/R/x86_64-pc-linux-gnu-library/<VERSION>
If this doesn't exist the first time you run an install.packages() command, the interactive R shell will offer to create it and ask for confirmation:
Warning in install.packages("<package_name>") :
'lib = "/opt/R/<VERSION>/lib/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
'/data/home/abc123/R/x86_64-pc-linux-gnu-library/<VERSION>'
to install packages into? (yes/No/cancel) yes
The Rscript non-interactive R command won't offer to create a personal library for you. If you run an R script via the Rscript command and it can't find a writable personal library location, the script will fail. You can overcome this either by installing a simple package in an interactive R shell prior to running Rscript, or by creating a personal library location in advance before submitting any script, e.g.:
mkdir -p ${HOME}/R/x86_64-pc-linux-gnu-library/<VERSION>
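A job script can also create the directory itself before calling Rscript. The sketch below is illustrative only: both function names are hypothetical, and the <VERSION> directory name must be supplied by you, matching whatever appears in the personal library path for your R module.

```shell
#!/bin/bash
# Hypothetical helper: print the default personal library path for the
# given <VERSION> directory name (passed as the first argument).
personal_lib_path() {
    printf '%s/R/x86_64-pc-linux-gnu-library/%s\n' "${HOME}" "$1"
}

# Hypothetical helper: create the library directory if it is missing.
# mkdir -p succeeds without changes when the directory already exists.
ensure_personal_lib() {
    mkdir -p "$(personal_lib_path "$1")"
}
```

Calling ensure_personal_lib with your version string ahead of the Rscript line means the job does not depend on an earlier interactive session having created the library.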
To display the list of active libraries, execute:
> .libPaths()
[1] "/data/home/abc123/R/x86_64-pc-linux-gnu-library/<VERSION>"
[2] "/opt/R/<VERSION>/lib/R/library"
Customising personal library location¶
Set your custom library path in each new session or job script
R will always default to /data/home/abc123/R/x86_64-pc-linux-gnu-library/<VERSION> every time it is launched. Make sure you manually set .libPaths() at the start of every job script or R session if you wish to use an alternative personal library location.
You may wish to store your library in an alternative location such as in your Research Group storage space to allow collaboration with others. You may also wish to have multiple personal libraries.
You can change your library location using the .libPaths() command. Note, in this case, the directory you wish to use must already exist:
.libPaths("/custom/path/to/library")
You can check the new path has been set:
> .libPaths()
[1] "/custom/path/to/library" "/opt/R/4.4.1/lib/R/library"
All subsequent package installations in that session will then be written to this new location.
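Because the custom directory must already exist, it can be worth checking it in the job script before R starts. A minimal sketch follows; check_lib_dir is a hypothetical helper name, not part of R or the Apocrita modules, and the path you pass to it is your own.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the custom library directory
# (first argument) exists and is writable by the current user.
check_lib_dir() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        return 0
    fi
    echo "Library directory missing or not writable: $1" >&2
    return 1
}
```

In a job script this could be used as check_lib_dir "/custom/path/to/library" || exit 1 ahead of the Rscript line, so that a missing directory stops the job early with a clear message.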
Displaying loaded packages¶
To display the list of currently loaded packages (including default packages), execute:
> (.packages())
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7] "base"
To display all packages available in the currently loaded libraries, execute:
> .packages(all.available = TRUE)
[1] "base" "boot" "class" "cluster" "codetools"
[6] "compiler" "datasets" "foreign" "graphics" "grDevices"
...
To install additional R packages, execute:
> install.packages("<package_name>")
Packages which are not included in the R main library will be added to the personal library.
To load an R package, execute:
> library("<package_name>")
To find the location where a package is installed, execute:
> find.package("<package_name>")
[1] "/path/to/package"
R packages with external dependencies¶
Installing an R package will automatically install other R packages on which that package depends. However, R packages often depend on software packages provided outside R, such as C libraries. These packages may be provided by the automatically activated Spack environment as detailed above.
For example, the library gdal is required to build the rgdal R package and the library gmp is required for several numerical R packages. Both are included as part of the Spack environment.
Should you find anything is missing from the Spack environment, please raise a ticket with us - we can either look into adding it into a future version of the Spack environment, or give you instructions for using an additional module to provide it.
R packages using code in other languages¶
The R Spack environment will provide an appropriate version of the GCC compiler for manual compilation.
R packages which compile code written in other languages, such as C or Fortran, often use a Makevars options file to determine how that code is compiled. When you launch an R shell after loading an Apocrita R module, an appropriate opinionated Makevars file is loaded which has been heavily tested and will provide appropriate compilation flags for the majority of users.
Should you feel you need to change your Makevars arguments, please raise a ticket so we can help you.
Bioinformatics packages for R¶
Bioinformatics packages for R can be installed from Bioconductor using the BiocManager package and then loaded like a normal R package. You can install a package using:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("<package_name>")