Difference between revisions of "R"

From UFRC
m (Text replace - "hpc.ufl.edu" to "rc.ufl.edu")

Revision as of 18:12, 21 June 2017

Description

R website  

R is a free software environment for statistical computing and graphics.

Note: File a support ticket to request installation of additional libraries.

Required Modules

modules documentation

Serial

  • R

Parallel (MPI)

  • Rmpi

The "Rmpi" module enables access to the version of R that provides the Rmpi library for large-scale multi-node parallel computations.

See Rmpi documentation for details.

System Variables

  • HPC_R_DIR - installation directory
  • HPC_R_BIN - executable directory
  • HPC_R_LIB - library directory
  • HPC_R_INCLUDE - includes directory
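
As a quick check that the variables above are defined after loading the R module, something like the following sketch can be used. The helper name show_r_env is hypothetical (it is not provided by the module), and the sketch assumes the module exports the variables into the environment.

```shell
# Hypothetical helper: print each HPC_R_* variable listed above,
# or "<not set>" if the R module has not been loaded in this shell.
show_r_env() {
    for v in HPC_R_DIR HPC_R_BIN HPC_R_LIB HPC_R_INCLUDE; do
        # printenv exits non-zero when the variable is not in the environment
        printf '%s=%s\n' "$v" "$(printenv "$v" || echo '<not set>')"
    done
}

show_r_env
```

Run it once before and once after 'module load R' to see which values the module sets.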

How To Run

R can be run on the command line (or through the batch system) with the 'Rscript myscript.R' or 'R CMD BATCH myscript.R' command. For script development or visualization, the RStudio GUI application can be used. See the respective documentation for details. For RStudio, load the following modules within your job script before running the rstudio command:

module load gui rstudio

Notes and Warnings

  • Java

rJava users need to load the java module manually with 'module load java/1.7.0_79'.

  • TMPDIR

If temporary files are produced, they may fill up the memory disks on HPG2 nodes and cause node and job failures. Use something like

mkdir -p tmp
export TMPDIR=$(pwd)/tmp

in your job script to prevent this, and launch your job from the respective working directory rather than from your home directory.
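
To confirm that the setting took effect, a short check can follow the two lines in the job script. This is only a sketch, assuming GNU mktemp (which places files under $TMPDIR when called without arguments); R itself consults TMPDIR at startup when choosing its session temporary directory.

```shell
# Create a scratch directory under the directory the job was launched from
mkdir -p tmp
export TMPDIR="$(pwd)/tmp"

# Create and remove a throwaway file to show where temporary files now land
scratch_file="$(mktemp)"
echo "Temporary files now go to: $(dirname "$scratch_file")"
rm -f "$scratch_file"
```

If the printed path is not under the launch directory, TMPDIR was not exported before the check ran.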

  • Tasks vs Cores for parallel runs

Parallel threads in an R job will be bound to the same CPU core even if multiple tasks (ntasks) are specified in the job script. Use cpus-per-task so that the R 'parallel' package can use the requested cores. For example, for an 8-thread parallel job use the following resource request in your job script:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

See the single-threaded and multi-threaded examples on the Sample SLURM Scripts page for more details.
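
One way to keep the R worker count in step with the resource request is to derive it from the environment inside the job script. The sketch below makes two assumptions: SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given, and the 'parallel' package takes its default mc.cores from the MC_CORES environment variable when it is loaded. The helper name r_cores is hypothetical.

```shell
# Hypothetical helper: number of cores granted to this task,
# falling back to 1 when run outside a SLURM allocation.
r_cores() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Assumption: 'parallel' reads MC_CORES at load time to set its mc.cores default
export MC_CORES="$(r_cores)"
echo "R parallel workers: $MC_CORES"

# Rscript myscript.R   # uncomment in a real job script
```

With the 8-core request above, the script should report 8 workers; outside a job it reports 1.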


Performance

We benchmarked our most recently installed R version (3.0.2), built with the included BLAS/LAPACK libraries, against the newest release (3.2.0 as of April 2015), built with Intel MKL libraries. The tests ran on the HiPerGator1 hardware (AMD Abu Dhabi 2.4GHz CPUs) and on the Intel Haswell 2.3GHz CPUs we are testing for possible use in HiPerGator2. The results are presented in the R Benchmark 2.5 table.

FAQ

  • Q: When I submit a job using the 'parallel' package, all threads seem to share a single CPU core instead of running on the separate cores I requested.
    • A: On SLURM, use --cpus-per-task to specify the number of available cores. For example,

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12

will allow mclapply or other functions from the 'parallel' package to run on all requested cores.

Rmpi Example

See the R MPI Example page for an example of Rmpi code.

Installed Libraries

Note: Many of the packages in the R library shown below are installed as part of the Bioconductor meta-library. The list is generated from the default R version.

Name Description