Difference between revisions of "R"
Revision as of 21:59, 6 October 2022
Description
R is a free software environment for statistical computing and graphics.
Note: File a support ticket to request installation of additional libraries.
Environment Modules
Run 'module spider R' to find out what environment modules are available for this application.
System Variables
- HPC_R_DIR - installation directory
- HPC_R_BIN - executable directory
- HPC_R_LIB - library directory
- HPC_R_INCLUDE - includes directory
How To Run
R can be run on the command line (or through the batch system) using the 'Rscript myscript.R' or 'R CMD BATCH myscript.R' command. For script development or visualization, the RStudio GUI application can be used; see the respective documentation for details. Alternatively, an instance of RStudio Server can be started in a job, and you can then connect to it through an SSH tunnel from a web browser on your local computer.
- Notes and Warnings
- The parallel::detectCores() function will return the total number of cores on a compute node and not the number of cores assigned to your job by the scheduler. Instead, use something like
numCores = as.integer(Sys.getenv("SLURM_CPUS_ON_NODE"))
to find out the number of CPU cores 'X' requested by
#SBATCH --cpus-per-task=X
in your job script.
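The same value can also be handed to R from the job script itself. The lines below are a minimal sketch, assuming the script runs inside a SLURM job; the fallback of 1 for runs outside the scheduler is an assumption, not site policy. The 'parallel' package initializes its mc.cores option from the MC_CORES environment variable when that variable is set.

```shell
# Sketch: pass the scheduler's core count to R through MC_CORES.
# SLURM_CPUS_ON_NODE is set by SLURM inside a job; the fallback of 1 is an
# assumption for runs outside the scheduler.
export MC_CORES="${SLURM_CPUS_ON_NODE:-1}"
echo "R parallel workers: ${MC_CORES}"

# A later call such as
#   Rscript myscript.R
# would then see mc.cores default to this value in mclapply().
```

An explicit mc.cores argument in the R code still overrides this default.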
- Default RData format
Starting with R 3.6.0, the default serialization format used to save RData files is version 3 (RDX3), so R versions prior to 3.5.0 cannot open such files. Keep this in mind if you copy RData files from HiPerGator to an external system with an older R installed.
- Java
rJava users need to load the java module manually with 'module load java/1.7.0_79'.
- TMPDIR
If temporary files are produced, they may fill up the memory disks on HPG2 nodes and cause node and job failures. Use something like
mkdir -p tmp
export TMPDIR=$(pwd)/tmp
in your job script to prevent this, and launch your job from the respective directory, not from your home directory.
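When several jobs are launched from the same directory, a per-job subdirectory keeps their temporary files apart. This is a minimal sketch, assuming it is placed near the top of a SLURM job script; the variable name job_tmp and the "manual" fallback are illustrative, not part of the site documentation. R consults TMPDIR when it creates its session temporary directory.

```shell
# Sketch: job-specific temporary directory under the launch directory.
# SLURM_JOB_ID is set by SLURM inside a job; "manual" is a placeholder
# fallback (an assumption) for runs outside the scheduler.
job_tmp="$(pwd)/tmp/${SLURM_JOB_ID:-manual}"
mkdir -p "$job_tmp"
export TMPDIR="$job_tmp"
echo "Temporary files go to: $TMPDIR"
```

The directory is not removed automatically; delete it once the job has finished if the files are no longer needed.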
- Tasks vs Cores for parallel runs
Parallel threads in an R job will be bound to the same CPU core even if multiple tasks (ntasks) are specified in the job script. Use cpus-per-task to use the R 'parallel' package correctly. For example, for an 8-thread parallel job use the following resource request in your job script:
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
See the single-threaded and multi-threaded examples on the Sample SLURM Scripts page for more details.
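A job script can check for this mistake before starting R. The helper below is a hypothetical sketch, not a site-provided tool: check_r_allocation is an invented name, and the defaults of 1 for unset variables are assumptions. SLURM_NTASKS and SLURM_CPUS_PER_TASK are the variables SLURM sets for --ntasks and --cpus-per-task.

```shell
# Sketch: warn when a job requested several tasks but no per-task cores.
# check_r_allocation is a hypothetical helper; the defaults of 1 are
# assumptions for variables that SLURM did not set.
check_r_allocation() {
    ntasks="${SLURM_NTASKS:-1}"
    cpus="${SLURM_CPUS_PER_TASK:-1}"
    if [ "$ntasks" -gt 1 ] && [ "$cpus" -le 1 ]; then
        echo "warning: ntasks=$ntasks, cpus-per-task=$cpus; R threads would share one core" >&2
        return 1
    fi
    echo "ok: ntasks=$ntasks, cpus-per-task=$cpus"
}

# Report the current allocation without aborting the script.
check_r_allocation || true
```

With the 8-thread request shown above, the helper prints "ok: ntasks=1, cpus-per-task=8".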
Job Script Examples
See the R_Job_Script page for R job script examples.
Performance
We have benchmarked our most recent installed R version (3.0.2) built with the included blas/lapack libraries versus the newest (as of April 2015) release 3.2.0 built with Intel MKL libraries on the HiPerGator1 hardware (AMD Abu Dhabi 2.4GHz CPUs) and the Intel Haswell 2.3GHz CPUs we're testing for possible usage in HiPerGator2. The results are presented in the R Benchmark 2.5 table
FAQ
- Q: When I submit a job using the 'parallel' package, all threads seem to share a single CPU core instead of running on the separate cores I requested.
- A: On SLURM you need to use --cpus-per-task to specify the number of available cores. E.g.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
will allow mclapply or another function from the 'parallel' package to run on all requested cores.
- Q: How do I install R packages?
- A: Users can install R packages in their local directory. The default directory is /home/my.username/R/x86_64-pc-linux-gnu-library/X.X/ (X.X = version number).
From a repository or a tarball:
# from a standard repository
$ module load R/X.X
$ R
> install.packages("PACKAGE")
# if coming from GitHub
> devtools::install_github("author/software")

# from a non-standard repository (tarball available)
$ module load R/X.X
$ R CMD INSTALL /path/to/package_version.tar.gz
Rmpi Example
See the R MPI Example page for an example of Rmpi code.
Installed Libraries
Note: Many of the packages in the R library shown below are installed as part of the Bioconductor meta-library. The list is generated from the default R version. File R_PACKAGES is missing.
Name | Description |
---|