Difference between revisions of "R"
Moskalenko (talk | contribs) |
Moskalenko (talk | contribs) |
||
(21 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
__NOEDITSECTION__ | __NOEDITSECTION__ | ||
− | [[Category:Software]][[Category:Statistics]] | + | {|align=right |
+ | |__TOC__ | ||
+ | |} | ||
+ | [[Category:Software]][[Category:Statistics]][[Category:Programming]] | ||
{|<!--Main settings - REQUIRED--> | {|<!--Main settings - REQUIRED--> | ||
|{{#vardefine:app|R}} | |{{#vardefine:app|R}} | ||
Line 24: | Line 27: | ||
Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application. | Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application. | ||
==System Variables== | ==System Variables== | ||
− | * HPC_{{ | + | * HPC_{{uc:{{#var:app}}}}_DIR - installation directory |
* HPC_R_BIN - executable directory | * HPC_R_BIN - executable directory | ||
* HPC_R_LIB - library directory | * HPC_R_LIB - library directory | ||
* HPC_R_INCLUDE - includes directory | * HPC_R_INCLUDE - includes directory | ||
{{#if: {{#var: exe}}|==How To Run== | {{#if: {{#var: exe}}|==How To Run== | ||
− | R can be run on the command-line (or the batch system) using the '<code>Rscript myscript.R</code>' or '<code>R CMD BATCH myscript.R</code>' command. For script development or visualization RStudio GUI application can be used. See the [[GUI_Programs|respective documentation]] for details. | + | R can be run on the command-line (or the batch system) using the '<code>Rscript myscript.R</code>' or '<code>R CMD BATCH myscript.R</code>' command. For script development or visualization RStudio GUI application can be used. See the [[GUI_Programs|respective documentation]] for details. Alternatively an instance of [[RStudio_Server|RStudio Server]] can be started in a job. Then you can connect to it through an SSH tunnel from a web browser on your local computer. |
;Notes and Warnings: | ;Notes and Warnings: | ||
* The parallel::detectCores() function will return the total number of cores on a compute node and not the number of cores assigned to your job by the scheduler. Instead, use something like | * The parallel::detectCores() function will return the total number of cores on a compute node and not the number of cores assigned to your job by the scheduler. Instead, use something like | ||
− | numCores = Sys.getenv("SLURM_CPUS_ON_NODE") | + | numCores = as.integer(Sys.getenv("SLURM_CPUS_ON_NODE")) |
− | to find out the number of CPU cores 'X' requested by | + | to find out the number of CPU cores 'X' requested in your job script by: |
#SBATCH --cpus-per-task=X | #SBATCH --cpus-per-task=X | ||
− | |||
* Default RData format | * Default RData format | ||
Line 49: | Line 51: | ||
export TMPDIR=$(pwd)/tmp | export TMPDIR=$(pwd)/tmp | ||
in your job script to prevent this and launch your job from the respective directory and not from your home directory. | in your job script to prevent this and launch your job from the respective directory and not from your home directory. | ||
+ | |||
+ | {{Note|'''For users of PHI and FERPA:''' It is particularly important to set your working and TMPDIR directories to be in your project's PHI/FERPA configured directory in <code>/blue</code> when working with R. Writing files to <code>/home</code> or <code>$TMPDIR</code> could expose restricted data to unauthorized users.|warn}} | ||
+ | |||
* Tasks vs Cores for parallel runs | * Tasks vs Cores for parallel runs | ||
Parallel threads in an R job will be bound to the same CPU core even if multiple ntasks are specified in the job script. Use cpus-per-task to use R 'parallel' module correctly. For example, for an 8-thread parallel job use the following resource request in your job script: | Parallel threads in an R job will be bound to the same CPU core even if multiple ntasks are specified in the job script. Use cpus-per-task to use R 'parallel' module correctly. For example, for an 8-thread parallel job use the following resource request in your job script: | ||
Line 60: | Line 65: | ||
See the [[{{PAGENAME}}_Configuration]] page for {{#var: app}} configuration details.|}} | See the [[{{PAGENAME}}_Configuration]] page for {{#var: app}} configuration details.|}} | ||
{{#if: {{#var: job}}|==Job Script Examples== | {{#if: {{#var: job}}|==Job Script Examples== | ||
− | + | <div class="mw-collapsible mw-collapsed" style="width:70%; padding: 5px; border: 1px solid gray;"> | |
+ | ''Expand this section to view example R script.'' | ||
+ | <div class="mw-collapsible-content" style="padding: 5px;"> | ||
+ | <source lang=bash> | ||
+ | #!/bin/bash | ||
+ | #SBATCH --job-name=R_test #Job name | ||
+ | #SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL) | ||
+ | #SBATCH --mail-user=ENTER_YOUR_EMAIL_HERE # Where to send mail | ||
+ | #SBATCH --ntasks=1 | ||
+ | #SBATCH --mem=1gb # Per processor memory | ||
+ | #SBATCH --time=00:05:00 # Walltime | ||
+ | #SBATCH --output=r_job.%j.out # Name output file | ||
+ | #Record the time and compute node the job ran on | ||
+ | date; hostname; pwd | ||
+ | #Use modules to load the environment for R | ||
+ | module load R | ||
+ | |||
+ | #Run R script | ||
+ | Rscript myRscript.R | ||
+ | |||
+ | date | ||
+ | </source></div></div> | ||
+ | |}} | ||
{{#if: {{#var: policy}}|==Usage Policy== | {{#if: {{#var: policy}}|==Usage Policy== | ||
WRITE USAGE POLICY HERE (perhaps templates for a couple of main licensing schemes can be used)|}} | WRITE USAGE POLICY HERE (perhaps templates for a couple of main licensing schemes can be used)|}} | ||
{{#if: {{#var: testing}}|==Performance== | {{#if: {{#var: testing}}|==Performance== | ||
− | We have benchmarked our most recent installed R version (3.0.2) built with the included blas/lapack libraries versus the newest (as of April 2015) release 3.2.0 built with Intel MKL libraries on the HiPerGator1 hardware (AMD Abu Dhabi 2.4GHz CPUs) and the Intel Haswell 2.3GHz CPUs we're testing for possible usage in HiPerGator2. The results are presented in the [[R Benchmark 2.5]] table | + | We have benchmarked our most recent installed R version (3.0.2) built with the included blas/lapack libraries versus the newest (as of April 2015) release 3.2.0 built with Intel MKL libraries on the HiPerGator1 hardware (AMD Abu Dhabi 2.4GHz CPUs) and the Intel Haswell 2.3GHz CPUs we're testing for possible usage in HiPerGator2. The results are presented in the [[R Benchmark 2.5]] table |}} |
{{#if: {{#var: citation}}|==Citation== | {{#if: {{#var: citation}}|==Citation== | ||
If you publish research that uses {{{app}}} you have to cite it as follows: | If you publish research that uses {{{app}}} you have to cite it as follows: | ||
Line 85: | Line 100: | ||
==Installed Libraries== | ==Installed Libraries== | ||
+ | You can install your own libraries to use with R. These are stored in your /home/ environment. For details visit our [[Applications FAQ]] and see the section "How do I install R packages?". | ||
+ | |||
+ | Make sure the directory for that version of R is created or R will try to install to a system path and fail. E.g. for R/4.3 run the following command before attempting to install a package: | ||
+ | mkdir ~/R/x86_64-pc-linux-gnu-library/4.3 | ||
+ | |||
+ | You can set a custom library path with the R_LIBS_USER environment variable. | ||
+ | From [https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html]: | ||
+ | |||
+ | "R_LIBS_USER - user's library path, e.g. R_LIBS_USER=~/R/%p-library/%v is the folder specification used by default on all platforms and and R version. The folder must exist, otherwise it is ignored by R. The %p (platform) and %v (version) parts are R-specific conversion specifiers." | ||
+ | |||
+ | To see a list of installed libraries in the currently loaded version of R: | ||
+ | <pre> | ||
+ | $ R | ||
+ | > installed.packages() | ||
+ | </pre> | ||
'''Note: ''' Many of the packages in the R library shown below are installed as a part of Bioconductor meta-library. The list is generated from the default R version. | '''Note: ''' Many of the packages in the R library shown below are installed as a part of Bioconductor meta-library. The list is generated from the default R version. | ||
<!-- Note to HPC Staff: paste the list generated by the "library()" command between the <pre> </pre> tags in the http://wiki.rc.ufl.edu/index.php/R_libraries wiki page for the inclusion below to work. --> | <!-- Note to HPC Staff: paste the list generated by the "library()" command between the <pre> </pre> tags in the http://wiki.rc.ufl.edu/index.php/R_libraries wiki page for the inclusion below to work. --> | ||
+ | <div class="mw-collapsible mw-collapsed" style="width:70%; padding: 5px; border: 1px solid gray;"> | ||
+ | ''Expand this section to view installed library list.'' | ||
+ | <div class="mw-collapsible-content" style="padding: 5px;"> | ||
{{:R_libraries}} | {{:R_libraries}} | ||
+ | </div> | ||
+ | </div> |
Latest revision as of 14:41, 20 September 2024
Description
R is a free software environment for statistical computing and graphics.
Note: File a support ticket to request installation of additional libraries.
Environment Modules
Run 'module spider R' to find out what environment modules are available for this application.
System Variables
- HPC_R_DIR - installation directory
- HPC_R_BIN - executable directory
- HPC_R_LIB - library directory
- HPC_R_INCLUDE - includes directory
How To Run
R can be run on the command line (or through the batch system) using the 'Rscript myscript.R' or 'R CMD BATCH myscript.R' command. For script development or visualization, the RStudio GUI application can be used. See the respective documentation for details. Alternatively, an instance of RStudio Server can be started in a job; you can then connect to it through an SSH tunnel from a web browser on your local computer.
- Notes and Warnings
- The parallel::detectCores() function will return the total number of cores on a compute node and not the number of cores assigned to your job by the scheduler. Instead, use something like
numCores = as.integer(Sys.getenv("SLURM_CPUS_ON_NODE"))
to find out the number of CPU cores 'X' requested in your job script by:
#SBATCH --cpus-per-task=X
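The same lookup can be done in the job script before R starts. The following is a minimal sketch, not an official recipe: the function name resolve_cores and the variable NUM_CORES are illustrative assumptions, while SLURM_CPUS_PER_TASK and SLURM_CPUS_ON_NODE are set by the scheduler.

```shell
#!/bin/bash
# Print the number of cores the job may use.
# Prefers SLURM_CPUS_PER_TASK (set when --cpus-per-task is given),
# then SLURM_CPUS_ON_NODE, and falls back to 1 outside of a job.
resolve_cores() {
    echo "${SLURM_CPUS_PER_TASK:-${SLURM_CPUS_ON_NODE:-1}}"
}

# NUM_CORES is a hypothetical name chosen for this sketch; read it in R
# with as.integer(Sys.getenv("NUM_CORES")).
export NUM_CORES="$(resolve_cores)"
echo "Using ${NUM_CORES} core(s)"
```

Falling back to 1 keeps the script usable when it is tested outside of a scheduled job.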
- Default RData format
In R-3.6.0 the default serialization format used to save RData files was changed to version 3 (RDX3), so R versions prior to 3.5.0 will not be able to open such files. Keep this in mind if you copy RData files from HiPerGator to an external system with an old R version installed.
- Java
rJava users need to load the java module manually with 'module load java/1.7.0_79'.
- TMPDIR
If temporary files are produced, they may fill up the memory disks on HPG2 nodes and cause node and job failures. Use something like
mkdir -p tmp
export TMPDIR=$(pwd)/tmp
in your job script to prevent this and launch your job from the respective directory and not from your home directory.
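A slightly extended sketch of the TMPDIR setup above, giving each job its own subdirectory. The tmp/<job id> layout and the variable name job_tmp are assumptions made for illustration; SLURM_JOB_ID is set by the scheduler inside a job.

```shell
#!/bin/bash
# Create a job-specific temporary directory under the current working
# directory so that R temporary files do not land on node memory disks.
# Outside of a SLURM job the subdirectory is named "manual".
job_tmp="$(pwd)/tmp/${SLURM_JOB_ID:-manual}"
mkdir -p "${job_tmp}"
export TMPDIR="${job_tmp}"

# R reads TMPDIR at startup, so tempdir() will point below this path.
echo "TMPDIR set to ${TMPDIR}"
```

Keeping one directory per job makes it easier to remove leftover temporary files once a job has finished.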
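For users of PHI and FERPA: It is particularly important to set your working and TMPDIR directories to be in your project's PHI/FERPA configured directory in /blue when working with R. Writing files to /home or $TMPDIR could expose restricted data to unauthorized users.
- Tasks vs Cores for parallel runs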
Parallel threads in an R job will be bound to the same CPU core even if multiple ntasks are specified in the job script. Use cpus-per-task to use the R 'parallel' package correctly. For example, for an 8-thread parallel job, use the following resource request in your job script:
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
See the single-threaded and multi-threaded examples on the Sample SLURM Scripts page for more details.
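The tasks-versus-cores advice can be turned into a small guard at the top of a job script. This is only a sketch: the function name check_parallel_request is hypothetical, and it relies on the scheduler-provided variables SLURM_NTASKS and SLURM_CPUS_PER_TASK, treating either as 1 when unset.

```shell
#!/bin/bash
# Warn when a job asks for several tasks but only one core per task,
# which would leave the threads of R's 'parallel' package on one core.
check_parallel_request() {
    ntasks="${SLURM_NTASKS:-1}"
    cpus="${SLURM_CPUS_PER_TASK:-1}"
    if [ "${ntasks}" -gt 1 ] && [ "${cpus}" -eq 1 ]; then
        echo "warning: ntasks=${ntasks} with cpus-per-task=1;" \
             "request cores with --cpus-per-task instead" >&2
        return 1
    fi
    return 0
}

# Stop early instead of running an R job on a single core by accident.
check_parallel_request || exit 1
```

Whether to stop the job or only print the warning is a design choice; the sketch stops so that a misconfigured request is noticed immediately.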
Job Script Examples
Expand this section to view example R script.
#!/bin/bash
#SBATCH --job-name=R_test #Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=ENTER_YOUR_EMAIL_HERE # Where to send mail
#SBATCH --ntasks=1
#SBATCH --mem=1gb # Memory per node
#SBATCH --time=00:05:00 # Walltime
#SBATCH --output=r_job.%j.out # Name output file
#Record the time and compute node the job ran on
date; hostname; pwd
#Use modules to load the environment for R
module load R
#Run R script
Rscript myRscript.R
date
Performance
We have benchmarked our most recent installed R version (3.0.2) built with the included BLAS/LAPACK libraries versus the newest (as of April 2015) release 3.2.0 built with Intel MKL libraries on the HiPerGator1 hardware (AMD Abu Dhabi 2.4GHz CPUs) and the Intel Haswell 2.3GHz CPUs we're testing for possible usage in HiPerGator2. The results are presented in the R Benchmark 2.5 table.
Rmpi Example
See R MPI Example page for an example of using Rmpi code.
Installed Libraries
You can install your own libraries to use with R. These are stored in your /home/ environment. For details visit our Applications FAQ and see the section "How do I install R packages?".
Make sure the directory for that version of R is created or R will try to install to a system path and fail. E.g. for R/4.3 run the following command before attempting to install a package:
mkdir -p ~/R/x86_64-pc-linux-gnu-library/4.3
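For several R versions, the directory creation can be wrapped in a small helper. This is a sketch under assumptions: the function name make_r_lib_dir and its optional second argument are hypothetical, and the platform string is copied from the example above, so it may differ on other systems.

```shell
#!/bin/bash
# Create the per-user library directory for one R minor version and
# print its path.
#   $1 - R minor version, e.g. 4.3
#   $2 - optional root directory (defaults to ~/R)
make_r_lib_dir() {
    version="$1"
    root="${2:-${HOME}/R}"
    lib_dir="${root}/x86_64-pc-linux-gnu-library/${version}"
    # -p creates missing parent directories and succeeds if the
    # directory already exists.
    mkdir -p "${lib_dir}" && echo "${lib_dir}"
}

# Example call: make_r_lib_dir 4.3
```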
You can set a custom library path with the R_LIBS_USER environment variable. From https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html:
"R_LIBS_USER - user's library path, e.g. R_LIBS_USER=~/R/%p-library/%v is the folder specification used by default on all platforms and and R version. The folder must exist, otherwise it is ignored by R. The %p (platform) and %v (version) parts are R-specific conversion specifiers."
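Because R ignores an R_LIBS_USER folder that does not exist, it can help to create the folder and set the variable in one step. A minimal sketch follows; the function name use_r_lib and the example path are hypothetical.

```shell
#!/bin/bash
# Create a custom library directory if needed and point R_LIBS_USER at it.
#   $1 - path of the library directory to use
use_r_lib() {
    lib_path="$1"
    mkdir -p "${lib_path}" || return 1
    export R_LIBS_USER="${lib_path}"
}

# Example call (hypothetical path): use_r_lib "$(pwd)/r-library"
```

Call the helper in the job script before starting R, since the variable is read when R starts.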
To see a list of installed libraries in the currently loaded version of R:
$ R
> installed.packages()
Note: Many of the packages in the R library shown below are installed as a part of Bioconductor meta-library. The list is generated from the default R version.