R: Difference between revisions

Latest revision as of 14:41, 20 September 2024

Description

R website

R is a free software environment for statistical computing and graphics.

Note: File a support ticket to request installation of additional libraries.

Environment Modules

Run module spider R to find out what environment modules are available for this application.

System Variables

HPC_R_DIR - installation directory
HPC_R_BIN - executable directory
HPC_R_LIB - library directory
HPC_R_INCLUDE - includes directory
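A job script can check that the variables listed above are present before it calls R. The check fails when the R module has not been loaded. This is a minimal sketch: the function name require_r_env is hypothetical, and it only tests that the variables are non-empty, not that the paths exist.

```bash
# Verify that the HPC_R_* variables from the R module are set.
# Prints each missing variable name to stderr; returns 1 if any is missing.
require_r_env() {
  local missing=0 v
  for v in HPC_R_DIR HPC_R_BIN HPC_R_LIB HPC_R_INCLUDE; do
    # ${!v} is bash indirect expansion: the value of the variable named by $v
    if [ -z "${!v:-}" ]; then
      echo "missing: $v" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use in a job script, after "module load R":
# require_r_env || exit 1
```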

How To Run

R can be run on the command line (or in the batch system) using either 'Rscript myscript.R' or 'R CMD BATCH myscript.R'. Rscript writes its output to standard output (the job's output file in a batch job). R CMD BATCH writes it to myscript.Rout. For script development or visualization, the RStudio GUI application can be used; see the respective documentation for details. Alternatively, you can start an instance of RStudio Server in a job and connect to it from a web browser on your local computer through an SSH tunnel.

Notes and Warnings

The parallel::detectCores() function returns the total number of cores on a compute node, not the number of cores the scheduler assigned to your job. Instead, use something like

numCores <- as.integer(Sys.getenv("SLURM_CPUS_ON_NODE", unset = "1"))  # 1 outside a SLURM job

to get the number of CPU cores 'X' requested in your job script with:

#SBATCH --cpus-per-task=X
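Sys.getenv() returns an empty string for an unset variable, so without a default the R line above yields NA when a script is tested outside a job. You can apply the same fallback in the shell when passing the core count to R. This is a sketch: n_cores is a hypothetical helper name, and the default of 1 core is an assumption.

```bash
# Number of CPU cores SLURM allocated on this node; 1 outside a job
# (for example when testing a script on a login node).
n_cores() {
  echo "${SLURM_CPUS_ON_NODE:-1}"
}

# Hypothetical usage: pass the count to R as an argument and read it in R with
#   as.integer(commandArgs(trailingOnly = TRUE)[1])
# Rscript myscript.R "$(n_cores)"
```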

Default RData format

In R 3.6.0 the default serialization format for saving RData files was changed to version 3 (RDX3), which R versions prior to 3.5.0 cannot read. Keep this in mind if you copy RData files from HiPerGator to an external system with an older R installation. In that case, pass version = 2 to save() or saveRDS() when writing the files.
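To check which format an existing file uses before copying it, you can read its signature. The decompressed file starts with "RDX2" or "RDX3" for the binary formats, or "RDA2"/"RDA3" for ASCII saves. A minimal sketch, assuming the file was written by save() with its default gzip compression or with no compression; rdata_format is a hypothetical helper name.

```bash
# Print the 4-byte format signature of an .RData file written by save():
#   RDX2 / RDX3  binary (XDR) format, version 2 / 3
#   RDA2 / RDA3  ASCII format (save(..., ascii = TRUE))
# With -c and -f, gzip copies non-gzip input through unchanged, so both
# compressed and uncompressed files work. bzip2/xz files are not handled.
rdata_format() {
  gzip -dcf -- "$1" 2>/dev/null | head -c 4
  echo
}

# Usage: rdata_format results.RData   # RDX3 needs R >= 3.5.0 to load
```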

Java

rJava users need to load the Java module manually with 'module load java/1.7.0_79'.

TMPDIR

If your R code produces temporary files, they may fill up the memory disks on HPG2 nodes and cause node and job failures. R creates its session temporary directory under $TMPDIR, so use something like

mkdir -p tmp
export TMPDIR=$(pwd)/tmp

in your job script to prevent this. Submit the job from the directory where tmp should be created, not from your home directory, because a SLURM job starts in the directory it was submitted from.
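A variant of the two lines above gives each job its own subdirectory. This keeps concurrent jobs submitted from the same directory from sharing temporary files. This is a sketch: setup_job_tmpdir and the "interactive" fallback name are hypothetical. It relies on SLURM_SUBMIT_DIR and SLURM_JOB_ID, which SLURM sets inside a job.

```bash
# Create a per-job temporary directory and point TMPDIR at it so that R's
# tempdir() is created there instead of on the node's memory disk.
# Optional argument: base directory (default: submission dir, else $PWD).
setup_job_tmpdir() {
  local base="${1:-${SLURM_SUBMIT_DIR:-$PWD}}"
  local dir="$base/tmp/${SLURM_JOB_ID:-interactive}"
  mkdir -p -- "$dir" || return 1
  export TMPDIR="$dir"
}

# In a job script, before running R:
# setup_job_tmpdir
# Rscript myscript.R
```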

For users of PHI and FERPA data: It is particularly important to set your working and TMPDIR directories to your project's PHI/FERPA-configured directory in /blue when working with R. Writing files to /home or to the default $TMPDIR could expose restricted data to unauthorized users.
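A PHI/FERPA job script can refuse to start R unless both the working directory and TMPDIR resolve to paths under the project's /blue directory. This is a sketch under several assumptions: path_under is a hypothetical helper, /blue/my_project is a placeholder for your configured directory, and realpath -m requires GNU coreutils.

```bash
# Return 0 if path $1 lies under directory $2, else 1. Both are resolved
# with realpath -m first (symlinks and ".." removed; paths need not exist),
# so a symlink inside /blue that points into /home is not accepted.
path_under() {
  local p root
  p=$(realpath -m -- "$1") || return 1
  root=$(realpath -m -- "$2") || return 1
  case "$p/" in
    "$root"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical guard (replace /blue/my_project with your project directory):
# path_under "$PWD" /blue/my_project && path_under "$TMPDIR" /blue/my_project \
#   || { echo "refusing to run outside the PHI directory" >&2; exit 1; }
```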

Tasks vs Cores for parallel runs

Parallel threads in an R job will all be bound to the same CPU core, even if multiple tasks (--ntasks) are requested in the job script. Request cores with --cpus-per-task instead so that the R 'parallel' package can use them. For example, for an 8-thread parallel job, use the following resource request in your job script:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

See the single-threaded and multi-threaded examples on the Sample SLURM Scripts page for more details.
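A job script can also detect the misconfiguration described above at run time. This sketch uses a hypothetical function name and looks only at SLURM_NTASKS, which SLURM sets from --ntasks. Rmpi jobs legitimately use multiple tasks and should skip this check.

```bash
# Warn (and return 1) when more than one task was requested, since R's
# 'parallel' workers would then share a single core.
check_r_parallel_layout() {
  if [ "${SLURM_NTASKS:-1}" -gt 1 ]; then
    echo "warning: --ntasks=${SLURM_NTASKS}; for R 'parallel' use --ntasks=1 --cpus-per-task=N" >&2
    return 1
  fi
}

# In a job script: check_r_parallel_layout || exit 1
```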

Job Script Examples


#!/bin/bash
#SBATCH --job-name=R_test   # Job name
#SBATCH --mail-type=END,FAIL   # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=ENTER_YOUR_EMAIL_HERE   # Where to send mail
#SBATCH --ntasks=1
#SBATCH --mem=1gb   # Memory per node
#SBATCH --time=00:05:00   # Walltime
#SBATCH --output=r_job.%j.out   # Output file name (%j = job ID)
# Record the time and compute node the job ran on
date; hostname; pwd
# Use modules to load the environment for R
module load R

# Run R script
Rscript myRscript.R

date

Performance

We benchmarked two builds. The first was the R version installed at the time (3.0.2), built with the included BLAS/LAPACK libraries. The second was release 3.2.0 (the newest as of April 2015), built with the Intel MKL libraries. Both ran on HiPerGator1 hardware (AMD Abu Dhabi 2.4GHz CPUs) and on the Intel Haswell 2.3GHz CPUs we were testing for possible use in HiPerGator2. The results are presented in the R Benchmark 2.5 table.

Rmpi Example

See the R MPI Example page for an example of using Rmpi.

Installed Libraries

You can install your own R packages. By default they are stored in your home directory under /home. For details, see the section "How do I install R packages?" in our Applications FAQ.

Make sure the library directory for that version of R exists, or R will try to install into a system path and fail. For example, for R/4.3 run the following command before installing a package:

mkdir -p ~/R/x86_64-pc-linux-gnu-library/4.3

You can set a custom library path with the R_LIBS_USER environment variable. From https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html:

"R_LIBS_USER - user's library path, e.g. R_LIBS_USER=~/R/%p-library/%v is the folder specification used by default on all platforms and R versions. The folder must exist, otherwise it is ignored by R. The %p (platform) and %v (version) parts are R-specific conversion specifiers."
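Because R ignores an R_LIBS_USER directory that does not exist, it helps to create the directory and export the variable in one step. This is a minimal sketch with a hypothetical helper name. The shell does not expand R's %p/%v specifiers, so the path must be written out literally here.

```bash
# Create a personal R library directory and point R_LIBS_USER at it.
use_r_user_lib() {
  local dir="$1"
  mkdir -p -- "$dir" || return 1
  export R_LIBS_USER="$dir"
}

# Hypothetical example: a per-version library outside /home
# use_r_user_lib "/blue/my_group/$USER/R/4.3"
# Rscript -e '.libPaths()'   # the new directory should appear in the list
```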

To see a list of installed libraries in the currently loaded version of R:

$ R
> installed.packages()

Note: Many of the packages in the R library shown below are installed as a part of Bioconductor meta-library. The list is generated from the default R version.



@@ Line 1: / Line 1: @@
-__NOTOC__
 __NOEDITSECTION__
-[[Category:Software]][[Category:Statistics]]
+{|align=right
-<!-- ########  Template Configuration ######## -->
+  |__TOC__
-<!--Edit definitions of the variables used in template calls
+  |}
-Required variables:
+[[Category:Software]][[Category:Statistics]][[Category:Programming]]
-app - lowercase name of the application e.g. "amber"
+{|<!--Main settings - REQUIRED-->
-url - url of the software page (project, company product, etc) - e.g. "http://ambermd.org/"
-Optional variables:
-INTEL - Version of the Intel Compiler e.g. "11.1"
-MPI - MPI Implementation and version e.g. "openmpi/1.3.4"
--->
-{|
-<!--Main settings - REQUIRED-->
 |{{#vardefine:app|R}}
 |{{#vardefine:url|http://www.r-project.org/}}
-<!--Compiler and MPI settings - OPTIONAL -->
+|{{#vardefine:exe|1}} <!--Present manual instructions for running the software -->
-|{{#vardefine:intel|}} <!-- E.g. "11.1" -->
-|{{#vardefine:mpi|}} <!-- E.g. "openmpi/1.3.4" -->
-<!--Choose sections to enable - OPTIONAL-->
-|{{#vardefine:mod|1}} <!--Present instructions for running the software with modules -->
-|{{#vardefine:exe|}} <!--Present manual instructions for running the software -->
 |{{#vardefine:conf|}} <!--Enable config wiki page link - {{#vardefine:conf|1}} = ON/conf|}} = OFF-->
-|{{#vardefine:pbs|}} <!--Enable PBS script wiki page link-->
+|{{#vardefine:job|1}} <!--Enable job script wiki page link-->
 |{{#vardefine:policy|}} <!--Enable policy section -->
-|{{#vardefine:testing|}} <!--Enable performance testing/profiling section -->
+|{{#vardefine:testing|1}} <!--Enable performance testing/profiling section -->
-|{{#vardefine:faq|}} <!--Enable FAQ section -->
+|{{#vardefine:faq|1}} <!--Enable FAQ section -->
 |{{#vardefine:citation|}} <!--Enable Reference/Citation section -->
 |}
@@ Line 31: / Line 18: @@
 <!--Description-->
 {{#if: {{#var: url}}|
-{{App_Description|app={{#var:app}}|url={{#var:url}}}}|}}
+{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
 R is a free software environment for statistical computing and graphics.
-<!--Location-->
-{{App_Location|app={{#var:app}}|{{#var:ver}}}}
+'''Note: File a [http://support.rc.ufl.edu support ticket] to request installation of additional libraries.'''
-==Available versions==
+<!--Modules-->
-* 2.13.1
+==Environment Modules==
-<!-- -->
+Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application.
-{{#if: {{#var: mod}}|==Running the application using modules==
+==System Variables==
-{{App_Module|app={{#var:app}}|intel={{#var:intel}}|mpi={{#var:mpi}}}}|}}
+* HPC_{{uc:{{#var:app}}}}_DIR - installation directory
 * HPC_R_BIN - executable directory
 * HPC_R_LIB - library directory
 * HPC_R_INCLUDE - includes directory
-==Installed Packages==
-'''Note: ''' Many of the packages in the R library shown below are installed as a part of Bioconductor meta-library.
-<pre>
-affy                    Methods for Affymetrix Oligonucleotide Arrays
-affydata                Affymetrix Data for Demonstration Purpose
-affyio                  Tools for parsing Affymetrix data files
-affyPLM                 Methods for fitting probe-level models
-affyQCReport            QC Report Generation for affyBatch objects
-akima                   Interpolation of irregularly spaced data
-annaffy                 Annotation tools for Affymetrix biological
-                        metadata
-annotate                Annotation for microarrays
-AnnotationDbi           Annotation Database Interface
-base                    The R Base Package
-baySeq                  Empirical Bayesian analysis of patterns of
-                        differential expression in count data
-Biobase                 Biobase: Base functions for Bioconductor
-Biostrings              String objects representing biological
-                        sequences, and matching algorithms
-bitops                  Functions for Bitwise operations
-boot                    Bootstrap Functions (originally by Angelo Canty
-                        for S)
-class                   Functions for Classification
-cluster                 Cluster Analysis Extended Rousseeuw et al.
-codetools               Code Analysis Tools for R
-colorspace              Color Space Manipulation
-compiler                The R Compiler Package
-datasets                The R Datasets Package
-DBI                     R Database Interface
-DESeq                   Digital gene expresion analysis based on the
-                        negative binomial distribution
-digest                  Create cryptographic hash digests of R objects
-DynDoc                  Dynamic document tools
-edgeR                   Empirical analysis of digital gene expression
-                        data in R
-foreign                 Read Data Stored by Minitab, S, SAS, SPSS,
-                        Stata, Systat, dBase, ...
-gcrma                   Background Adjustment Using Sequence
-                        Information
-genefilter              genefilter: methods for filtering genes from
-                        microarray experiments
-geneplotter             Graphics related functions for Bioconductor
-GenomicRanges           Representation and manipulation of genomic
-                        intervals
-ggplot2                 An implementation of the Grammar of Graphics
-GO.db                   A set of annotation maps describing the entire
-                        Gene Ontology
-graphics                The R Graphics Package
-grDevices               The R Graphics Devices and Support for Colours
-                        and Fonts
-grid                    The Grid Graphics Package
-hgu95av2.db             Affymetrix Human Genome U95 Set annotation data
-                        (chip hgu95av2)
-HilbertVis              Hilbert curve visualization
-Hmisc                   Harrell Miscellaneous
-IRanges                 Infrastructure for manipulating intervals on
-                        sequences
-iterators               Iterator construct for R
-itertools               Iterator Tools
-KEGG.db                 A set of annotation maps for KEGG
-KernSmooth              Functions for kernel smoothing for Wand & Jones
-                        (1995)
-lattice                 Lattice Graphics
-leaps                   regression subset selection
-limma                   Linear Models for Microarray Data
-locfit                  Local Regression, Likelihood and Density
-                        Estimation.
-marray                  Exploratory analysis for two-color spotted
-                        microarray data
-MASS                    Support Functions and Datasets for Venables and
-                        Ripley's MASS
-Matrix                  Sparse and Dense Matrix Classes and Methods
-methods                 Formal Methods and Classes
-mgcv                    GAMs with GCV/AIC/REML smoothness estimation
-                        and GAMMs by PQL
-multtest                Resampling-based multiple hypothesis testing
-nlme                    Linear and Nonlinear Mixed Effects Models
-nnet                    Feed-forward Neural Networks and Multinomial
-                        Log-Linear Models
-org.Hs.eg.db            Genome wide annotation for Human
-plyr                    Tools for splitting, applying and combining
-                        data
-preprocessCore          A collection of pre-processing functions
-prettyR                 Pretty descriptive stats.
-proto                   Prototype object-based programming
-qvalue                  Q-value estimation for false discovery rate
-                        control
-RColorBrewer            ColorBrewer palettes
-reshape                 Flexibly reshape data.
-rpart                   Recursive Partitioning
-RSQLite                 SQLite interface for R
-Rtwalk                  Sampling from many objective functions
-Rwave                   Time-Frequency analysis of 1-D signals
-simpleaffy              Very simple high level analysis of Affymetrix
-                        data
-spatial                 Functions for Kriging and Point Pattern
-                        Analysis
-splines                 Regression Spline Functions and Classes
-statmod                 Statistical Modeling
-stats                   The R Stats Package
-stats4                  Statistical Functions using S4 Classes
-survival                Survival analysis, including penalised
-                        likelihood.
-tcltk                   Tcl/Tk Interface
-tools                   Tools for Package Development
-utils                   The R Utils Package
-vsn                     Variance stabilization and calibration for
-                        microarray data
-waveslim                Basic wavelet routines for one-, two- and
-                        three-dimensional signal processing
-wavethresh              Wavelets statistics and transforms.
-XML                     Tools for parsing and generating XML within R
-                        and S-Plus.
-xtable                  Export tables to LaTeX or HTML
-</pre>
 {{#if: {{#var: exe}}|==How To Run==
-WRITE INSTRUCTIONS ON RUNNING THE ACTUAL BINARY|}}
+R can be run on the command line (or in the batch system) using the '<code>Rscript myscript.R</code>' or '<code>R CMD BATCH myscript.R</code>' command. For script development or visualization, the RStudio GUI application can be used; see the [[GUI_Programs|respective documentation]] for details. Alternatively, an instance of [[RStudio_Server|RStudio Server]] can be started in a job, and you can connect to it through an SSH tunnel from a web browser on your local computer.
+;Notes and Warnings:
+* The parallel::detectCores() function will return the total number of cores on a compute node and not the number of cores assigned to your job by the scheduler. Instead, use something like
+ numCores = as.integer(Sys.getenv("SLURM_CPUS_ON_NODE"))
+to find out the number of CPU cores 'X' requested in your job script by:
+ #SBATCH --cpus-per-task=X
+* Default RData format
+In R-3.6.0 the default serialization format used to save RData files has been changed to version 3 (RDX3), so R versions prior to 3.5.0 will not be able to open it. Keep this in mind if you copy RData files from HiPerGator to an external system with old R installed.
+* Java
+rJava users need to load the java module manually with '<code>module load java/1.7.0_79</code>'
+* TMPDIR
+If temporary files are produced, they may fill up memory disks on HPG2 nodes and cause node and job failures. Use something like
+ mkdir -p tmp
+ export TMPDIR=$(pwd)/tmp
+in your job script to prevent this and launch your job from the respective directory and not from your home directory.
+{{Note|'''For users of PHI and FERPA:''' It is particularly important to set your working and TMPDIR directories to be in your project's PHI/FERPA configured directory in <code>/blue</code> when working with R. Writing files to <code>/home</code> or <code>$TMPDIR</code> could expose restricted data to unauthorized users.|warn}}
+* Tasks vs Cores for parallel runs
+Parallel threads in an R job will be bound to the same CPU core even if multiple tasks are specified in the job script. Use cpus-per-task so that the R 'parallel' package can use the requested cores. For example, for an 8-thread parallel job, use the following resource request in your job script:
+ #SBATCH --nodes=1
+ #SBATCH --ntasks=1
+ #SBATCH --cpus-per-task=8
+See the single-threaded and multi-threaded examples on the [[Sample SLURM Scripts]] page for more details.
+|}}
 {{#if: {{#var: conf}}|==Configuration==
 See the [[{{PAGENAME}}_Configuration]] page for {{#var: app}} configuration details.|}}
-{{#if: {{#var: pbs}}|==PBS Script Examples==
+{{#if: {{#var: job}}|==Job Script Examples==
-See the [[{{PAGENAME}}_PBS]] page for {{#var: app}} PBS script examples.|}}
+<div class="mw-collapsible mw-collapsed" style="width:70%; padding: 5px; border: 1px solid gray;">
-{{#if: {{#var: policy}}|==Usage policy==
+''Expand this section to view example R script.''
+<div class="mw-collapsible-content" style="padding: 5px;">
+<source lang=bash>
+#!/bin/bash
+#SBATCH --job-name=R_test   #Job name
+#SBATCH --mail-type=END,FAIL   # Mail events (NONE, BEGIN, END, FAIL, ALL)
+#SBATCH --mail-user=ENTER_YOUR_EMAIL_HERE   # Where to send mail
+#SBATCH --ntasks=1
+#SBATCH --mem=1gb   # Memory per node
+#SBATCH --time=00:05:00   # Walltime
+#SBATCH --output=r_job.%j.out   # Name output file
+#Record the time and compute node the job ran on
+date; hostname; pwd
+#Use modules to load the environment for R
+module load R
+#Run R script
+Rscript myRscript.R
+date
+</source></div></div>
+|}}
+{{#if: {{#var: policy}}|==Usage Policy==
 WRITE USAGE POLICY HERE (perhaps templates for a couple of main licensing schemes can be used)|}}
 {{#if: {{#var: testing}}|==Performance==
-WRITE PERFORMANCE TESTING RESULTS HERE|}}
+We have benchmarked our most recent installed R version (3.0.2) built with the included blas/lapack libraries versus the newest (as of April 2015) release 3.2.0 built with Intel MKL libraries on the HiPerGator1 hardware (AMD Abu Dhabi 2.4GHz CPUs) and the Intel Haswell 2.3GHz CPUs we're testing for possible usage in HiPerGator2. The results are presented in the [[R Benchmark 2.5]] table |}}
-{{#if: {{#var: faq}}|==FAQ==
-*'''Q:''' **'''A:'''|}}
 {{#if: {{#var: citation}}|==Citation==
 If you publish research that uses {{{app}}} you have to cite it as follows:
 WRITE CITATION HERE
 |}}
+==Rmpi Example==
+See [[R MPI Example]] page for an example of using Rmpi code.
+==Installed Libraries==
+You can install your own libraries to use with R. These are stored in your /home/ environment. For details visit our [[Applications FAQ]] and see the section "How do I install R packages?".
+Make sure the directory for that version of R is created or R will try to install to a system path and fail. E.g. for R/4.3 run the following command before attempting to install a package:
+ mkdir ~/R/x86_64-pc-linux-gnu-library/4.3
+You can set a custom library path with the R_LIBS_USER environment variable.
+From [https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html]:
+"R_LIBS_USER - user's library path, e.g. R_LIBS_USER=~/R/%p-library/%v is the folder specification used by default on all platforms and R versions. The folder must exist, otherwise it is ignored by R. The %p (platform) and %v (version) parts are R-specific conversion specifiers."
+To see a list of installed libraries in the currently loaded version of R:
+<pre>
+$ R
+> installed.packages()
+</pre>
+'''Note: ''' Many of the packages in the R library shown below are installed as a part of Bioconductor meta-library. The list is generated from the default R version.
+<!-- Note to HPC Staff: paste the list generated by the "library()" command between the <pre> </pre> tags in the http://wiki.rc.ufl.edu/index.php/R_libraries wiki page for the inclusion below to work. -->
+<div class="mw-collapsible mw-collapsed" style="width:70%; padding: 5px; border: 1px solid gray;">
+''Expand this section to view installed library list.''
+<div class="mw-collapsible-content" style="padding: 5px;">
+{{:R_libraries}}
+</div>
+</div>
