Difference between revisions of "R"

From UFRC
 
(77 intermediate revisions by 6 users not shown)
__NOTOC__
 
 
__NOEDITSECTION__
 
__NOEDITSECTION__
[[Category:Software]][[Category:Statistics]]
+
{|align=right
<!-- ########  Template Configuration ######## -->
+
  |__TOC__
<!--Edit definitions of the variables used in template calls
+
  |}
Required variables:
+
[[Category:Software]][[Category:Statistics]][[Category:Programming]]
app - lowercase name of the application e.g. "amber"
+
{|<!--Main settings - REQUIRED-->
url - url of the software page (project, company product, etc) - e.g. "http://ambermd.org/"
 
Optional variables:
 
INTEL - Version of the Intel Compiler e.g. "11.1"
 
MPI - MPI Implementation and version e.g. "openmpi/1.3.4"
 
-->
 
{|
 
<!--Main settings - REQUIRED-->
 
 
|{{#vardefine:app|R}}
 
|{{#vardefine:app|R}}
 
|{{#vardefine:url|http://www.r-project.org/}}
 
|{{#vardefine:url|http://www.r-project.org/}}
<!--Compiler and MPI settings - OPTIONAL -->
+
|{{#vardefine:exe|1}} <!--Present manual instructions for running the software -->
|{{#vardefine:intel|}} <!-- E.g. "11.1" -->
 
|{{#vardefine:mpi|}} <!-- E.g. "openmpi/1.3.4" -->
 
<!--Choose sections to enable - OPTIONAL-->
 
|{{#vardefine:mod|1}} <!--Present instructions for running the software with modules -->
 
|{{#vardefine:exe|}} <!--Present manual instructions for running the software -->
 
 
|{{#vardefine:conf|}} <!--Enable config wiki page link - {{#vardefine:conf|1}} = ON/conf|}} = OFF-->
 
|{{#vardefine:conf|}} <!--Enable config wiki page link - {{#vardefine:conf|1}} = ON/conf|}} = OFF-->
|{{#vardefine:pbs|}} <!--Enable PBS script wiki page link-->
+
|{{#vardefine:job|1}} <!--Enable job script wiki page link-->
 
|{{#vardefine:policy|}} <!--Enable policy section -->
 
|{{#vardefine:policy|}} <!--Enable policy section -->
|{{#vardefine:testing|}} <!--Enable performance testing/profiling section -->
+
|{{#vardefine:testing|1}} <!--Enable performance testing/profiling section -->
 
|{{#vardefine:faq|1}} <!--Enable FAQ section -->
 
|{{#vardefine:faq|1}} <!--Enable FAQ section -->
 
|{{#vardefine:citation|}} <!--Enable Reference/Citation section -->
 
|{{#vardefine:citation|}} <!--Enable Reference/Citation section -->
 
<!--Description-->
 
<!--Description-->
 
{{#if: {{#var: url}}|
 
{{#if: {{#var: url}}|
{{App_Description|app={{#var:app}}|url={{#var:url}}}}|}}
+
{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
 +
 
 
R is a free software environment for statistical computing and graphics.
 
R is a free software environment for statistical computing and graphics.
<!--Location-->
+
 
{{App_Location|app={{#var:app}}|{{#var:ver}}}}
+
'''Note: File a [http://support.rc.ufl.edu support ticket] to request installation of additional libraries.'''
==Available versions==
+
<!--Modules-->
'''Note: File a [http://support.hpc.ufl.edu support ticket] to request installation of additional libraries.'''
+
==Environment Modules==
* 2.14.1-mpi - R base package MPI-enabled via the Rmpi library.
+
Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application.
* 2.14.2
+
==System Variables==
* 2.15.0 (default)
+
* HPC_{{uc:{{#var:app}}}}_DIR - installation directory
<!-- -->
 
{{#if: {{#var: mod}}|==Running the application using modules==
 
{{App_Module|app={{#var:app}}|intel={{#var:intel}}|mpi={{#var:mpi}}}}|}}
 
 
* HPC_R_BIN - executable directory
 
* HPC_R_BIN - executable directory
 
* HPC_R_LIB - library directory
 
* HPC_R_LIB - library directory
 
* HPC_R_INCLUDE - includes directory
 
* HPC_R_INCLUDE - includes directory
 +
{{#if: {{#var: exe}}|==How To Run==
 +
R can be run on the command line (or in the batch system) with either '<code>Rscript myscript.R</code>' or '<code>R CMD BATCH myscript.R</code>'. For script development or visualization, the RStudio GUI application can be used; see the [[GUI_Programs|respective documentation]] for details. Alternatively, an instance of [[RStudio_Server|RStudio Server]] can be started in a job and accessed through an SSH tunnel from a web browser on your local computer.
 +
;Notes and Warnings:
 +
 +
* The parallel::detectCores() function returns the total number of cores on a compute node, not the number of cores the scheduler assigned to your job. Instead, use something like
 +
numCores = as.integer(Sys.getenv("SLURM_CPUS_ON_NODE"))
 +
to find the number of CPU cores 'X' requested in your job script with:
 +
#SBATCH --cpus-per-task=X
 +
 +
* Default RData format
 +
Since R 3.6.0, the default serialization format used to save RData files is version 3 (RDX3), which R versions prior to 3.5.0 cannot read. Keep this in mind if you copy RData files from HiPerGator to an external system with an older R installation.
 +
 +
* Java
 +
rJava users need to load the java module manually with '<code>module load java/1.7.0_79</code>'
 +
 +
* TMPDIR
 +
If temporary files are produced, they may fill up the in-memory disks on HPG2 nodes and cause node and job failures. Use something like
 +
mkdir -p tmp
 +
export TMPDIR=$(pwd)/tmp
 +
in your job script to prevent this, and submit the job from that working directory rather than from your home directory.
 +
 +
{{Note|'''For users of PHI and FERPA:''' It is particularly important to set your working and TMPDIR directories to be in your project's PHI/FERPA configured directory in <code>/blue</code> when working with R. Writing files to <code>/home</code> or <code>$TMPDIR</code> could expose restricted data to unauthorized users.|warn}}
 +
 +
* Tasks vs Cores for parallel runs
 +
Parallel threads in an R job are bound to the same CPU core even if multiple tasks (ntasks) are requested in the job script. Request cores with cpus-per-task to use R's 'parallel' package correctly. For example, for an 8-thread parallel job, use the following resource request in your job script:
 +
#SBATCH --nodes=1
 +
#SBATCH --ntasks=1
 +
#SBATCH --cpus-per-task=8
  
To use the version of R built for parallel execution with MPI via the Rmpi library load the following modules:
+
See the single-threaded and multi-threaded examples on the [[Sample SLURM Scripts]] page for more details.
module load intel/11.1 openmpi/1.4.3 R
+
|}}
==Installed Packages==
 
'''Note: ''' Many of the packages in the R library shown below are installed as part of the Bioconductor meta-library. The list is generated from the default R version.
 
<pre>
 
affy                    Methods for Affymetrix Oligonucleotide Arrays
 
affydata                Affymetrix Data for Demonstration Purpose
 
affyio                  Tools for parsing Affymetrix data files
 
affyPLM                Methods for fitting probe-level models
 
affyQCReport            QC Report Generation for affyBatch objects
 
akima                  Interpolation of irregularly spaced data
 
annaffy                Annotation tools for Affymetrix biological
 
                        metadata
 
annotate                Annotation for microarrays
 
AnnotationDbi          Annotation Database Interface
 
ape                    Analyses of Phylogenetics and Evolution
 
base                    The R Base Package
 
baySeq                  Empirical Bayesian analysis of patterns of
 
                        differential expression in count data
 
Biobase                Biobase: Base functions for Bioconductor
 
BiocGenerics            Generic functions for Bioconductor
 
BiocInstaller          Install/Update Bioconductor and CRAN Packages
 
Biostrings              String objects representing biological
 
                        sequences, and matching algorithms
 
bitops                  Functions for Bitwise operations
 
boot                    Bootstrap Functions (originally by Angelo Canty
 
                        for S)
 
caTools                Tools: moving window statistics, GIF, Base64,
 
                        ROC AUC, etc.
 
class                  Functions for Classification
 
cluster                Cluster Analysis Extended Rousseeuw et al.
 
CNVtools                A package to test genetic association with CNV
 
                        data
 
codetools              Code Analysis Tools for R
 
colorspace              Color Space Manipulation
 
compiler                The R Compiler Package
 
datasets                The R Datasets Package
 
DBI                    R Database Interface
 
DESeq                  Differential gene expression analysis based on
 
                        the negative binomial distribution
 
dichromat              Color schemes for dichromats
 
digest                  Create cryptographic hash digests of R objects
 
doMC                    Foreach parallel adaptor for the multicore
 
                        package
 
doSNOW                  Foreach parallel adaptor for the snow package
 
DynDoc                  Dynamic document tools
 
edgeR                  Empirical analysis of digital gene expression
 
                        data in R
 
foreach                Foreach looping construct for R
 
foreign                Read Data Stored by Minitab, S, SAS, SPSS,
 
                        Stata, Systat, dBase, ...
 
gcrma                  Background Adjustment Using Sequence
 
                        Information
 
gdata                  Various R programming tools for data
 
                        manipulation
 
gee                    Generalized Estimation Equation solver
 
geiger                  Analysis of evolutionary diversification
 
genefilter              genefilter: methods for filtering genes from
 
                        microarray experiments
 
geneplotter            Graphics related functions for Bioconductor
 
GenomicRanges          Representation and manipulation of genomic
 
                        intervals
 
ggplot2                An implementation of the Grammar of Graphics
 
glmmADMB                Generalized Linear Mixed Models using AD Model
 
                        Builder
 
GO.db                  A set of annotation maps describing the entire
 
                        Gene Ontology
 
gplots                  Various R programming tools for plotting data
 
graphics                The R Graphics Package
 
grDevices              The R Graphics Devices and Support for Colours
 
                        and Fonts
 
grid                    The Grid Graphics Package
 
gtools                  Various R programming tools
 
hacks                  Convenient R Functions
 
hgu95av2.db            Affymetrix Human Genome U95 Set annotation data
 
                        (chip hgu95av2)
 
HilbertVis              Hilbert curve visualization
 
Hmisc                  Harrell Miscellaneous
 
IRanges                Infrastructure for manipulating intervals on
 
                        sequences
 
iterators              Iterator construct for R
 
itertools              Iterator Tools
 
KEGG.db                A set of annotation maps for KEGG
 
KernSmooth              Functions for kernel smoothing for Wand & Jones
 
                        (1995)
 
labeling                Axis Labeling
 
laser                  Likelihood Analysis of Speciation/Extinction
 
                        Rates from Phylogenies
 
lattice                Lattice Graphics
 
leaps                  regression subset selection
 
limma                  Linear Models for Microarray Data
 
locfit                  Local Regression, Likelihood and Density
 
                        Estimation.
 
maanova                Tools for analyzing Micro Array experiments
 
marray                  Exploratory analysis for two-color spotted
 
                        microarray data
 
MASS                    Support Functions and Datasets for Venables and
 
                        Ripley's MASS
 
Matrix                  Sparse and Dense Matrix Classes and Methods
 
memoise                Memoise functions
 
methods                Formal Methods and Classes
 
mgcv                    Mixed GAM Computation Vehicle with GCV/AIC/REML
 
                        smoothness estimation
 
msm                    Multi-state Markov and hidden Markov models in
 
                        continuous time
 
multicore              Parallel processing of R code on machines with
 
                        multiple cores or CPUs
 
multtest                Resampling-based multiple hypothesis testing
 
munsell                Munsell colour system
 
mvtnorm                Multivariate Normal and t Distributions
 
nlme                    Linear and Nonlinear Mixed Effects Models
 
nnet                    Feed-forward Neural Networks and Multinomial
 
                        Log-Linear Models
 
org.Hs.eg.db            Genome wide annotation for Human
 
ouch                    Ornstein-Uhlenbeck models for phylogenetic
 
                        comparative hypotheses
 
parallel                Support for Parallel computation in R
 
permute                Functions for generating restricted
 
                        permutations of data
 
pheatmap                Pretty Heatmaps
 
phylobase              Base package for phylogenetic structures and
 
                        comparative data
 
picante                R tools for integrating phylogenies and ecology
 
plyr                    Tools for splitting, applying and combining
 
                        data
 
preprocessCore          A collection of pre-processing functions
 
prettyR                Pretty descriptive stats.
 
proto                  Prototype object-based programming
 
qvalue                  Q-value estimation for false discovery rate
 
                        control
 
R2admb                  ADMB to R interface functions
 
RColorBrewer            ColorBrewer palettes
 
Rcpp                    Seamless R and C++ Integration
 
reshape                Flexibly reshape data.
 
reshape2                Flexibly reshape data: a reboot of the reshape
 
                        package.
 
rpart                  Recursive Partitioning
 
RSQLite                SQLite interface for R
 
Rwave                  Time-Frequency analysis of 1-D signals
 
scales                  Scale functions for graphics.
 
simpleaffy              Very simple high level analysis of Affymetrix
 
                        data
 
snow                    Simple Network of Workstations
 
spatial                Functions for Kriging and Point Pattern
 
                        Analysis
 
splines                Regression Spline Functions and Classes
 
statmod                Statistical Modeling
 
stats                  The R Stats Package
 
stats4                  Statistical Functions using S4 Classes
 
stringr                Make it easier to work with strings.
 
subplex                Subplex optimization algorithm
 
survival                Survival analysis, including penalised
 
                        likelihood.
 
tcltk                  Tcl/Tk Interface
 
tools                  Tools for Package Development
 
utils                  The R Utils Package
 
vegan                  Community Ecology Package
 
vsn                    Variance stabilization and calibration for
 
                        microarray data
 
waveslim                Basic wavelet routines for one-, two- and
 
                        three-dimensional signal processing
 
wavethresh              Wavelets statistics and transforms.
 
XML                    Tools for parsing and generating XML within R
 
                        and S-Plus.
 
xtable                  Export tables to LaTeX or HTML
 
zlibbioc                An R packaged zlib-1.2.5
 
</pre>
 
{{#if: {{#var: exe}}|==How To Run==
 
WRITE INSTRUCTIONS ON RUNNING THE ACTUAL BINARY|}}
 
 
{{#if: {{#var: conf}}|==Configuration==
 
{{#if: {{#var: conf}}|==Configuration==
 
See the [[{{PAGENAME}}_Configuration]] page for {{#var: app}} configuration details.|}}
 
See the [[{{PAGENAME}}_Configuration]] page for {{#var: app}} configuration details.|}}
{{#if: {{#var: pbs}}|==PBS Script Examples==
+
{{#if: {{#var: job}}|==Job Script Examples==
See the [[{{PAGENAME}}_PBS]] page for {{#var: app}} PBS script examples.|}}
+
<div class="mw-collapsible mw-collapsed" style="width:70%; padding: 5px; border: 1px solid gray;">
{{#if: {{#var: policy}}|==Usage policy==
+
''Expand this section to view example R script.''
 +
<div class="mw-collapsible-content" style="padding: 5px;">
 +
<source lang=bash>
 +
#!/bin/bash
 +
#SBATCH --job-name=R_test  #Job name
 +
#SBATCH --mail-type=END,FAIL  # Mail events (NONE, BEGIN, END, FAIL, ALL)
 +
#SBATCH --mail-user=ENTER_YOUR_EMAIL_HERE  # Where to send mail
 +
#SBATCH --ntasks=1
 +
#SBATCH --mem=1gb  # Memory per node
 +
#SBATCH --time=00:05:00  # Walltime
 +
#SBATCH --output=r_job.%j.out  # Name output file
 +
#Record the time and compute node the job ran on
 +
date; hostname; pwd
 +
#Use modules to load the environment for R
 +
module load R
 +
 
 +
#Run R script  
 +
Rscript myRscript.R
 +
 
 +
date
 +
</source></div></div>
 +
|}}
 +
{{#if: {{#var: policy}}|==Usage Policy==
 
WRITE USAGE POLICY HERE (perhaps templates for a couple of main licensing schemes can be used)|}}
 
WRITE USAGE POLICY HERE (perhaps templates for a couple of main licensing schemes can be used)|}}
 
{{#if: {{#var: testing}}|==Performance==
 
{{#if: {{#var: testing}}|==Performance==
WRITE PERFORMANCE TESTING RESULTS HERE|}}
+
We benchmarked the most recent R version installed at the time (3.0.2), built with the BLAS/LAPACK libraries included with R, against release 3.2.0 (the newest as of April 2015) built with the Intel MKL libraries. Tests ran on the HiPerGator1 hardware (AMD Abu Dhabi 2.4 GHz CPUs) and on the Intel Haswell 2.3 GHz CPUs then being tested for possible use in HiPerGator2. The results are presented in the [[R Benchmark 2.5]] table. |}}
{{#if: {{#var: faq}}|==FAQ==
 
*'''Q:''' **'''A:'''|}}
 
 
{{#if: {{#var: citation}}|==Citation==
 
{{#if: {{#var: citation}}|==Citation==
 
If you publish research that uses {{{app}}} you have to cite it as follows:
 
If you publish research that uses {{{app}}} you have to cite it as follows:
 
WRITE CITATION HERE
 
WRITE CITATION HERE
 
|}}
 
|}}
 +
==Rmpi Example==
 +
See [[R MPI Example]] page for an example of using Rmpi code.
  
Example of using the parallel module to run MPI jobs under R 2.14.1+
+
==Installed Libraries==
 +
You can install your own libraries to use with R. They are stored in your home directory (/home). For details, see the section "How do I install R packages?" in our [[Applications FAQ]].
  
{{#fileAnchor: rmpi_test.R}}
+
Make sure the library directory for your version of R exists; otherwise R will try to install into a system path and fail. For example, for R/4.3 run the following command before attempting to install a package:
Download raw source of the [{{#fileLink: rmpi_test.R}} rmpi_test.R] file.
+
mkdir -p ~/R/x86_64-pc-linux-gnu-library/4.3
<source lang=bash>
+
 
# Load the R MPI package if it is not already loaded.
+
You can set a custom library path with the R_LIBS_USER environment variable.
if (!is.loaded("mpi_initialize")) {
+
From [https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html]:
    library("Rmpi")
 
    }
 
                                                                               
 
# Spawn as many slaves as possible
 
mpi.spawn.Rslaves()
 
                                                                               
 
# In case R exits unexpectedly, have it automatically clean up
 
# resources taken up by Rmpi (slaves, memory, etc...)
 
.Last <- function(){
 
    if (is.loaded("mpi_initialize")){
 
        if (mpi.comm.size(1) > 0){
 
            print("Please use mpi.close.Rslaves() to close slaves.")
 
            mpi.close.Rslaves()
 
        }
 
        print("Please use mpi.quit() to quit R")
 
        .Call("mpi_finalize")
 
    }
 
}
 
  
# Tell all slaves to return a message identifying themselves
+
"R_LIBS_USER - user's library path, e.g. R_LIBS_USER=~/R/%p-library/%v is the folder specification used by default on all platforms and and R version. The folder must exist, otherwise it is ignored by R. The %p (platform) and %v (version) parts are R-specific conversion specifiers."
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
 
  
# Tell all slaves to close down, and exit the program
+
To see a list of installed libraries in the currently loaded version of R:
mpi.close.Rslaves()
+
<pre>
mpi.quit()
+
$ R
</source>
+
> installed.packages()
 +
</pre>
 +
'''Note: ''' Many of the packages in the R library listed below are installed as part of the Bioconductor meta-library. The list is generated from the default R version.
 +
<!-- Note to HPC Staff: paste the list generated by the "library()" command between the <pre> </pre> tags in the http://wiki.rc.ufl.edu/index.php/R_libraries wiki page for the inclusion below to work. -->
 +
<div class="mw-collapsible mw-collapsed" style="width:70%; padding: 5px; border: 1px solid gray;">
 +
''Expand this section to view installed library list.''
 +
<div class="mw-collapsible-content" style="padding: 5px;">
 +
{{:R_libraries}}
 +
</div>
 +
</div>

Latest revision as of 14:41, 20 September 2024

Description

R website  

R is a free software environment for statistical computing and graphics.

Note: File a support ticket to request installation of additional libraries.

Environment Modules

Run module spider R to find out what environment modules are available for this application.

System Variables

  • HPC_R_DIR - installation directory
  • HPC_R_BIN - executable directory
  • HPC_R_LIB - library directory
  • HPC_R_INCLUDE - includes directory

How To Run

R can be run on the command line (or in the batch system) with either 'Rscript myscript.R' or 'R CMD BATCH myscript.R'. For script development or visualization, the RStudio GUI application can be used; see the respective documentation for details. Alternatively, an instance of RStudio Server can be started in a job and accessed through an SSH tunnel from a web browser on your local computer.
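The two invocation styles differ mainly in where their output goes. Rscript writes to stdout/stderr, which a batch job captures in its Slurm output file. R CMD BATCH writes to a .Rout file, by default the script's base name with a trailing ".R" removed and ".Rout" appended, in the current directory. The sketch below assumes a bash job script. `batch_rout_name` is a hypothetical helper, not part of R or Slurm; it only mirrors that default naming so you know which file to look for.

```shell
#!/bin/bash
# Rscript prints to stdout/stderr, so in a batch job its output lands in the
# file named by "#SBATCH --output". R CMD BATCH writes to a .Rout file instead.
#
# Hypothetical helper: the default .Rout name R CMD BATCH uses when no outfile
# is given (base name of the script, trailing ".R" removed, ".Rout" appended).
batch_rout_name() {
    echo "$(basename "$1" .R).Rout"
}

batch_rout_name myscript.R               # prints myscript.Rout
batch_rout_name analysis/fit_model.R     # prints fit_model.Rout
```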

Notes and Warnings
  • The parallel::detectCores() function returns the total number of cores on a compute node, not the number of cores the scheduler assigned to your job. Instead, use something like
numCores = as.integer(Sys.getenv("SLURM_CPUS_ON_NODE"))

to find the number of CPU cores 'X' requested in your job script with:

#SBATCH --cpus-per-task=X
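If you would rather read the allocation in the job script and hand it to R, you could pass the same value on the command line. This is a minimal sketch. The helper name `job_cores` and the fallback to 1 core when `SLURM_CPUS_ON_NODE` is unset (for example, when testing outside a job) are assumptions made for illustration. On the R side, the value would be read with `commandArgs(trailingOnly = TRUE)`.

```shell
#!/bin/bash
# Hypothetical helper: CPU cores Slurm allocated to this job on the node.
# Falls back to 1 when SLURM_CPUS_ON_NODE is unset (e.g. outside a job).
job_cores() {
    echo "${SLURM_CPUS_ON_NODE:-1}"
}

echo "Cores allocated to this job: $(job_cores)"

# In the job script the value could then be passed to R, e.g.
#   Rscript myRscript.R "$(job_cores)"
# and read inside the R script with
#   numCores <- as.integer(commandArgs(trailingOnly = TRUE)[1])
```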
  • Default RData format

Since R 3.6.0, the default serialization format used to save RData files is version 3 (RDX3), which R versions prior to 3.5.0 cannot read. Keep this in mind if you copy RData files from HiPerGator to an external system with an older R installation.
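Before copying a file to an older system, you can check which format it uses by reading its header. Inside R, `save(..., version = 2)` or `saveRDS(..., version = 2)` writes the older format instead. The sketch below is minimal and assumes the file was written by `save()` with its default gzip compression or without compression. bzip2- or xz-compressed files are not handled, and `rdata_format` is a hypothetical helper name.

```shell
#!/bin/bash
# Print the 4-byte serialization header of an .RData file:
#   RDX3 -> format version 3 (readable only by R >= 3.5.0)
#   RDX2 -> format version 2 (readable by older R releases)
# save() gzip-compresses by default; "gzip -dcf" decompresses gzip input and
# copies uncompressed input through unchanged. bzip2/xz files are not handled.
rdata_format() {
    gzip -dcf "$1" | head -c 4
}

# Usage: rdata_format results.RData
```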

  • Java

rJava users need to load the java module manually with 'module load java/1.7.0_79'

  • TMPDIR

If temporary files are produced, they may fill up the in-memory disks on HPG2 nodes and cause node and job failures. Use something like

mkdir -p tmp
export TMPDIR=$(pwd)/tmp

in your job script to prevent this, and submit the job from that working directory rather than from your home directory.
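A slightly extended version of the TMPDIR setup is sketched below. It creates a separate temporary subdirectory for each job, keyed on `SLURM_JOB_ID`, with "interactive" as a fallback name when the variable is unset. That per-job subdirectory is an assumption added here so that several jobs launched from the same directory do not share temporary space. R creates its per-session temporary directory (`tempdir()`) under `$TMPDIR`.

```shell
#!/bin/bash
# Job-script fragment: keep temporary files off the node's memory-backed disk
# by pointing TMPDIR at a directory under the submission directory.
# One subdirectory per job (assumption) so concurrent jobs do not collide.
JOB_TMP="$(pwd)/tmp/${SLURM_JOB_ID:-interactive}"
mkdir -p "$JOB_TMP"
export TMPDIR="$JOB_TMP"
echo "Using TMPDIR=$TMPDIR"

# R places its per-session temporary directory (tempdir()) under $TMPDIR,
# which can be confirmed with:  Rscript -e 'tempdir()'
```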

For users of PHI and FERPA: It is particularly important to set your working and TMPDIR directories to be in your project's PHI/FERPA configured directory in /blue when working with R. Writing files to /home or $TMPDIR could expose restricted data to unauthorized users.
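To make this requirement harder to miss, a job script could refuse to start when the working directory or TMPDIR lies outside /blue. This is a minimal sketch, and both function names are hypothetical. It only checks for the /blue prefix. It does not verify that the path is your project's PHI/FERPA-configured directory, which is project-specific, and it does not resolve symlinks or ".." components.

```shell
#!/bin/bash
# Return success if the given path starts with /blue/.
under_blue() {
    case "$1" in
        /blue/*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Refuse to continue unless both the working directory and TMPDIR
# (default /tmp if unset) are under /blue.
check_restricted_paths() {
    local dir
    for dir in "$PWD" "${TMPDIR:-/tmp}"; do
        if ! under_blue "$dir"; then
            echo "Refusing to run: $dir is not under /blue" >&2
            return 1
        fi
    done
}
```

In a job script, you would call `check_restricted_paths || exit 1` after setting TMPDIR and before starting R.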
  • Tasks vs Cores for parallel runs

Parallel threads in an R job are bound to the same CPU core even if multiple tasks (ntasks) are requested in the job script. Request cores with cpus-per-task to use R's 'parallel' package correctly. For example, for an 8-thread parallel job, use the following resource request in your job script:

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

See the single-threaded and multi-threaded examples on the Sample SLURM Scripts page for more details.
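The tasks-versus-cores pitfall can be caught early in a job script. This is a minimal sketch with a hypothetical function name. It assumes Slurm exports `SLURM_NTASKS` and, when `--cpus-per-task` is given, `SLURM_CPUS_PER_TASK`, and it treats an unset variable as 1.

```shell
#!/bin/bash
# Warn when the allocation looks like "many tasks, one CPU each", which would
# leave R's parallel workers sharing a single core.
check_r_parallel_alloc() {
    local ntasks="${SLURM_NTASKS:-1}"
    local cpus="${SLURM_CPUS_PER_TASK:-1}"
    if [ "$ntasks" -gt 1 ] && [ "$cpus" -le 1 ]; then
        echo "Warning: ${ntasks} tasks with ${cpus} CPU each; R's parallel workers may share one core." >&2
        echo "Request --ntasks=1 and --cpus-per-task=N instead." >&2
        return 1
    fi
    echo "R can use ${cpus} core(s) via --cpus-per-task."
}
```

In a job script, you could call `check_r_parallel_alloc || exit 1` before `module load R` and `Rscript myRscript.R`.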

Job Script Examples


#!/bin/bash
#SBATCH --job-name=R_test   # Job name
#SBATCH --mail-type=END,FAIL   # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=ENTER_YOUR_EMAIL_HERE   # Where to send mail
#SBATCH --ntasks=1
#SBATCH --mem=1gb   # Memory per node
#SBATCH --time=00:05:00   # Walltime
#SBATCH --output=r_job.%j.out   # Name output file
# Record the time and compute node the job ran on
date; hostname; pwd
# Use modules to load the environment for R
module load R

# Run R script
Rscript myRscript.R

date

Performance

We benchmarked the most recent R version installed at the time (3.0.2), built with the BLAS/LAPACK libraries included with R, against release 3.2.0 (the newest as of April 2015) built with the Intel MKL libraries. Tests ran on the HiPerGator1 hardware (AMD Abu Dhabi 2.4 GHz CPUs) and on the Intel Haswell 2.3 GHz CPUs then being tested for possible use in HiPerGator2. The results are presented in the R Benchmark 2.5 table.

Rmpi Example

See R MPI Example page for an example of using Rmpi code.

Installed Libraries

You can install your own libraries to use with R. They are stored in your home directory (/home). For details, see the section "How do I install R packages?" in our Applications FAQ.

Make sure the library directory for your version of R exists; otherwise R will try to install into a system path and fail. For example, for R/4.3 run the following command before attempting to install a package:

mkdir -p ~/R/x86_64-pc-linux-gnu-library/4.3
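A minimal sketch for scripting this step follows. It builds the per-version path that follows the `~/R/<platform>-library/<version>` layout, creates it with `mkdir -p` so that missing parent directories are created too, and exports it as `R_LIBS_USER`, the environment variable R reads for the user library path. The helper name `r_user_lib`, the hard-coded version `4.3`, and the default platform string are assumptions; adjust them to match the R module you load.

```shell
#!/bin/bash
# Hypothetical helper: per-version user library path,
# $HOME/R/<platform>-library/<major.minor version>.
r_user_lib() {
    local version="$1"                            # e.g. "4.3"
    local platform="${2:-x86_64-pc-linux-gnu}"    # assumed platform string
    echo "$HOME/R/${platform}-library/${version}"
}

# Create the directory before the first install.packages() call;
# -p also creates ~/R and the platform directory if they are missing.
lib_dir="$(r_user_lib 4.3)"
mkdir -p "$lib_dir"
export R_LIBS_USER="$lib_dir"
echo "R_LIBS_USER=$R_LIBS_USER"
```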

You can set a custom library path with the R_LIBS_USER environment variable. From https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html:

"R_LIBS_USER - user's library path, e.g. R_LIBS_USER=~/R/%p-library/%v is the folder specification used by default on all platforms and and R version. The folder must exist, otherwise it is ignored by R. The %p (platform) and %v (version) parts are R-specific conversion specifiers."

To see a list of installed libraries in the currently loaded version of R:

$ R
> installed.packages()

Note: Many of the packages in the R library are installed as part of the Bioconductor meta-library. The installed-library list is generated from the default R version.
