Difference between revisions of "R"

From UFRC
 
To use the version of R built for parallel execution with MPI via the Rmpi library load the following modules:
 
 
  module load intel/11.1 openmpi/1.4.3 R
 
==Installed Libraries==
 
'''Note: ''' Many of the packages in the R library shown below are installed as part of the Bioconductor meta-library. The list is generated from the default R version.
 
<pre>
affy                    Methods for Affymetrix Oligonucleotide Arrays
affydata                Affymetrix Data for Demonstration Purpose
 
affyio                  Tools for parsing Affymetrix data files
 
affyPLM                Methods for fitting probe-level models
 
affyQCReport            QC Report Generation for affyBatch objects
 
akima                  Interpolation of irregularly spaced data
 
annaffy                Annotation tools for Affymetrix biological
 
                        metadata
 
annotate                Annotation for microarrays
 
AnnotationDbi          Annotation Database Interface
 
ape                    Analyses of Phylogenetics and Evolution
 
base                    The R Base Package
 
baySeq                  Empirical Bayesian analysis of patterns of
 
                        differential expression in count data
 
Biobase                Biobase: Base functions for Bioconductor
 
BiocGenerics            Generic functions for Bioconductor
 
BiocInstaller          Install/Update Bioconductor and CRAN Packages
 
Biostrings              String objects representing biological
 
                        sequences, and matching algorithms
 
bitops                  Functions for Bitwise operations
 
boot                    Bootstrap Functions (originally by Angelo Canty
 
                        for S)
 
caTools                Tools: moving window statistics, GIF, Base64,
 
                        ROC AUC, etc.
 
class                  Functions for Classification
 
cluster                Cluster Analysis Extended Rousseeuw et al.
 
CNVtools                A package to test genetic association with CNV
 
                        data
 
codetools              Code Analysis Tools for R
 
colorspace              Color Space Manipulation
 
compiler                The R Compiler Package
 
datasets                The R Datasets Package
 
DBI                    R Database Interface
 
DESeq                  Differential gene expression analysis based on
 
                        the negative binomial distribution
 
dichromat              Color schemes for dichromats
 
digest                  Create cryptographic hash digests of R objects
 
doMC                    Foreach parallel adaptor for the multicore
 
                        package
 
doSNOW                  Foreach parallel adaptor for the snow package
 
DynDoc                  Dynamic document tools
 
edgeR                  Empirical analysis of digital gene expression
 
                        data in R
 
foreach                Foreach looping construct for R
 
foreign                Read Data Stored by Minitab, S, SAS, SPSS,
 
                        Stata, Systat, dBase, ...
 
gcrma                  Background Adjustment Using Sequence
 
                        Information
 
gdata                  Various R programming tools for data
 
                        manipulation
 
gee                    Generalized Estimation Equation solver
 
geiger                  Analysis of evolutionary diversification
 
genefilter              genefilter: methods for filtering genes from
 
                        microarray experiments
 
geneplotter            Graphics related functions for Bioconductor
 
GenomicRanges          Representation and manipulation of genomic
 
                        intervals
 
ggplot2                An implementation of the Grammar of Graphics
 
glmmADMB                Generalized Linear Mixed Models using AD Model
 
                        Builder
 
GO.db                  A set of annotation maps describing the entire
 
                        Gene Ontology
 
gplots                  Various R programming tools for plotting data
 
graphics                The R Graphics Package
 
grDevices              The R Graphics Devices and Support for Colours
 
                        and Fonts
 
grid                    The Grid Graphics Package
 
gtools                  Various R programming tools
 
hacks                  Convenient R Functions
 
hgu95av2.db            Affymetrix Human Genome U95 Set annotation data
 
                        (chip hgu95av2)
 
HilbertVis              Hilbert curve visualization
 
Hmisc                  Harrell Miscellaneous
 
IRanges                Infrastructure for manipulating intervals on
 
                        sequences
 
iterators              Iterator construct for R
 
itertools              Iterator Tools
 
KEGG.db                A set of annotation maps for KEGG
 
KernSmooth              Functions for kernel smoothing for Wand & Jones
 
                        (1995)
 
labeling                Axis Labeling
 
laser                  Likelihood Analysis of Speciation/Extinction
 
                        Rates from Phylogenies
 
lattice                Lattice Graphics
 
leaps                  regression subset selection
 
limma                  Linear Models for Microarray Data
 
locfit                  Local Regression, Likelihood and Density
 
                        Estimation.
 
maanova                Tools for analyzing Micro Array experiments
 
marray                  Exploratory analysis for two-color spotted
 
                        microarray data
 
MASS                    Support Functions and Datasets for Venables and
 
                        Ripley's MASS
 
Matrix                  Sparse and Dense Matrix Classes and Methods
 
memoise                Memoise functions
 
methods                Formal Methods and Classes
 
mgcv                    Mixed GAM Computation Vehicle with GCV/AIC/REML
 
                        smoothness estimation
 
msm                    Multi-state Markov and hidden Markov models in
 
                        continuous time
 
multicore              Parallel processing of R code on machines with
 
                        multiple cores or CPUs
 
multtest                Resampling-based multiple hypothesis testing
 
munsell                Munsell colour system
 
mvtnorm                Multivariate Normal and t Distributions
 
nlme                    Linear and Nonlinear Mixed Effects Models
 
nnet                    Feed-forward Neural Networks and Multinomial
 
                        Log-Linear Models
 
org.Hs.eg.db            Genome wide annotation for Human
 
ouch                    Ornstein-Uhlenbeck models for phylogenetic
 
                        comparative hypotheses
 
parallel                Support for Parallel computation in R
 
permute                Functions for generating restricted
 
                        permutations of data
 
pheatmap                Pretty Heatmaps
 
phylobase              Base package for phylogenetic structures and
 
                        comparative data
 
picante                R tools for integrating phylogenies and ecology
 
plyr                    Tools for splitting, applying and combining
 
                        data
 
preprocessCore          A collection of pre-processing functions
 
prettyR                Pretty descriptive stats.
 
proto                  Prototype object-based programming
 
qvalue                  Q-value estimation for false discovery rate
 
                        control
 
R2admb                  ADMB to R interface functions
 
RColorBrewer            ColorBrewer palettes
 
Rcpp                    Seamless R and C++ Integration
 
reshape                Flexibly reshape data.
 
reshape2                Flexibly reshape data: a reboot of the reshape
 
                        package.
 
rpart                  Recursive Partitioning
 
RSQLite                SQLite interface for R
 
Rwave                  Time-Frequency analysis of 1-D signals
 
scales                  Scale functions for graphics.
 
simpleaffy              Very simple high level analysis of Affymetrix
 
                        data
 
snow                    Simple Network of Workstations
 
spatial                Functions for Kriging and Point Pattern
 
                        Analysis
 
splines                Regression Spline Functions and Classes
 
statmod                Statistical Modeling
 
stats                  The R Stats Package
 
stats4                  Statistical Functions using S4 Classes
 
stringr                Make it easier to work with strings.
 
subplex                Subplex optimization algorithm
 
survival                Survival analysis, including penalised
 
                        likelihood.
 
tcltk                  Tcl/Tk Interface
 
tools                  Tools for Package Development
 
utils                  The R Utils Package
 
vegan                  Community Ecology Package
 
vsn                    Variance stabilization and calibration for
 
                        microarray data
 
waveslim                Basic wavelet routines for one-, two- and
 
                        three-dimensional signal processing
 
wavethresh              Wavelets statistics and transforms.
 
XML                    Tools for parsing and generating XML within R
 
                        and S-Plus.
 
xtable                  Export tables to LaTeX or HTML
 
zlibbioc                An R packaged zlib-1.2.5
 
</pre>
 
 

Revision as of 18:49, 13 June 2012

Description

R is a free software environment for statistical computing and graphics.

Available versions

Note: File a support ticket to request installation of additional libraries.

  • 2.14.1-mpi - R base package MPI-enabled via the Rmpi library.
  • 2.14.2
  • 2.15.0 (default)

Running the application using modules

To use R with the environment modules system at HPC, the following commands are available:

Get module information for R:

$ module spider R

Load the default application module:

$ module load R

The modulefile for this software adds the directory with executable files to the shell execution PATH and sets the following environment variables:

  • HPC_R_DIR - directory where R is located.
  • HPC_R_BIN - executable directory
  • HPC_R_LIB - library directory
  • HPC_R_INCLUDE - includes directory
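To confirm that the modulefile has set the variables listed above, a small shell helper can print them. This is a minimal sketch: the function name `show_r_env` is hypothetical and not part of the HPC environment, and the values printed depend on which R module is loaded.

```shell
# Print each variable the R modulefile is documented to set,
# or "(unset)" when the module has not been loaded in this shell.
# show_r_env is an illustrative helper name, not a site-provided command.
show_r_env() {
    for name in HPC_R_DIR HPC_R_BIN HPC_R_LIB HPC_R_INCLUDE; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        printf '%s=%s\n' "$name" "${value:-(unset)}"
    done
}
```

After `module load R`, calling `show_r_env` should list a path for each of the four variables; before loading the module, each line reads `(unset)`.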

To use the version of R built for parallel execution with MPI via the Rmpi library load the following modules:

module load intel/11.1 openmpi/1.4.3 R

Installed Libraries

Note: Many of the packages in the R library shown below are installed as part of the Bioconductor meta-library. The list is generated from the default R version.





FAQ

  • Q: When I submit the job with N=1 and M=1, it runs and R allocates the 10 slaves that I want. Is this OK?
    • A: In short, no. This is bad because you are lying to the scheduler about the resources you intend to use. We have scripts that will kill your job if they catch it, and we tend to suspend the accounts of users who make a practice of it. :)
  • Q: The actual job I want to run is much larger. Anywhere from 31 to 93 processors are desired. Is it OK to request this many processors?
    • A: That depends on the level of investment from your PI. If you ask for more processors than your group's core allocation, which depends on the investment level, you will essentially be borrowing cores from other groups and may wait an extended period of time in the queue before your job runs. Groups are allowed to run on up to 10x their core allocation provided the resources are available. If you ask for more than 10x your group's core allocation, the job will be blocked indefinitely.
  • Q: Do I need the number of nodes requested to be correct or can I just have R go grab slaves after the job is submitted with N=1 and M=1?
    • A: Your resource request must be consistent with what you actually intend to use as noted above.
  • Q: Is it better to request a large number of nodes for a shorter period of time or fewer nodes for a longer period of time (concretely, say 8 nodes for 40 hours versus 16 nodes for 20 hours) in terms of getting through the queue?
    • A: Do not confuse "nodes" with "cores/processors". Each "node" is a physical machine with between 4 and 48 cores. Your MPI threads will run on "cores" which may all be in the same "node" or spread among multiple nodes. You should ask for the number of cores you need and spread them among as few nodes as possible unless you have a good reason to do otherwise. Thus you should generally ask for things like
           #PBS -l nodes=1:ppn=8    (we have lots of 8p nodes)
           #PBS -l nodes=1:ppn=12  (we have a number of 12p also)
Multiples of the above work as well so you might ask for nodes=3:ppn=8 if you want to run 24 threads on 24 different cores.
It looks like in the R model there is a master/slave paradigm so you really need one master thread to manage the "slave" threads. It is likely that the master thread accumulates little CPU time so you could neglect it. In other words tell the scheduler that you want nodes=3:ppn=8 and tell R to spawn 24 children.
This is a white lie which will do little harm. However, if it turns out that the master accumulates significant CPU time and your job gets killed by our rogue process killer, you can ask for the resources as follows:
#PBS -l nodes=1:ppn=1:infiniband+3:ppn=8:infiniband
This will allocate 1 thread on a separate node (the master thread) and then the slave threads will be allocated on 3 additional nodes with at least 8 cores each.
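
The arithmetic behind requests such as nodes=3:ppn=8 for 24 cores can be sketched in shell. The helper name `pbs_resource_line` is hypothetical; it simply divides the number of cores by the cores per node and rounds up, so a count that is not a multiple of the node size ends up requesting a few more cores than will be used.

```shell
# Print a "#PBS -l" line that covers CORES cores using nodes with PPN cores each.
# The node count is rounded up so that a partially used node is still requested.
# pbs_resource_line is an illustrative helper name, not a scheduler command.
pbs_resource_line() {
    cores=$1
    ppn=$2
    nodes=$(( (cores + ppn - 1) / ppn ))
    printf '#PBS -l nodes=%d:ppn=%d\n' "$nodes" "$ppn"
}

pbs_resource_line 24 8    # prints "#PBS -l nodes=3:ppn=8"
```

Choosing a core count that is a multiple of the node size (8 or 12 in the examples above) keeps the request consistent with what the job actually uses.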

Rmpi Example

Example of using the parallel module to run MPI jobs under R 2.14.1+

Raw source of the rmpi_test.R file:

# Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) {
    library("Rmpi")
}

# Spawn as many slaves as possible
mpi.spawn.Rslaves()

# In case R exits unexpectedly, have it automatically clean up
# resources taken up by Rmpi (slaves, memory, etc...)
.Last <- function(){
    if (is.loaded("mpi_initialize")){
        if (mpi.comm.size(1) > 0){
            print("Please use mpi.close.Rslaves() to close slaves.")
            mpi.close.Rslaves()
        }
        print("Please use mpi.quit() to quit R")
        .Call("mpi_finalize")
    }
}

# Tell all slaves to return a message identifying themselves
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))

# Tell all slaves to close down, and exit the program
mpi.close.Rslaves()
mpi.quit()
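
One possible way to submit rmpi_test.R as a batch job is sketched below. Only the module line and the nodes/ppn syntax come from this page; the job file name rmpi_test.pbs, the job name, the walltime value and the `mpiexec -n 1 Rscript` launch line are assumptions chosen for illustration and may need adjusting for the local scheduler and MPI setup.

```shell
# Write an example PBS job file for rmpi_test.R.
# Assumed, not taken from this page: file name, job name, walltime, launch line.
cat > rmpi_test.pbs <<'EOF'
#!/bin/bash
#PBS -N rmpi_test
#PBS -l nodes=3:ppn=8
#PBS -l walltime=00:10:00

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Load the MPI-enabled R environment described above.
module load intel/11.1 openmpi/1.4.3 R

# Start a single master R process; the script itself spawns the slaves.
mpiexec -n 1 Rscript rmpi_test.R
EOF

# Submit with: qsub rmpi_test.pbs
```

The resource request should match the number of slaves the script actually spawns, as discussed in the FAQ.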