R: Difference between revisions

From UFRC
To use the version of R built for parallel execution with MPI via the Rmpi library, load the following modules:
  module load intel/11.1 openmpi/1.4.3 R
==Installed Libraries==
'''Note: ''' Many of the packages in the R library shown below are installed as part of the Bioconductor meta-library. The list is generated from the default R version.
<pre>
affy                    Methods for Affymetrix Oligonucleotide Arrays
affydata                Affymetrix Data for Demonstration Purpose
affyio                  Tools for parsing Affymetrix data files
affyPLM                Methods for fitting probe-level models
affyQCReport            QC Report Generation for affyBatch objects
akima                  Interpolation of irregularly spaced data
annaffy                Annotation tools for Affymetrix biological
                        metadata
annotate                Annotation for microarrays
AnnotationDbi          Annotation Database Interface
ape                    Analyses of Phylogenetics and Evolution
base                    The R Base Package
baySeq                  Empirical Bayesian analysis of patterns of
                        differential expression in count data
Biobase                Biobase: Base functions for Bioconductor
BiocGenerics            Generic functions for Bioconductor
BiocInstaller          Install/Update Bioconductor and CRAN Packages
Biostrings              String objects representing biological
                        sequences, and matching algorithms
bitops                  Functions for Bitwise operations
boot                    Bootstrap Functions (originally by Angelo Canty
                        for S)
caTools                Tools: moving window statistics, GIF, Base64,
                        ROC AUC, etc.
class                  Functions for Classification
cluster                Cluster Analysis Extended Rousseeuw et al.
CNVtools                A package to test genetic association with CNV
                        data
codetools              Code Analysis Tools for R
colorspace              Color Space Manipulation
compiler                The R Compiler Package
datasets                The R Datasets Package
DBI                    R Database Interface
DESeq                  Differential gene expression analysis based on
                        the negative binomial distribution
dichromat              Color schemes for dichromats
digest                  Create cryptographic hash digests of R objects
doMC                    Foreach parallel adaptor for the multicore
                        package
doSNOW                  Foreach parallel adaptor for the snow package
DynDoc                  Dynamic document tools
edgeR                  Empirical analysis of digital gene expression
                        data in R
foreach                Foreach looping construct for R
foreign                Read Data Stored by Minitab, S, SAS, SPSS,
                        Stata, Systat, dBase, ...
gcrma                  Background Adjustment Using Sequence
                        Information
gdata                  Various R programming tools for data
                        manipulation
gee                    Generalized Estimation Equation solver
geiger                  Analysis of evolutionary diversification
genefilter              genefilter: methods for filtering genes from
                        microarray experiments
geneplotter            Graphics related functions for Bioconductor
GenomicRanges          Representation and manipulation of genomic
                        intervals
ggplot2                An implementation of the Grammar of Graphics
glmmADMB                Generalized Linear Mixed Models using AD Model
                        Builder
GO.db                  A set of annotation maps describing the entire
                        Gene Ontology
gplots                  Various R programming tools for plotting data
graphics                The R Graphics Package
grDevices              The R Graphics Devices and Support for Colours
                        and Fonts
grid                    The Grid Graphics Package
gtools                  Various R programming tools
hacks                  Convenient R Functions
hgu95av2.db            Affymetrix Human Genome U95 Set annotation data
                        (chip hgu95av2)
HilbertVis              Hilbert curve visualization
Hmisc                  Harrell Miscellaneous
IRanges                Infrastructure for manipulating intervals on
                        sequences
iterators              Iterator construct for R
itertools              Iterator Tools
KEGG.db                A set of annotation maps for KEGG
KernSmooth              Functions for kernel smoothing for Wand & Jones
                        (1995)
labeling                Axis Labeling
laser                  Likelihood Analysis of Speciation/Extinction
                        Rates from Phylogenies
lattice                Lattice Graphics
leaps                  regression subset selection
limma                  Linear Models for Microarray Data
locfit                  Local Regression, Likelihood and Density
                        Estimation.
maanova                Tools for analyzing Micro Array experiments
marray                  Exploratory analysis for two-color spotted
                        microarray data
MASS                    Support Functions and Datasets for Venables and
                        Ripley's MASS
Matrix                  Sparse and Dense Matrix Classes and Methods
memoise                Memoise functions
methods                Formal Methods and Classes
mgcv                    Mixed GAM Computation Vehicle with GCV/AIC/REML
                        smoothness estimation
msm                    Multi-state Markov and hidden Markov models in
                        continuous time
multicore              Parallel processing of R code on machines with
                        multiple cores or CPUs
multtest                Resampling-based multiple hypothesis testing
munsell                Munsell colour system
mvtnorm                Multivariate Normal and t Distributions
nlme                    Linear and Nonlinear Mixed Effects Models
nnet                    Feed-forward Neural Networks and Multinomial
                        Log-Linear Models
org.Hs.eg.db            Genome wide annotation for Human
ouch                    Ornstein-Uhlenbeck models for phylogenetic
                        comparative hypotheses
parallel                Support for Parallel computation in R
permute                Functions for generating restricted
                        permutations of data
pheatmap                Pretty Heatmaps
phylobase              Base package for phylogenetic structures and
                        comparative data
picante                R tools for integrating phylogenies and ecology
plyr                    Tools for splitting, applying and combining
                        data
preprocessCore          A collection of pre-processing functions
prettyR                Pretty descriptive stats.
proto                  Prototype object-based programming
qvalue                  Q-value estimation for false discovery rate
                        control
R2admb                  ADMB to R interface functions
RColorBrewer            ColorBrewer palettes
Rcpp                    Seamless R and C++ Integration
reshape                Flexibly reshape data.
reshape2                Flexibly reshape data: a reboot of the reshape
                        package.
rpart                  Recursive Partitioning
RSQLite                SQLite interface for R
Rwave                  Time-Frequency analysis of 1-D signals
scales                  Scale functions for graphics.
simpleaffy              Very simple high level analysis of Affymetrix
                        data
snow                    Simple Network of Workstations
spatial                Functions for Kriging and Point Pattern
                        Analysis
splines                Regression Spline Functions and Classes
statmod                Statistical Modeling
stats                  The R Stats Package
stats4                  Statistical Functions using S4 Classes
stringr                Make it easier to work with strings.
subplex                Subplex optimization algorithm
survival                Survival analysis, including penalised
                        likelihood.
tcltk                  Tcl/Tk Interface
tools                  Tools for Package Development
utils                  The R Utils Package
vegan                  Community Ecology Package
vsn                    Variance stabilization and calibration for
                        microarray data
waveslim                Basic wavelet routines for one-, two- and
                        three-dimensional signal processing
wavethresh              Wavelets statistics and transforms.
XML                    Tools for parsing and generating XML within R
                        and S-Plus.
xtable                  Export tables to LaTeX or HTML
zlibbioc                An R packaged zlib-1.2.5
</pre>

Revision as of 18:49, 13 June 2012

Description

R website
R is a free software environment for statistical computing and graphics.

Available versions

Note: File a support ticket to request installation of additional libraries.

  • 2.14.1-mpi - R base package MPI-enabled via the Rmpi library.
  • 2.14.2
  • 2.15.0 (default)

Running the application using modules

To use R with the environment modules system at HPC, the following commands are available:

Get module information for R:

$ module spider R

Load the default application module:

$ module load R

The modulefile for this software adds the directory with executable files to the shell execution PATH and sets the following environment variables:

  • HPC_R_DIR - directory where R is located.
  • HPC_R_BIN - executable directory
  • HPC_R_LIB - library directory
  • HPC_R_INCLUDE - includes directory
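
One way to confirm what the modulefile set is to print the four variables listed above after loading the module. The following is a minimal sketch; the helper name show_r_env is hypothetical and not part of the module system, and it only reads the variables named in this page.

```shell
# Print each variable the R modulefile is documented to set,
# or "(unset)" when the module has not been loaded.
# show_r_env is a hypothetical helper name used for illustration.
show_r_env() {
    for name in HPC_R_DIR HPC_R_BIN HPC_R_LIB HPC_R_INCLUDE; do
        # Look up the variable whose name is stored in $name
        eval "value=\${$name:-}"
        [ -n "$value" ] || value="(unset)"
        printf '%s=%s\n' "$name" "$value"
    done
}

# Typical use: run this after "module load R"
show_r_env
```

If every line reports "(unset)", the R module is most likely not loaded in the current shell.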

To use the version of R built for parallel execution with MPI via the Rmpi library, load the following modules:

module load intel/11.1 openmpi/1.4.3 R

Installed Libraries

Note: Many of the packages in the R library are installed as part of the Bioconductor meta-library. The list is generated from the default R version.

FAQ

  • Q: When I submit the job with N=1 and M=1 it runs and R allocates the 10 slaves that I want. Is this OK?
    • A: In short, no. This is bad because you are lying to the scheduler about the resources you intend to use. We have scripts that will kill your job if they catch it, and we tend to suspend the accounts of users who make a practice of it. :)
  • Q: The actual job I want to run is much larger. Anywhere from 31 to 93 processors are desired. Is it OK to request this many processors?
    • A: That depends on the level of investment from your PI. If you ask for more processors than your group's core allocation, which depends on the investment level, you will essentially be borrowing cores from other groups and may wait an extended period of time in the queue before your job runs. Groups are allowed to run on up to 10x their core allocation provided the resources are available. If you ask for more than 10x your group's core allocation, the job will be blocked indefinitely.
  • Q: Do I need the number of nodes requested to be correct or can I just have R go grab slaves after the job is submitted with N=1 and M=1?
    • A: Your resource request must be consistent with what you actually intend to use as noted above.
  • Q: Is it better to request a large number of nodes for a shorter period of time or fewer nodes for a longer period of time (concretely, say 8 nodes for 40 hours versus 16 nodes for 20 hours) in terms of getting through the queue?
    • A: Do not confuse "nodes" with "cores/processors". Each "node" is a physical machine with between 4 and 48 cores. Your MPI threads will run on "cores" which may all be in the same "node" or spread among multiple nodes. You should ask for the number of cores you need and spread them among as few nodes as possible unless you have a good reason to do otherwise. Thus you should generally ask for things like
           #PBS -l nodes=1:ppn=8    (we have lots of 8p nodes)
           #PBS -l nodes=1:ppn=12  (we have a number of 12p also)
Multiples of the above work as well, so you might ask for nodes=3:ppn=8 if you want to run 24 threads on 24 different cores.
It looks like the R model uses a master/slave paradigm, so you really need one master thread to manage the "slave" threads. It is likely that the master thread accumulates little CPU time, so you could neglect it. In other words, tell the scheduler that you want nodes=3:ppn=8 and tell R to spawn 24 children.
This is a white lie which will do little harm. However, if it turns out that the master accumulates significant CPU time and your job gets killed by our rogue process killer, you can ask for the resources as follows:
#PBS -l nodes=1:ppn=1:infiniband+3:ppn=8:infiniband
This will allocate 1 thread on a separate node (the master thread) and then the slave threads will be allocated on 3 additional nodes with at least 8 cores each.
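
The arithmetic in the answers above can be checked before submitting a job. The following is a rough sketch, not a site tool: the function names total_cores and classify_request are hypothetical, and the allocation of 16 cores in the usage lines is an invented example value. The first function adds up nodes times ppn for each "+"-separated part of a nodes= request; the second applies the allocation and 10x rules as stated in the FAQ.

```shell
# Sum the cores requested by a PBS "nodes=" specification.
# total_cores is a hypothetical helper; a part without ppn= counts 1 core per node.
total_cores() {
    spec=${1#nodes=}
    total=0
    old_ifs=$IFS
    IFS='+'
    for chunk in $spec; do
        count=${chunk%%:*}              # node count is the first field
        case $chunk in
            *ppn=*) ppn=${chunk#*ppn=}; ppn=${ppn%%:*} ;;
            *)      ppn=1 ;;
        esac
        total=$((total + count * ppn))
    done
    IFS=$old_ifs
    echo "$total"
}

# Classify a request against a group's core allocation using the FAQ rules.
# classify_request is a hypothetical helper; arguments: requested cores, allocation.
classify_request() {
    requested=$1
    allocation=$2
    if [ "$requested" -le "$allocation" ]; then
        echo "within allocation"
    elif [ "$requested" -le $((allocation * 10)) ]; then
        echo "borrowing cores: may wait in the queue"
    else
        echo "blocked: more than 10x allocation"
    fi
}

# Example values only: 16 is an assumed group allocation
total_cores "nodes=3:ppn=8"
total_cores "nodes=1:ppn=1:infiniband+3:ppn=8:infiniband"
classify_request 93 16
```

With the separate master node, the second request totals one core more than the first, which leaves 24 cores for the slaves.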

Rmpi Example

Example of using the parallel module to run MPI jobs under R 2.14.1+

Source of the rmpi_test.R file:

# Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) {
    library("Rmpi")
}
                                                                                
# Spawn as many slaves as possible
mpi.spawn.Rslaves()
                                                                                
# In case R exits unexpectedly, have it automatically clean up
# resources taken up by Rmpi (slaves, memory, etc...)
.Last <- function(){
    if (is.loaded("mpi_initialize")){
        if (mpi.comm.size(1) > 0){
            print("Please use mpi.close.Rslaves() to close slaves.")
            mpi.close.Rslaves()
        }
        print("Please use mpi.quit() to quit R")
        .Call("mpi_finalize")
    }
}

# Tell all slaves to return a message identifying themselves
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))

# Tell all slaves to close down, and exit the program
mpi.close.Rslaves()
mpi.quit()
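
To run the example through the scheduler, the script needs a job file whose resource request matches the number of slaves R will spawn. The following is only a sketch under assumptions: the job name, the walltime, and the mpiexec launch line are illustrative guesses rather than documented site settings, and print_rmpi_job is a hypothetical helper that prints the job file so it can be reviewed first. The resource request and the module line are taken from this page.

```shell
# Print a sketch of a Torque/PBS job script for rmpi_test.R.
# Job name, walltime, and the launch line are assumptions, not site defaults.
print_rmpi_job() {
    cat <<'EOF'
#!/bin/bash
#PBS -N rmpi_test
#PBS -l nodes=3:ppn=8
#PBS -l walltime=00:10:00

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Modules named on this page for the MPI-enabled R build
module load intel/11.1 openmpi/1.4.3 R

# Start one master R process; the script spawns the slaves itself
mpiexec -np 1 Rscript rmpi_test.R
EOF
}

# Show the sketch; redirect the output to a file to keep it
print_rmpi_job
```

Redirecting the output to a file such as rmpi_test.pbs and submitting it with qsub would be the usual next step, after adjusting the request to what the job will actually use.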