Difference between revisions of "RAxML"

Latest revision as of 21:31, 21 August 2022

Description

RAxML (Randomized Axelerated Maximum Likelihood), written by Alexandros Stamatakis and others, is a program for sequential and parallel maximum likelihood (ML) inference of large phylogenetic trees. It was originally derived from fastDNAml, which in turn was derived from Joe Felsenstein’s dnaml, part of the PHYLIP package.

Environment Modules

Run module spider raxml to find out what environment modules are available for this application.

System Variables

HPC_RAXML_DIR - installation directory
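The two steps above (finding and loading a module, then using HPC_RAXML_DIR) can be sketched as follows. This is a minimal sketch that assumes the module is simply named "raxml"; use the exact names and versions that module spider reports on your system.

```shell
#!/usr/bin/env bash
# Sketch: list the available RAxML modules, load one, and confirm that
# HPC_RAXML_DIR is set. The module name "raxml" is an assumption; use the
# names and versions that "module spider raxml" reports on your system.
if command -v module >/dev/null 2>&1; then
  module spider raxml
  module load raxml
fi

if [ -n "${HPC_RAXML_DIR:-}" ]; then
  echo "RAxML is installed in: $HPC_RAXML_DIR"
else
  echo "HPC_RAXML_DIR is not set; is the raxml module loaded?" >&2
fi
```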

How To Run

Please see the discussion below on performance characteristics of the different implementations of RAxML. In general, there are four different RAxML executables installed on the Research Computing systems.
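After loading the module, a quick way to see which of the four executables (the SSE3 builds named in the sections that follow) are on your PATH is a loop like the one below. It is only a convenience sketch; the array name RAXML_EXES is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: report which of the four RAxML executables (SSE3 builds) are on
# PATH. Replace SSE3 with AVX in the names to check the AVX builds instead.
RAXML_EXES=(raxmlHPC-SSE3 raxmlHPC-PTHREADS-SSE3 raxmlHPC-MPI-SSE3 raxmlHPC-HYBRID-SSE3)

for exe in "${RAXML_EXES[@]}"; do
  if path=$(command -v "$exe"); then
    printf '%-26s %s\n' "$exe" "$path"
  else
    printf '%-26s %s\n' "$exe" "(not on PATH)"
  fi
done
```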

SIMD Extensions

SSE3 and AVX are extensions to the x86 instruction set that accelerate single-instruction, multiple-data (SIMD) computations - often referred to as vectorization. For each of the four RAxML executables, we have both an SSE3 and an AVX version compiled. Although HiPerGator compute servers support AVX instructions, our testing indicates that the SSE3 executable is still faster - probably due to memory bandwidth constraints. Users are encouraged to experiment with their particular dataset but otherwise should default to the SSE3 executable. To run the AVX version of RAxML, replace "SSE3" in the executable name with "AVX".
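Switching between the SSE3 and AVX builds is just a substitution in the executable name, so when benchmarking your own dataset it can help to parameterize it. The helper simd_exe below is hypothetical, not part of RAxML; the optional CPU check reads /proc/cpuinfo and therefore only works on Linux.

```shell
#!/usr/bin/env bash
# Sketch: derive the AVX executable name from the SSE3 one by replacing
# "SSE3" with "AVX". simd_exe is a hypothetical helper, not part of RAxML.
simd_exe() {
  local exe="$1" simd="${2:-SSE3}"   # default to SSE3, as recommended above
  printf '%s\n' "${exe/SSE3/$simd}"
}

# Optional: check whether the current CPU advertises AVX (Linux only).
if grep -qw avx /proc/cpuinfo 2>/dev/null; then
  echo "CPU reports AVX support"
fi
```

For example, $(simd_exe raxmlHPC-PTHREADS-SSE3 AVX) expands to raxmlHPC-PTHREADS-AVX, so the same job script can time both builds on the same input.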

Single-Threaded (Serial)

The serial version of RAxML is called raxmlHPC-SSE3, and is a single-threaded application.
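A serial job only needs a single core. The script below is a sketch, not a site-provided template: the module name "raxml", the alignment your_data.phy, the run name output_name, and the seed are placeholders, and the command is only printed when raxmlHPC-SSE3 is not on PATH.

```shell
#!/bin/bash
#PBS -N raxml_serial
#PBS -l nodes=1:ppn=1
#PBS -l walltime=24:00:00
# Sketch of a serial RAxML job. Module name, alignment file, run name and
# seed are placeholders. -m: substitution model, -s: alignment file,
# -n: run name, -p: random seed for the parsimony starting tree.
cd "${PBS_O_WORKDIR:-.}"
if command -v module >/dev/null 2>&1; then
  module load raxml
fi

ARGS=(-m GTRCAT -s your_data.phy -n output_name -p 3112)
if command -v raxmlHPC-SSE3 >/dev/null 2>&1; then
  raxmlHPC-SSE3 "${ARGS[@]}"
else
  echo "raxmlHPC-SSE3 not found; would run: raxmlHPC-SSE3 ${ARGS[*]}" >&2
fi
```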

Multi-Threaded (Parallel)

The multi-threaded version of RAxML is called raxmlHPC-PTHREADS-SSE3 and can use multiple processors on a single compute server (node). This version implements the fine-grained parallelism discussed below. Resource requests for this version should always be in the form nodes=1:ppn=x, where x is the number of processors to use; see the information below when choosing x. In our testing, values over 8 do not significantly speed up analyses and should be avoided. Always pass the -T flag, which tells RAxML how many threads to use. You can either give it the same number used in the resource request or use the PBS environment variable $PBS_NUM_PPN, which the scheduler sets for you, e.g. -T $PBS_NUM_PPN.
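Putting the advice above together, a multi-threaded job can take its -T value from $PBS_NUM_PPN. This is a sketch under stated assumptions: the fallback of 4 threads when running outside PBS, the module name, and the input and run names are placeholders. The check for at least 2 threads reflects that the PTHREADS build refuses to run with a single thread; use raxmlHPC-SSE3 for one core.

```shell
#!/bin/bash
#PBS -N raxml_pthreads
#PBS -l nodes=1:ppn=4
#PBS -l walltime=24:00:00
# Sketch of a multi-threaded RAxML job. -T comes from PBS_NUM_PPN (set by
# the scheduler); the fallback of 4 outside PBS is an assumption matching
# ppn=4 above. Module, alignment and run names are placeholders.
cd "${PBS_O_WORKDIR:-.}"
if command -v module >/dev/null 2>&1; then
  module load raxml
fi
THREADS="${PBS_NUM_PPN:-4}"

# The PTHREADS build needs at least 2 threads; use raxmlHPC-SSE3 for 1 core.
if [ "$THREADS" -lt 2 ]; then
  echo "Use raxmlHPC-SSE3 for single-core jobs" >&2
  exit 1
fi
# In the testing described above, more than 8 threads rarely helps.
if [ "$THREADS" -gt 8 ]; then
  echo "Warning: $THREADS threads requested; more than 8 rarely helps" >&2
fi

ARGS=(-m GTRCAT -s your_data.phy -n output_name -p 3112 -T "$THREADS")
if command -v raxmlHPC-PTHREADS-SSE3 >/dev/null 2>&1; then
  raxmlHPC-PTHREADS-SSE3 "${ARGS[@]}"
else
  echo "would run: raxmlHPC-PTHREADS-SSE3 ${ARGS[*]}" >&2
fi
```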

MPI (Parallel)

The distributed-memory version of RAxML uses MPI. The executable is called raxmlHPC-MPI-SSE3 and can use multiple processors that may or may not be on the same compute server (node). This version implements the coarse-grained parallelism discussed below. Resource requests for this version should generally be in the form nodes=1:ppn=x, where x is the number of processors to use, as long as x is 32 or fewer. If you want to use more than 32 processors, you should generally request more nodes.
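The rule of thumb above (a single node while the process count stays within 32, more nodes beyond that) can be turned into a small helper. The function mpi_request is hypothetical, and MAX_PPN=32 is an assumption about cores per node; spreading processes evenly over the fewest nodes is one reasonable choice, not the only one.

```shell
#!/usr/bin/env bash
# Sketch: choose a PBS resource request for N MPI processes.
# MAX_PPN=32 is an assumed cores-per-node limit; adjust for your cluster.
MAX_PPN=32

mpi_request() {
  local nprocs=$1
  if [ "$nprocs" -le "$MAX_PPN" ]; then
    echo "nodes=1:ppn=$nprocs"
  else
    # Fewest nodes that can hold nprocs, with processes spread evenly.
    local nodes=$(( (nprocs + MAX_PPN - 1) / MAX_PPN ))
    local ppn=$(( (nprocs + nodes - 1) / nodes ))
    echo "nodes=$nodes:ppn=$ppn"
  fi
}

mpi_request 16   # nodes=1:ppn=16
mpi_request 64   # nodes=2:ppn=32
```

Because #PBS lines are not shell-expanded, the result would be passed on the command line instead, e.g. qsub -l "$(mpi_request 64)" job.pbs, with the job script itself running mpiexec -np 64 raxmlHPC-MPI-SSE3 followed by your RAxML options.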

Hybrid (Parallel)

The hybrid (MPI and multi-threaded) executable of RAxML is called raxmlHPC-HYBRID-SSE3 and uses multiple processors on multiple compute servers. It implements both the coarse-grained and the fine-grained parallelism discussed below. Resource requests for this version should be in the form nodes=x:ppn=y. As with the MPI executable, if the total number of processors desired is 32 or fewer, request nodes=1:ppn=y and choose the mpiexec -np value (the number of coarse-grained processes) and the -T value (the number of fine-grained threads) so that their product equals y.

For example, to run 5 coarse-grained processes, each using 4 fine-grained threads, we suggest the following resource request and command line:

 #PBS -l nodes=1:ppn=20
 ...
 mpiexec -bynode -np 5 raxmlHPC-HYBRID-SSE3 -T 4 ...

If you require more than 32 cores in total, it is best to use multiple nodes. In this case, the number of nodes should match the number of coarse-grained processes, and the processors per node should match the number of fine-grained threads. For example:

 #PBS -l nodes=10:ppn=4
 ...
 mpiexec -bynode -np 10 raxmlHPC-HYBRID-SSE3 -T 4 ...
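The two hybrid cases above follow one pattern, sketched below as a hypothetical helper (hybrid_layout is not part of RAxML). Given the number of coarse-grained processes and fine-grained threads, it prints the resource request and the mpiexec line; the trailing "..." stands for your remaining RAxML options, as in the examples, and the 32-core threshold is taken from the text.

```shell
#!/usr/bin/env bash
# Sketch: build the resource request and mpiexec line for the hybrid
# executable from NP coarse-grained processes and NT threads per process.
hybrid_layout() {
  local np=$1 nt=$2
  local total=$(( np * nt ))
  if [ "$total" -le 32 ]; then
    # Everything fits on one node: ppn is the product NP x NT.
    echo "#PBS -l nodes=1:ppn=$total"
  else
    # One MPI process per node, each with NT cores.
    echo "#PBS -l nodes=$np:ppn=$nt"
  fi
  echo "mpiexec -bynode -np $np raxmlHPC-HYBRID-SSE3 -T $nt ..."
}

hybrid_layout 5 4
hybrid_layout 10 4
```

The two calls reproduce the two examples above: nodes=1:ppn=20 for 5 x 4, and nodes=10:ppn=4 for 10 x 4.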

Performance

We highly recommend that users read the paper by Pfeiffer and Stamatakis (2010) before running parallel versions of RAxML. This paper provides a good overview of the different types of parallelism implemented in RAxML and how to best leverage them for analyses. The discussion below is largely based on this paper.

Parallelism in RAxML

RAxML implements two different types of parallelism, referred to as coarse-grained and fine-grained. Coarse-grained parallelism can be split across multiple compute servers: each coarse-grained process works on one tree optimization, which may be a bootstrap replicate or an ML search. Fine-grained parallelism lets multiple processors on the SAME server split up a single tree optimization; a single optimization cannot be split across servers.

If you run the -f a option (bootstrap search and ML search in one analysis) with the MPI or Hybrid executables, the bootstrap replicates are split among the MPI processes, and once those are complete, each MPI process performs an independent ML search. This differs slightly from the other executables, because multiple ML searches are performed. While this is likely a good thing for finding the ML tree and for the thoroughness of the analysis, be aware that this stage will not run faster with more MPI processes, because each MPI task performs its own independent search rather than working together with the others on a single search.
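To see why adding MPI processes shortens only the bootstrap stage of an -f a run, consider an idealized model. It assumes replicates are divided evenly among processes and that every process then runs one ML search of its own; the per-task timings are hypothetical placeholders, not measurements, and real runs also vary with dataset and load.

```shell
#!/usr/bin/env bash
# Idealized sketch of -f a with the MPI/Hybrid executables: the bootstrap
# stage is divided among processes, the ML stage is not. All timings are
# hypothetical placeholders (whole hours, integer arithmetic).
estimate_fa_hours() {
  local reps=$1 procs=$2 bs_hours=$3 ml_hours=$4
  local per_proc=$(( (reps + procs - 1) / procs ))  # replicates per process, rounded up
  echo $(( per_proc * bs_hours + ml_hours ))
}

# 100 replicates at 1 h each, plus a 10 h ML search:
estimate_fa_hours 100 1 1 10    # 110
estimate_fa_hours 100 10 1 10   # 20
```

With 10 processes the bootstrap stage drops from 100 h to 10 h, but the 10 h ML stage stays the same, so it becomes half of the total.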

@@ Line 1: / Line 1: @@
 __NOTOC__
 __NOEDITSECTION__
-[[Category:Software]][[Category:Bioinformatics]]
+[[Category:Software]][[Category:Biology]][[Category:Phylogenetics]]
-<!-- ########  Template Configuration ######## -->
+{|<!--Main settings - REQUIRED-->
-<!--Edit definitions of the variables used in template calls
-Required variables:
-app - lowercase name of the application e.g. "amber"
-url - url of the software page (project, company product, etc) - e.g. "http://ambermd.org/"
-Optional variables:
-INTEL - Version of the Intel Compiler e.g. "11.1"
-MPI - MPI Implementation and version e.g. "openmpi/1.3.4"
--->
-{|
-<!--Main settings - REQUIRED-->
 |{{#vardefine:app|raxml}}
 |{{#vardefine:url|http://wwwkramer.in.tum.de/exelixis/software.html}}
-<!--Compiler and MPI settings - OPTIONAL -->
+|{{#vardefine:exe|1}} <!--Present manual instructions for running the software -->
-|{{#vardefine:intel|}} <!-- E.g. "11.1" -->
-|{{#vardefine:mpi|}} <!-- E.g. "openmpi/1.3.4" -->
-<!--Choose sections to enable - OPTIONAL-->
-|{{#vardefine:mod|1}} <!--Present instructions for running the software with modules -->
-|{{#vardefine:exe|}} <!--Present manual instructions for running the software -->
 |{{#vardefine:conf|}} <!--Enable config wiki page link - {{#vardefine:conf|1}} = ON/conf|}} = OFF-->
-|{{#vardefine:pbs|1}} <!--Enable PBS script wiki page link-->
+|{{#vardefine:pbs|}} <!--Enable PBS script wiki page link-->
 |{{#vardefine:policy|}} <!--Enable policy section -->
-|{{#vardefine:testing|}} <!--Enable performance testing/profiling section -->
+|{{#vardefine:testing|1}} <!--Enable performance testing/profiling section -->
 |{{#vardefine:faq|}} <!--Enable FAQ section -->
 |{{#vardefine:citation|}} <!--Enable Reference/Citation section -->
@@ Line 33: / Line 18: @@
 {{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
-RAxML (Randomized Axelerated Maximum Likelihood) written by Alexandros Stamatakis and others is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees. It has originally been derived from fastDNAml which in turn was derived from Joe Felsenstein’s dnaml which is part of the PHYLIP package.
+RAxML (Randomized Axelerated Maximum Likelihood), written by Alexandros Stamatakis and others, is a program for sequential and parallel maximum likelihood (ML) inference of large phylogenetic trees. It was originally derived from fastDNAml, which in turn was derived from Joe Felsenstein’s dnaml, part of the PHYLIP package.
-<!--Location-->
+<!--Modules-->
-{{App_Location|app={{#var:app}}|{{#var:ver}}}}
+==Environment Modules==
-==Installed Versions==
+Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application.
-* 1.0.5-Light
+==System Variables==
-* 7.3.0 (Standard)
+* HPC_{{uc:{{#var:app}}}}_DIR - installation directory
-* 7.3.0.0220 - 2012-02-20 upstream code update.
+{{#if: {{#var: exe}}|==How To Run==
-* 7.3.0.0307 - 2012-03-07 upstream code update.
+Please see the discussion below on performance characteristics of the different implementations of RAxML. In general, there are four different RAxML executables installed on the Research Computing systems.
-* 7.3.2.0705 - 2012-07-05 upstream code update (default).
+===SIMD Extensions===
+'''SSE3''' and '''AVX''' are extensions to the x86 instruction set that accelerate single-instruction, multiple-data (SIMD) computations - often referred to as vectorization. For each of the four RAxML executables, we have both an SSE3 and an AVX version compiled. Although HiPerGator compute servers support AVX instructions, our testing indicates that the SSE3 executable is still faster - probably due to memory bandwidth constraints. Users are encouraged to experiment with their particular dataset but otherwise should default to the SSE3 executable. To run the AVX version of RAxML, replace "SSE3" in the executable name with "AVX".
+===Single-Threaded (Serial)===
+The serial version of RAxML is called raxmlHPC-SSE3, and is a single-threaded application.
+===Multi-Threaded (Parallel)===
+The multi-threaded version of RAxML is called raxmlHPC-PTHREADS-SSE3 and can use multiple processors on a single compute server (node). This version implements the fine-grained parallelism discussed below. Resource requests for this version should always be in the form '''nodes=1:ppn=x''', where x is the number of processors to use; see the information below when choosing x. In our testing, values over 8 do not significantly speed up analyses and should be avoided. Always pass the '''-T''' flag, which tells RAxML how many threads to use. You can either give it the same number used in the resource request or use the PBS environment variable '''$PBS_NUM_PPN''', which the scheduler sets for you, e.g. -T $PBS_NUM_PPN.
+===MPI (Parallel)===
+The distributed-memory version of RAxML uses MPI. The executable is called raxmlHPC-MPI-SSE3 and can use multiple processors that may or may not be on the same compute server (node). This version implements the coarse-grained parallelism discussed below. Resource requests for this version should generally be in the form '''nodes=1:ppn=x''', where x is the number of processors to use, as long as x is 32 or fewer. If you want to use more than 32 processors, you should generally request more nodes.
-==Variants of RAxML standard==
+===Hybrid (Parallel)===
-* OpenMP threaded version compiled with gcc- <code>/apps/RAxML/7.3.2-20120705/threaded/raxmlHPC-PTHREADS-SSE3</code>
-* OpenMP threaded version compiled with Intel 11.1- <code>/apps/RAxML/7.3.2-20120705/intel/threaded/raxmlHPC-PTHREADS-SSE3</code>
-* OpenMPI/OpenMP MPI and threaded hybrid version - <code>/apps/RAxML/7.3.2-20120705/intel/mpi/raxmlHPC-HYBRID-SSE3</code>
-* OpenMPI version - <code>/apps/RAxML/7.3.2-20120705/intel/mpi/raxmlHPC-MPI</code>
-<!-- -->
-{{#if: {{#var: mod}}|==Running the application using modules==
-To use {{#var:app}} with the environment modules system at HPC the following commands are available:
+The hybrid (MPI and multi-threaded) executable of RAxML is called raxmlHPC-HYBRID-SSE3 and uses multiple processors on multiple compute servers. It implements both the coarse-grained and the fine-grained parallelism discussed below. Resource requests for this version should be in the form '''nodes=x:ppn=y'''. As with the MPI executable, if the total number of processors desired is 32 or fewer, request '''nodes=1:ppn=y''' and choose the mpiexec -np value (the number of coarse-grained processes) and the -T value (the number of fine-grained threads) so that their product equals y.
-Get module information for {{lc: {{PAGENAME}}}}:
+For example, to run 5 coarse-grained processes, each using 4 fine-grained threads, we suggest the following resource request and command line:
- $module spider {{#var:app}}
-{{#if: {{#var:intel}}|Load Intel compiler: {{#tag:pre|$module load intel/{{#var:intel}}}}|}}{{#if: {{#var:mpi}}|Load MPI implementation: {{#tag:pre|$module load {{#var:mpi}}}}|}}
-Load the application module:
- $module load {{#var:app}}
-The modulefile for this software adds the directory with executable files to the shell execution PATH and sets the following environment variables:
+<pre>
+ #PBS -l nodes=1:ppn=20
+ ...
+ mpiexec -bynode -np 5 raxmlHPC-HYBRID-SSE3 -T 4 ...
+</pre>
-* HPC_{{uc:{{#var:app}}}}_DIR - directory where {{#var:app}} is located.|}}
+If you require more than 32 cores in total, it is best to use multiple nodes. In this case, the number of nodes should match the number of coarse-grained processes, and the processors per node should match the number of fine-grained threads. For example:
+<pre>
+ #PBS -l nodes=10:ppn=4
+ ...
+ mpiexec -bynode -np 10 raxmlHPC-HYBRID-SSE3 -T 4 ...
+</pre>
-To run the OpenMPI version load the following modules
+|}}
- module load intel/11.1 openmpi/1.4.3 raxml/7.3.2.0705
-{{#if: {{#var: exe}}|==How To Run==
-WRITE INSTRUCTIONS ON RUNNING THE ACTUAL BINARY|}}
 {{#if: {{#var: conf}}|==Configuration==
 See the [[{{PAGENAME}}_Configuration]] page for {{#var: app}} configuration details.|}}
 {{#if: {{#var: pbs}}|==PBS Script Examples==
+See the [[{{PAGENAME}}_PBS]] page for {{#var: app}} PBS script examples.
-{{#fileAnchor: raxml.threaded.pbs}}
+|}}
-Download raw source of the [{{#fileLink: raxml.threaded.pbs}} raxml.threaded.pbs]
+{{#if: {{#var: policy}}|==Usage Policy==
+WRITE USAGE POLICY HERE (perhaps templates for a couple of main licensing schemes can be used)|}}
+{{#if: {{#var: testing}}|==Performance==
+We highly recommend that users read the paper by Pfeiffer and Stamatakis ([http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5470900&tag=1 2010]) before running parallel versions of RAxML. This paper provides a good overview of the different types of parallelism implemented in RAxML and how to best leverage them for analyses. The discussion below is largely based on this paper.
-<source lang=bash>
+===Parallelism in RAxML===
-#!/bin/sh
-#PBS -N RAxML
-#PBS -m bea
-#PBS -M <YOUR E-MAIL HERE>
-#PBS -o raxml.$PBS_JOBID.out
-#PBS -e raxml.$PBS_JOBID.err
-#PBS -l nodes=1:ppn=4
-#PBS -l pmem=450mb
-#PBS -l walltime=24:00:00
-#
-# Change to the directory where you type qsub. Should be in /scratch, not your $HOME
-cd $PBS_O_WORKDIR
-# Get number of cores--set number of cores above in nodes=1:ppn=number of cores wanted
+RAxML implements two different types of parallelism, referred to as '''coarse-grained''' and '''fine-grained'''. Coarse-grained parallelism can be split across multiple compute servers: each coarse-grained process works on one tree optimization, which may be a bootstrap replicate or an ML search. Fine-grained parallelism lets multiple processors on the SAME server split up a single tree optimization; a single optimization cannot be split across servers.
-# raxmlHPC-PTHREADS-SSE3 can only use cores on a single node, so do not change nodes=1
-# By using this, you don't need to change the value for -T in the raxml command, though
-#  if you change to a single core, raxml will fail saying you need to use 2 or more.
-#  For single processor jobs use raxmlHPC-SSE3 and nodes=1:ppn=1.
-NPROCS=`wc -l < $PBS_NODEFILE`
-# Load the raxml environment
+If you run the '''-f a''' option (bootstrap search and ML search in one analysis) with the MPI or Hybrid executables, the bootstrap replicates are split among the MPI processes, and once those are complete, each MPI process performs an independent ML search. This differs slightly from the other executables, because multiple ML searches are performed. While this is likely a good thing for finding the ML tree and for the thoroughness of the analysis, be aware that this stage will not run faster with more MPI processes, because each MPI task performs its own independent search rather than working together with the others on a single search.
-module load intel/11.1 raxml
-# The raxml command, modify as needed. Read the manual or use raxmlHPC-PTHREADS-SSE3 -help to see options
-# Note the use of the variable $NPROCS defined above with the -T option, no need to change this
-raxmlHPC-PTHREADS-SSE3 -f d -m GTRCAT -s your_data.phy -n output_name -p 3112 -b 758 -N 500  -T $NPROCS
-</source>
 |}}
-{{#if: {{#var: policy}}|==Usage policy==
-WRITE USAGE POLICY HERE (perhaps templates for a couple of main licensing schemes can be used)|}}
-{{#if: {{#var: testing}}|==Performance==
-WRITE PERFORMANCE TESTING RESULTS HERE|}}
 {{#if: {{#var: faq}}|==FAQ==
 *'''Q:''' **'''A:'''|}}