Difference between revisions of "Sate"

From UFRC
===External tool citations===
Please remember to cite the aligner and tree inference tools that you use during a SATé run. The exact citation will depend on which tools you choose to use:
* MAFFT: See the References section on http://mafft.cbrc.jp/alignment/software/
* RAxML: See the Publications section on http://wwwkramer.in.tum.de/exelixis/publications.html
* Opal: Wheeler, T.J. and Kececioglu, J.D. Multiple alignment by aligning alignments. ''Proceedings of the 15th ISCB Conference on Intelligent Systems for Molecular Biology, Bioinformatics'' '''23''', i559-i568, 2007. See also http://opal.cs.arizona.edu/
* MUSCLE: Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. ''Nucleic Acids Res.'' '''32(5)''':1792-1797. doi:10.1093/nar/gkh340. Edgar, R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. ''BMC Bioinformatics'' '''5''':113. doi:10.1186/1471-2105-5-113. See http://www.drive5.com/muscle/
* Clustal: See the References section of ftp://ftp.ebi.ac.uk/pub/software/clustalw2/clustalx_help.html
* PRANK: See http://www.ebi.ac.uk/goldman-srv/prank/prank
* FastTree: Price MN, Dehal PS, Arkin AP. (2010) FastTree 2: Approximately Maximum-Likelihood Trees for Large Alignments. ''PLoS ONE'' '''5(3)''': e9490. doi:10.1371/journal.pone.0009490.

Revision as of 16:15, 22 July 2022

Description

SATé website

SATé (Simultaneous Alignment and Tree Estimation) is a software package for jointly inferring a multiple sequence alignment and a phylogenetic tree. Its iterative algorithm alternates between alignment and tree search. In each iteration, the current tree is used to divide the data set into smaller subproblems (a tree-based decomposition). Each subproblem is aligned, and the subalignments are merged into a full alignment from which a new tree is inferred. For more information, see the publications by Liu et al. listed under Citation below.

The implementation developed at the University of Kansas was written by Jiaye Yu, Mark Holder, Jeet Sukumaran, and Siavash Mirarab. By default, it uses the "SATe-II fast" settings. The primary difference from the procedure described in the original Liu et al. paper is the use of recursive CT-1 decomposition instead of CT-5 decomposition.

SATé performs alignment and tree search by calling external programs. These programs were not written by the SATé developers, but they are bundled with the SATé distribution. The following tools are currently supported:

  • ClustalW 2.0.12
  • MAFFT 6.717
  • MUSCLE 3.7
  • OPAL 1.0.3
  • PRANK 100311
  • RAxML 7.2.6
  • FastTree 2.1.4

Environment Modules

Run module spider sate to find out what environment modules are available for this application.

System Variables

  • HPC_SATE_DIR - installation directory
  • HPC_SATE_BIN - executable directory

Additional Information

Note: By default, SATé writes its temporary files to your home directory, and HPC policy prohibits jobs from writing to the home directory. You must therefore pass the --temporaries= option on the SATé command line to send the temporary files elsewhere. PBS sets the $TMPDIR variable for each job, and it is a good choice (see the example submission script). Another option is a subdirectory of the directory the job was submitted from, which PBS provides as $PBS_O_WORKDIR, for example --temporaries=$PBS_O_WORKDIR/temp.
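The note on temporary files can be sketched as a small Bash helper. It picks a directory for --temporaries and refuses any path inside the home directory. This is a minimal sketch under stated assumptions: the function name sate_temp_dir is hypothetical and is not part of SATé or the UFRC environment. The preference order ($TMPDIR first, then $PBS_O_WORKDIR/temp) simply mirrors the note. Only the --temporaries option comes from this page; check run_sate.py -h for the other options your run needs.

```shell
#!/bin/bash
# Hypothetical helper (not part of SATé): print a directory suitable for
# SATé's --temporaries option, and never one inside the home directory.
# Assumed preference order: $TMPDIR first, then $PBS_O_WORKDIR/temp.
sate_temp_dir() {
    local dir
    if [ -n "${TMPDIR:-}" ]; then
        dir="$TMPDIR"
    elif [ -n "${PBS_O_WORKDIR:-}" ]; then
        dir="$PBS_O_WORKDIR/temp"
    else
        echo "sate_temp_dir: neither TMPDIR nor PBS_O_WORKDIR is set" >&2
        return 1
    fi

    # Reject the home directory itself and anything below it.
    case "$dir" in
        "$HOME"|"$HOME"/*)
            echo "sate_temp_dir: $dir is inside \$HOME" >&2
            return 1
            ;;
    esac

    printf '%s\n' "$dir"
}
```

In a job script, you might call it as TEMP_DIR=$(sate_temp_dir) || exit 1, run mkdir -p "$TEMP_DIR", and then pass --temporaries="$TEMP_DIR" to run_sate.py along with your other options. If the helper returns an error, the job stops before SATé can write anything to your home directory.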

For all command line options, run:

run_sate.py -h



Citation

If you use SATé in a publication, please cite the software itself, the papers describing the method, and the external tools used in your analysis.

Algorithm citations

  • Liu, K., S. Raghavan, S. Nelesen, C. R. Linder, and T. Warnow. 2009. "Rapid and accurate large scale coestimation of sequence alignments and phylogenetic trees." Science 324(5934): 1561-1564 (19 June 2009). doi:10.1126/science.1171243
  • Liu, K., T. J. Warnow, M. T. Holder, S. Nelesen, J. Yu, A. Stamatakis, and C. R. Linder. 2012. "SATé-II: Very Fast and Accurate Simultaneous Estimation of Multiple Sequence Alignments and Phylogenetic Trees." Systematic Biology 61(1): 90-106.

Citations for the SATé software itself and its dependencies

  • Jiaye Yu and Mark T. Holder. "SATé version VERSION_NUMBER_HERE." http://phylo.bio.ku.edu/software/sate/sate.html, DATE DOWNLOADED. (for version 1.2 or earlier)
  • Jiaye Yu, Mark T. Holder, Jeet Sukumaran, and Siavash Mirarab. "SATé version VERSION_NUMBER_HERE." http://phylo.bio.ku.edu/software/sate/sate.html, DATE DOWNLOADED. (for versions 1.2.1 to 2.1.0)
  • Jiaye Yu, Mark T. Holder, Jeet Sukumaran, Siavash Mirarab, and Jamie Oaks. "SATé version VERSION_NUMBER_HERE." http://phylo.bio.ku.edu/software/sate/sate.html, DATE DOWNLOADED. (for version 2.2.0 or later)
  • Sukumaran, J. and Mark T. Holder. 2010. "DendroPy: A Python library for phylogenetic computing". Bioinformatics 26: 1569-1571. (for all SATé versions from this website)

External tool citations

Please remember to cite the aligner and tree inference tools that you use during the course of a SATé run. The exact citation will depend on what tools you choose to use: