SATé
Description
SATé is a software package for simultaneously estimating a multiple sequence alignment and a phylogenetic tree. Its iterative algorithm alternates between alignment and tree search. In each iteration, a tree-based decomposition divides the data set into smaller subproblems; these subproblems are aligned separately, the resulting sub-alignments are merged, and a new tree is inferred from the merged alignment. For more information, see the publications by Liu et al. (2009, 2012) listed under Citation.
The implementation developed at the University of Kansas was written by Jiaye Yu, Mark Holder, Jeet Sukumaran, and Siavash Mirarab. By default, it uses the "SATe-II fast" settings. The primary difference from the method described in the original Liu et al. (2009) paper is the use of recursive CT-1 decomposition instead of CT-5 decomposition.
The alignment and tree-search steps are performed by calling external programs that were not written by the SATé developers but are bundled with the SATé distribution. The following tools are currently supported:
- ClustalW 2.0.12
- MAFFT 6.717
- MUSCLE 3.7
- OPAL 1.0.3
- PRANK 100311
- RAxML 7.2.6
- FastTree 2.1.4
Environment Modules
To find out which environment modules are available for this application, run:
module spider sate
System Variables
- HPC_SATE_DIR - installation directory
- HPC_SATE_BIN - executable directory
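As a quick sanity check after loading the module, the shell function sketched here confirms that HPC_SATE_BIN is set and that run_sate.py is present there. The function name check_sate_env is hypothetical, and the sketch assumes run_sate.py is installed directly in $HPC_SATE_BIN; confirm both on your system.

```shell
# Hypothetical helper: verify that the SATe module has exported HPC_SATE_BIN
# and that run_sate.py (assumed to live in that directory) is executable.
check_sate_env() {
    if [ -z "${HPC_SATE_BIN:-}" ]; then
        echo "HPC_SATE_BIN is not set; load the SATe module first" >&2
        return 1
    fi
    if [ ! -x "$HPC_SATE_BIN/run_sate.py" ]; then
        echo "run_sate.py not found or not executable in $HPC_SATE_BIN" >&2
        return 1
    fi
    # Report where the installation and executables live.
    echo "SATe install directory: ${HPC_SATE_DIR:-<unset>}"
    echo "SATe executable directory: $HPC_SATE_BIN"
}
```

For example, load the module reported by module spider sate (its exact name is an assumption here), then run check_sate_env; it prints both directories, or an error message and a nonzero exit status if the environment is incomplete.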
Additional Information
Note: By default, SATé uses your home directory as the location for temporary files. Writing to your home directory from jobs violates HPC policy, so you must include the --temporaries= option on your SATé command line to point the temporary files to another location. PBS provides the $TMPDIR variable for this purpose, and it is an excellent choice; see the example submission script. Another convenient choice is $PBS_O_WORKDIR, the directory from which the job was submitted; for example, --temporaries=$PBS_O_WORKDIR/temp also works well.
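A minimal sketch of a PBS job script that follows this advice: it prefers the per-job $TMPDIR and falls back to $PBS_O_WORKDIR/temp. The job name, resource requests, module name (sate), input file (input.fasta), and the -i input option are assumptions for illustration; check every option against run_sate.py -h before use.

```shell
#!/bin/bash
#PBS -N sate_run
#PBS -l nodes=1:ppn=1
#PBS -l walltime=24:00:00
# NOTE: job name, resources, module name and input file are hypothetical
# placeholders; adjust them for your account and data.

# Choose where SATe writes temporaries: prefer the per-job $TMPDIR set by PBS,
# otherwise use a "temp" folder under the submission directory. Never $HOME.
sate_tmpdir() {
    if [ -n "${TMPDIR:-}" ]; then
        printf '%s\n' "$TMPDIR"
    elif [ -n "${PBS_O_WORKDIR:-}" ]; then
        printf '%s\n' "$PBS_O_WORKDIR/temp"
    else
        echo "error: neither TMPDIR nor PBS_O_WORKDIR is set" >&2
        return 1
    fi
}

main() {
    cd "$PBS_O_WORKDIR" || exit 1
    module load sate                  # assumed module name; check with: module spider sate
    local tmp
    tmp=$(sate_tmpdir) || exit 1
    mkdir -p "$tmp"                   # harmless if the directory already exists
    # -i (input file) is an assumption; confirm all options with: run_sate.py -h
    run_sate.py -i input.fasta --temporaries="$tmp"
}

# Start the run only inside a real PBS job (PBS sets PBS_JOBID), so the
# helper function can be tried out safely outside a job.
if [ -n "${PBS_JOBID:-}" ]; then
    main
fi
```

Submit it with qsub; keeping the directory choice in a small function makes the fallback order easy to change if your site prefers a different scratch location.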
For a list of all command-line options, run:
run_sate.py -h
Citation
If you use the software in a publication, please cite the SATé software itself, the papers describing the method, and the external tools used in your analysis.
Algorithm citations
- Liu, K., S. Raghavan, S. Nelesen, C. R. Linder, T. Warnow, 2009. "Rapid and accurate large scale coestimation of sequence alignments and phylogenetic trees." Science, 324(5934), pp. 1561-1564, 19 June 2009, doi: 10.1126/science.1171243
- Liu, K., T. J. Warnow, M. T. Holder, S. Nelesen, J. Yu, A. Stamatakis, and C. R. Linder, 2012. "SATé-II: Very Fast and Accurate Simultaneous Estimation of Multiple Sequence Alignments and Phylogenetic Trees." Systematic Biology, 61(1), pp. 90-106.
Citations for the SATé software itself and its dependencies
- Jiaye Yu and Mark T. Holder. "SATé version VERSION_NUMBER_HERE" from http://phylo.bio.ku.edu/software/sate/sate.html, DATE DOWNLOADED. (for version 1.2 or earlier)
- Jiaye Yu, Mark T. Holder, Jeet Sukumaran, and Siavash Mirarab. "SATé version VERSION_NUMBER_HERE" from http://phylo.bio.ku.edu/software/sate/sate.html, DATE DOWNLOADED. (for versions 1.2.1 to 2.1.0)
- Jiaye Yu, Mark T. Holder, Jeet Sukumaran, Siavash Mirarab, and Jamie Oaks. "SATé version VERSION_NUMBER_HERE" from http://phylo.bio.ku.edu/software/sate/sate.html, DATE DOWNLOADED. (for version 2.2.0 or later)
- Sukumaran, J. and M. T. Holder, 2010. "DendroPy: A Python library for phylogenetic computing." Bioinformatics 26: 1569-1571. (for all SATé versions from this website)
External tool citations
Please also cite the aligner and tree-inference tools used during your SATé run. The exact citations depend on which tools you choose:
- MAFFT: See the References section at http://mafft.cbrc.jp/alignment/software/
- RAxML: See the Publications section at http://wwwkramer.in.tum.de/exelixis/publications.html
- Opal: Wheeler, T.J. and Kececioglu, J.D. (2007) Multiple alignment by aligning alignments. Proceedings of the 15th ISCB Conference on Intelligent Systems for Molecular Biology, Bioinformatics 23, i559-i568. See also http://opal.cs.arizona.edu/
- MUSCLE: Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5):1792-1797. doi:10.1093/nar/gkh340. Edgar, R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113. doi:10.1186/1471-2105-5-113. See http://www.drive5.com/muscle/
- ClustalW: See the References section of ftp://ftp.ebi.ac.uk/pub/software/clustalw2/clustalx_help.html
- PRANK: See http://www.ebi.ac.uk/goldman-srv/prank/prank
- FastTree: Price MN, Dehal PS, Arkin AP. (2010) FastTree 2: Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 5(3): e9490. doi:10.1371/journal.pone.0009490.