SuperCRUNCH
Description
SuperCRUNCH is a Python toolkit for creating and working with phylogenetic datasets. It can be run on any set of sequence data, as long as the sequences are in FASTA format with standard naming conventions (described here).
SuperCRUNCH can process sequences downloaded directly from GenBank/NCBI, local sequence data (e.g. unpublished sequences that were never deposited in GenBank), or a combination of both. The sequence data are first parsed into gene-specific FASTA files using targeted searches guided by lists of taxon and locus names. For each resulting gene, sequences can be filtered with similarity searches, either using automated methods or based on user-supplied reference sequences. SuperCRUNCH can then either select a best representative sequence for each taxon or retain all filtered sequences for each taxon. These options allow the user to generate species-level supermatrix datasets (one sequence per species per locus) or population-level datasets (multiple sequences per species per locus). In addition, SuperCRUNCH can identify voucher codes present in sequence records and use them to label samples consistently, which links samples across loci in phylogeographic datasets. SuperCRUNCH offers important pre-alignment steps (adjusting sequence directions and reading frames), several options for sequence alignment (Clustal-O, MAFFT, MUSCLE, MACSE), and multiple options for alignment trimming. Finally, SuperCRUNCH can be used for rapid file format conversion and concatenation.
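To make the "one sequence per species per locus" idea concrete, the following is a minimal Python sketch and not SuperCRUNCH's own code. It assumes a GenBank-style description line of the form "ACCESSION Genus species ...", and it uses sequence length as a stand-in selection criterion. The function names (read_fasta, taxon_of, longest_per_taxon), the accession labels, and the example sequences are all hypothetical.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (description, sequence) pairs from a FASTA-formatted text stream."""
    description, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        elif description is not None:
            chunks.append(line)
    if description is not None:
        yield description, "".join(chunks)


def taxon_of(description):
    """Assumption: the header is 'ACCESSION Genus species ...'."""
    parts = description.split()
    return " ".join(parts[1:3])


def longest_per_taxon(records):
    """Keep the longest sequence seen for each taxon (illustrative criterion)."""
    best = {}
    for description, sequence in records:
        taxon = taxon_of(description)
        if taxon not in best or len(sequence) > len(best[taxon][1]):
            best[taxon] = (description, sequence)
    return best


# Made-up records for a single locus; the accessions are placeholders.
EXAMPLE = """\
>ACC0001 Anolis carolinensis ND2 gene, partial cds
ACGTACGT
>ACC0002 Anolis carolinensis voucher XYZ 123 ND2 gene, partial cds
ACGTACGTACGT
>ACC0003 Sceloporus undulatus ND2 gene, partial cds
ACGT
"""

selected = longest_per_taxon(read_fasta(StringIO(EXAMPLE)))
for taxon, (description, sequence) in sorted(selected.items()):
    print(taxon, description.split()[0], len(sequence))
```

Retaining every record instead of only the longest one would correspond to the population-level case described above.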
Environment Modules
Run "module spider supercrunch" to find out what environment modules are available for this application.
System Variables
- HPC_SUPERCRUNCH_DIR - installation directory
- HPC_SUPERCRUNCH_BIN - executable directory
- HPC_SUPERCRUNCH_DOC - documentation directory
- HPC_SUPERCRUNCH_DAT - data directory
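As a small illustration of how these variables might be used from a script, the Python sketch below looks up each one and reports whether it is set. It assumes the variables are only defined after a supercrunch environment module has been loaded in the current shell; the helper name supercrunch_paths is hypothetical.

```python
import os

# Suffixes of the variables listed above, with their documented meaning.
SUFFIXES = {
    "DIR": "installation directory",
    "BIN": "executable directory",
    "DOC": "documentation directory",
    "DAT": "data directory",
}


def supercrunch_paths(environ=None):
    """Return {suffix: path or None} for the HPC_SUPERCRUNCH_* variables."""
    if environ is None:
        environ = os.environ
    return {s: environ.get(f"HPC_SUPERCRUNCH_{s}") for s in SUFFIXES}


for suffix, path in supercrunch_paths().items():
    # A missing variable most likely means the module has not been loaded.
    shown = path if path is not None else "(not set)"
    print(f"HPC_SUPERCRUNCH_{suffix} ({SUFFIXES[suffix]}): {shown}")
```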
Citation
If you publish research that uses SuperCRUNCH, you have to cite it as follows: