OrthoMCL
Page created by Moskalenko.
Latest revision as of 22:40, 28 January 2024
Description
OrthoMCL is a genome-scale algorithm for grouping orthologous protein sequences. It reports not only groups shared by two or more species/genomes, but also groups representing species-specific gene expansion families, which makes it an important utility for automated eukaryotic genome annotation. OrthoMCL starts with reciprocal best hits within each genome as potential in-paralog (recent paralog) pairs, and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are linked in a similarity graph, and MCL (Markov Clustering algorithm; Van Dongen 2000; www.micans.org/mcl) is then invoked to split mega-clusters, a step analogous to the manual review in COG construction. Because MCL clustering is based on the weight between each pair of proteins, the weights are normalized before MCL is run to correct for differences in evolutionary distance.
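To make the clustering step more concrete, here is a minimal textbook MCL sketch in pure Python: it alternates expansion (matrix squaring) and inflation (element-wise powers followed by column normalization) until the flow matrix stops changing. It is an illustration, not OrthoMCL's code: OrthoMCL calls the external mcl program, and the per-species weight normalization it performs beforehand is not reproduced here. The self-loop weight, inflation value and pruning threshold are illustrative assumptions.

```python
# Minimal textbook Markov Clustering (MCL) sketch in pure Python.
# Illustration only: OrthoMCL runs the external mcl program on normalized weights.


def normalize_columns(m):
    """Scale every column of a square matrix to sum to 1 (column-stochastic)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def matmul(a, b):
    """Plain square matrix product a x b."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-6):
    """Cluster the nodes of a symmetric weight matrix; returns sorted index lists."""
    n = len(weights)
    # Add a self-loop of weight 1.0 to every node, then make columns stochastic.
    m = normalize_columns([[weights[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along two-step random walks
        # Inflation: raise entries to a power (strengthens strong flows,
        # weakens weak ones); entries at or below `prune` are set to zero.
        inflated = [[v ** inflation if v > prune else 0.0 for v in row]
                    for row in expanded]
        new = normalize_columns(inflated)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:  # the matrix has (numerically) stopped changing
            break
    # A non-zero row i lists the nodes j whose flow ends up at i;
    # each distinct set of such nodes is reported as one cluster.
    clusters, seen = [], set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > prune)
        if members and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    # Two triangles (0-1-2 and 3-4-5) joined by one weak edge 2-3.
    w = [[0.0] * 6 for _ in range(6)]
    for a, b, wt in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                     (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
        w[a][b] = w[b][a] = wt
    print(mcl(w))  # [[0, 1, 2], [3, 4, 5]]
```

In the example, the weak 2-3 edge is the kind of link that would merge two groups into a single connected "mega-cluster"; inflation starves the flow across it, so the two triangles come out as separate clusters.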
Environment Modules
Run module spider orthomcl to find out what environment modules are available for this application.
System Variables
- HPC_ORTHOMCL_DIR - installation directory
- HPC_ORTHOMCL_BIN - executable directory
- HPC_ORTHOMCL_CONF - configuration directory, containing template orthomcl.config files with SQLite and MySQL connection settings, credentials, and database table names.
How To Run
Configuration
- Note
- At present, only SQLite is supported out of the box as the OrthoMCL database. Using MySQL/MariaDB instead requires you to run your own MariaDB server, for example in a container in a separate job.
The '$HPC_ORTHOMCL_CONF/orthomcl.config' file provides a template for configuring an OrthoMCL SQLite database. Copy it to your working directory with:
$ cp $HPC_ORTHOMCL_CONF/orthomcl.config .
The '$HPC_ORTHOMCL_CONF/orthomcl.config.sql' file provides a template for configuring an OrthoMCL MySQL database. Copy it to your working directory with:
$ cp $HPC_ORTHOMCL_CONF/orthomcl.config.sql .
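The copied file can be edited with any text editor. For scripted workflows, the following minimal sketch reads and updates settings, assuming the file uses one key=value pair per line as in the standard OrthoMCL distribution; comment lines and lines without = are left untouched. The helper names are hypothetical, and the key evalueExponentCutoff is only an example taken from the standard OrthoMCL configuration, so check the key names against your copy of the template.

```python
from pathlib import Path


def read_config(text):
    """Return the key=value settings of an OrthoMCL-style config as a dict."""
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def set_option(text, key, value):
    """Return the config text with `key` set to `value` (appended if absent)."""
    lines, found = [], False
    for line in text.splitlines():
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines.append(f"{key}={value}")  # replace the existing setting
            found = True
        else:
            lines.append(line)  # keep every other line verbatim
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cfg = Path("orthomcl.config")  # the copy made in your working directory
    if cfg.exists():
        print(read_config(cfg.read_text()))
```

To change a value in place, write the result back, e.g. cfg.write_text(set_option(cfg.read_text(), "evalueExponentCutoff", "-10")).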
Before loading any data, don't forget to create the database schema with:
orthomclInstallSchema orthomcl.config
A wrapper script, 'orthomcl_wrapper_SQLITE', is also provided. It creates a suitable orthomcl.config file and automates much of the pipeline:
$ orthomcl_wrapper_SQLITE -p compliantFASTA -u 2 -s 100
-p  path to the folder containing the input FASTA files
-u  position of the unique IDs in the FASTA headers
-s  number of parallel BLAST jobs to run in a Slurm array
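To choose a value for -u, it can help to see which header field a given position selects. The sketch below assumes the wrapper interprets -u like the id_field argument of the standard orthomclAdjustFasta tool: fields are separated by whitespace or '|', the leading '>' is ignored, and counting starts at 1. Under that assumption, -u 2 selects the protein ID in an orthomclAdjustFasta-style '>taxon|id' header. All helper names are hypothetical; the round-robin split only illustrates what dividing the input among -s array tasks means, and the wrapper's own splitting may differ.

```python
import re


def extract_id(header, position):
    """Return field number `position` (1-based) of a FASTA definition line.

    Assumption: fields are split on whitespace or '|' and the leading '>' is
    ignored, mirroring the documented id_field rule of orthomclAdjustFasta.
    """
    fields = [f for f in re.split(r"[\s|]+", header.lstrip(">")) if f]
    if not 1 <= position <= len(fields):
        raise ValueError(f"header has {len(fields)} fields; no field {position}")
    return fields[position - 1]


def duplicate_ids(headers, position):
    """List IDs that occur more than once at `position` within one FASTA file."""
    seen, dups = set(), set()
    for header in headers:
        pid = extract_id(header, position)
        if pid in seen:
            dups.add(pid)
        seen.add(pid)
    return sorted(dups)


def read_headers(path):
    """Yield the '>' definition lines of a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield line.rstrip("\n")


def split_records(records, n_chunks):
    """Deal records round-robin into n_chunks lists, one per array task."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


if __name__ == "__main__":
    print(extract_id(">sp|P12345|ABC_HUMAN Some protein", 2))  # P12345
```

An empty result from duplicate_ids(read_headers(path), 2) for each input file suggests that field 2 is a usable unique ID for that file.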
The wrapper script writes a 'run_split_blast_array.sh' script, which can be modified to resubmit BLAST jobs if necessary, and a 'finish.orthomcl.sh' script to be run after all BLAST jobs have completed.
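If only some array tasks fail, only those need to be resubmitted. The following hedged helper sketch takes the number of array tasks and the indices of the tasks that finished, lists the missing ones, and collapses them into Slurm's --array index syntax (comma-separated values and ranges such as 3,7-9). How you detect a finished task (for example, a non-empty BLAST output file per chunk) depends on the files the wrapper writes and is not shown here. The function names and the 1-based default first index are assumptions; check the #SBATCH --array line in your generated script.

```python
def missing_tasks(n_tasks, completed, first_index=1):
    """Return indices in [first_index, first_index + n_tasks) not in completed."""
    done = set(completed)
    return [t for t in range(first_index, first_index + n_tasks) if t not in done]


def array_spec(indices):
    """Collapse integers into Slurm --array syntax, e.g. [3, 7, 8, 9] -> '3,7-9'."""
    idx = sorted(set(indices))
    parts, i = [], 0
    while i < len(idx):
        start = end = idx[i]
        # Extend the current run while the next index is consecutive.
        while i + 1 < len(idx) and idx[i + 1] == end + 1:
            i += 1
            end = idx[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


if __name__ == "__main__":
    # Hypothetical example: 10 tasks (1-10), of which 1, 2, 4, 5, 6 and 10 finished.
    todo = missing_tasks(10, {1, 2, 4, 5, 6, 10})
    print(array_spec(todo))  # 3,7-9
```

The resulting string could then be passed on the command line, e.g. sbatch --array=3,7-9 run_split_blast_array.sh, because sbatch command-line options override #SBATCH directives in the script. This assumes the script selects its input chunk from SLURM_ARRAY_TASK_ID, which you should verify before resubmitting.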
- Documentation for SQLite support can be found at https://github.com/stajichlab/OrthoMCL.
- The OrthoMCL User Guide is available from Wiley Online Library.
- The original publications that describe the functionality of the software are 10.1101/gr.1224503 and 10.1093/nar/gkj123.
Citation
If you publish research that uses OrthoMCL, you must cite it as follows:
- Feng Chen, Aaron J. Mackey, Christian J. Stoeckert, Jr., and David S. Roos. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 2006 34: D363-8. Please cite this paper if you publish research results that benefited from OrthoMCL-DB.
- Li Li, Christian J. Stoeckert, Jr., and David S. Roos. OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Res. 2003 13: 2178-2189.
- Feng Chen, Aaron J. Mackey, Jeroen K. Vermunt, and David S. Roos. Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes. PLoS ONE 2007 2(4): e383.