OrthoMCL

From UFRC
Revision as of 20:46, 15 April 2013 by Moskalenko

Description

OrthoMCL website

OrthoMCL is a genome-scale algorithm for grouping orthologous protein sequences. It reports not only groups shared by two or more species/genomes but also groups representing species-specific gene expansion families. This makes it a useful tool for automated eukaryotic genome annotation.

OrthoMCL starts with reciprocal best hits within each genome as potential in-paralog/recent paralog pairs, and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are interlinked in a similarity graph. MCL (the Markov Clustering algorithm; Van Dongen 2000; www.micans.org/mcl) is then invoked to split mega-clusters, a step analogous to the manual review used in COG construction. Because MCL clustering is based on the weight assigned to each pair of proteins, the weights are normalized before running MCL to correct for differences in evolutionary distance.
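The pair-selection and weight-normalization steps described above can be sketched in Python. This is a simplified illustration, not OrthoMCL's own implementation. It assumes a precomputed dictionary of directional similarity scores (higher is better) keyed by (query, target) protein IDs, and it handles only cross-genome ortholog pairs between two genomes. As a stand-in for OrthoMCL's normalization, it divides each pair's weight by the mean weight of all reciprocal-best-hit pairs for that genome pair. All function names and the toy data are hypothetical.

```python
from statistics import mean


def best_hits(scores, genome_of, query_genome, target_genome):
    """Map each protein of query_genome to its highest-scoring hit in target_genome.

    Ties are broken by keeping the first hit encountered.
    """
    best = {}
    for (query, target), score in scores.items():
        if genome_of[query] != query_genome or genome_of[target] != target_genome:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores, genome_of, genome_a, genome_b):
    """Return (a, b) pairs where a's best hit in B is b and b's best hit in A is a."""
    a_to_b = best_hits(scores, genome_of, genome_a, genome_b)
    b_to_a = best_hits(scores, genome_of, genome_b, genome_a)
    return [(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a]


def normalized_weights(scores, pairs):
    """Average both directional scores per pair, then divide by the mean pair weight."""
    raw = {(a, b): (scores[(a, b)] + scores[(b, a)]) / 2 for a, b in pairs}
    if not raw:
        return {}
    avg = mean(raw.values())
    return {pair: weight / avg for pair, weight in raw.items()}


if __name__ == "__main__":
    # Toy data: genome A has proteins a1, a2; genome B has b1, b2.
    genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
    scores = {
        ("a1", "b1"): 50, ("b1", "a1"): 48,
        ("a1", "b2"): 30, ("b2", "a1"): 35,
        ("a2", "b2"): 40, ("b2", "a2"): 44,
    }
    pairs = reciprocal_best_hits(scores, genome_of, "A", "B")
    print(pairs)  # [('a1', 'b1'), ('a2', 'b2')]
    print(normalized_weights(scores, pairs))
```

After normalization the mean pair weight for a genome pair is 1, so genome pairs that are evolutionarily distant (and therefore have uniformly lower scores) are not penalized relative to close ones when the weighted graph is handed to MCL. A production pipeline would also apply similarity cutoffs before selecting hits.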

Required Modules

modules documentation

Serial

  • orthomcl

System Variables

  • HPC_ORTHOMCL_DIR - installation directory
  • HPC_ORTHOMCL_BIN - executable directory
  • HPC_ORTHOMCL_CONF - configuration directory. It contains the orthomcl.config file that provides the MySQL connection settings, credentials, and the database structure.

How To Run

For each OrthoMCL version, the /apps/orthomcl/VERSION/config/orthomcl.config file is a template for configuring an OrthoMCL database. In your own copy of the template, change the database name in the line 'dbConnectString=dbi:mysql:orthomcl:10.13.20.209'. The default name, 'orthomcl', refers to an existing database that contains the data from the OrthoMCL authors; replace it with the name of your own database, following the pattern 'project_user_orthomcl'. The MySQL server allows the orthomcl user to create databases whose names follow that pattern.
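The copy-and-rename step can also be scripted. The Python sketch below assumes that loading the orthomcl module sets HPC_ORTHOMCL_CONF to the directory that holds orthomcl.config (as listed under System Variables), and that the template contains exactly one line of the form dbConnectString=dbi:mysql:DATABASE:HOST. The function names, and the check that the new name looks like project_user_orthomcl, are illustrative assumptions rather than part of OrthoMCL.

```python
import os
import re
from pathlib import Path

# Captures everything up to the database name in lines such as
# dbConnectString=dbi:mysql:orthomcl:10.13.20.209
CONNECT_RE = re.compile(r"^(dbConnectString=dbi:mysql:)([^:\s]+)", re.MULTILINE)


def set_database(config_text, db_name):
    """Return config_text with the database name in dbConnectString replaced."""
    # Assumed naming rule mirroring the 'project_user_orthomcl' pattern.
    if not db_name.endswith("_orthomcl") or db_name.count("_") < 2:
        raise ValueError(f"{db_name!r} does not follow project_user_orthomcl")
    # A function replacement avoids backreference surprises in db_name.
    new_text, count = CONNECT_RE.subn(lambda m: m.group(1) + db_name, config_text)
    if count != 1:
        raise ValueError(f"expected one dbConnectString line, found {count}")
    return new_text


def write_personal_config(db_name, dest="orthomcl.config"):
    """Copy the module's template to dest, pointing it at db_name."""
    conf_dir = os.environ.get("HPC_ORTHOMCL_CONF")
    if conf_dir is None:
        raise RuntimeError("HPC_ORTHOMCL_CONF is not set; load the orthomcl module first")
    template = Path(conf_dir) / "orthomcl.config"
    Path(dest).write_text(set_database(template.read_text(), db_name))
```

For example, after loading the module, write_personal_config("myproject_jdoe_orthomcl") (where myproject and jdoe are placeholder project and user names) writes orthomcl.config to the current directory. The host address and all other settings are carried over from the template unchanged.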

OrthoMCL documentation is available from several sources.

  • A User Guide and other documents in text format are located in /apps/orthomcl/VERSION/doc/OrthoMCLEngine/Main/ and at [1].



Citation

If you publish research that uses OrthoMCL, you must cite the following papers:

  1. Feng Chen, Aaron J. Mackey, Christian J. Stoeckert, Jr., and David S. Roos. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 2006 34: D363-D368. Please cite this paper if you publish research results that benefited from OrthoMCL-DB.
  2. Li Li, Christian J. Stoeckert, Jr., and David S. Roos. OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Res. 2003 13: 2178-2189.
  3. Feng Chen, Aaron J. Mackey, Jeroen K. Vermunt, and David S. Roos. Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes. PLoS ONE 2007 2(4): e383.