Difference between revisions of "CD-HIT"

Latest revision as of 18:24, 12 August 2022

Description

CD-HIT stands for Cluster Database at High Identity with Tolerance. The program (cd-hit) takes a FASTA-format sequence database as input and produces a set of 'non-redundant' (nr) representative sequences as output. In addition, cd-hit writes a cluster file that lists the member sequences ('groupies') grouped under each nr representative. The idea is to reduce the overall size of the database without discarding distinct sequence information, by removing only 'redundant' (highly similar) sequences. This is why the resulting database is called non-redundant (nr). Essentially, cd-hit produces a set of closely related protein families from a given FASTA sequence database.
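The cluster file described above can be read back with a few lines of Python. The following is a minimal sketch, not part of CD-HIT itself. It assumes the usual .clstr layout, in which each cluster starts with a ">Cluster N" line, the representative is marked with "*", and other members end in "at NN.NN%". The function name parse_clstr and the sequence IDs in the sample are hypothetical.

```python
import re

# Assumed layout of one member line in a .clstr file, e.g.
#   "0<TAB>2799aa, >seqA... *"            (representative)
#   "1<TAB>2214aa, >seqB... at 80.00%"    (other member)
MEMBER_RE = re.compile(r"^\d+\s+\d+(?:aa|nt),\s+>(.+?)\.\.\.\s+(.*)$")


def parse_clstr(text):
    """Map each representative sequence ID to the other members of its cluster."""
    clusters = []   # one dict per ">Cluster" header, in file order
    current = None  # the cluster currently being filled
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = {"rep": None, "members": []}
            clusters.append(current)
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            continue  # skip lines that do not fit the assumed layout
        seq_id, status = match.groups()
        if status == "*":
            current["rep"] = seq_id
        else:
            current["members"].append(seq_id)
    return {c["rep"]: c["members"] for c in clusters if c["rep"] is not None}


# Hypothetical sample input, for illustration only.
SAMPLE = """>Cluster 0
0\t2799aa, >seqA... *
1\t2214aa, >seqB... at 80.00%
>Cluster 1
0\t1500aa, >seqC... *
"""

if __name__ == "__main__":
    for rep, members in parse_clstr(SAMPLE).items():
        print(rep, members)
```

A representative that maps to an empty list is a singleton cluster, meaning no other sequence in the database was similar enough to be merged with it.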

Environment Modules

Run module spider cdhit to find out what environment modules are available for this application.

System Variables

HPC_CDHIT_DIR - installation directory.
HPC_CDHIT_BIN - executable directory.
HPC_CDHIT_DOC - documentation directory.

Additional Information

OpenMP binaries are named after the corresponding serial binaries, with an "-omp" suffix appended (for example, the OpenMP build of cd-hit is cd-hit-omp).
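The naming rule and the HPC_CDHIT_BIN variable can be combined when scripting a run. The following is a minimal sketch under stated assumptions: the helper name cdhit_command is hypothetical, the "-omp" suffix follows the note above, and the options used (-i input, -o output, -c identity threshold, -T threads) are the standard cd-hit options. The function only builds the argument list; it does not launch anything.

```python
import os


def cdhit_command(input_fasta, output, identity=0.9, threads=1, env=None):
    """Build a cd-hit argument list, choosing the "-omp" binary for threaded runs."""
    env = os.environ if env is None else env
    # Per the note above: the OpenMP build is the serial name plus "-omp".
    exe = "cd-hit-omp" if threads > 1 else "cd-hit"
    # HPC_CDHIT_BIN is set by the environment module; fall back to PATH lookup.
    bin_dir = env.get("HPC_CDHIT_BIN")
    path = os.path.join(bin_dir, exe) if bin_dir else exe
    cmd = [path, "-i", input_fasta, "-o", output, "-c", str(identity)]
    if threads > 1:
        cmd += ["-T", str(threads)]
    return cmd


if __name__ == "__main__":
    # File names are placeholders, for illustration only.
    print(" ".join(cdhit_command("db.fasta", "db_nr", threads=4)))
```

The resulting list can be passed to subprocess.run inside a job script once the module is loaded. The thread count should match the number of cores requested from the scheduler.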

@@ Line 1: / Line 1: @@
 __NOTOC__
 __NOEDITSECTION__
-[[Category:Software]][[Category:Bioinformatics]][[Category:Genomics]]
+[[Category:Software]][[Category:Biology]][[Category:Genomics]]
-<!-- ########  Template Configuration ######## -->
+{|<!--Main settings - REQUIRED-->
-<!--Edit definitions of the variables used in template calls
-Required variables:
-app - lowercase name of the application e.g. "amber"
-url - url of the software page (project, company product, etc) - e.g. "http://ambermd.org/"
-Optional variables:
-INTEL - Version of the Intel Compiler e.g. "11.1"
-MPI - MPI Implementation and version e.g. "openmpi/1.3.4"
--->
-{|
-<!--Main settings - REQUIRED-->
 |{{#vardefine:app|cdhit}}
 |{{#vardefine:url|http://weizhong-lab.ucsd.edu/cd-hit/}}
-<!--Compiler and MPI settings - OPTIONAL -->
+|{{#vardefine:exe|1}} <!--Present manual instructions for running the software -->
-|{{#vardefine:intel|}} <!-- E.g. "11.1" -->
-|{{#vardefine:mpi|}} <!-- E.g. "openmpi/1.3.4" -->
-<!--Choose sections to enable - OPTIONAL-->
-|{{#vardefine:mod|1}} <!--Present instructions for running the software with modules -->
-|{{#vardefine:exe|}} <!--Present manual instructions for running the software -->
 |{{#vardefine:conf|}} <!--Enable config wiki page link - {{#vardefine:conf|1}} = ON/conf|}} = OFF-->
 |{{#vardefine:pbs|}} <!--Enable PBS script wiki page link-->
@@ Line 31: / Line 16: @@
 <!--Description-->
 {{#if: {{#var: url}}|
-{{App_Description|app={{#var:app}}|url={{#var:url}}}}|}}
+{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
 CD-HIT stands for Cluster Database at High Identity with Tolerance. The program (cd-hit) takes a FASTA-format sequence database as input and produces a set of 'non-redundant' (nr) representative sequences as output. In addition, cd-hit writes a cluster file that lists the member sequences ('groupies') grouped under each nr representative. The idea is to reduce the overall size of the database without discarding distinct sequence information, by removing only 'redundant' (highly similar) sequences. This is why the resulting database is called non-redundant (nr). Essentially, cd-hit produces a set of closely related protein families from a given FASTA sequence database.
-<!--Location-->
-{{App_Location|app={{#var:app}}|{{#var:ver}}}}
-==Available versions==
-* 4.5.7 (serial and openmp).
-<!-- -->
-{{#if: {{#var: mod}}|==Running the application using modules==
-{{App_Module|app={{#var:app}}|intel={{#var:intel}}|mpi={{#var:mpi}}}}|}}
-To make the OpenMP version of CD-HIT available load the following modules:
+<!--Modules-->
- module load cdhitmp
+==Environment Modules==
-{{#if: {{#var: exe}}|==How To Run==
+Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application.
-WRITE INSTRUCTIONS ON RUNNING THE ACTUAL BINARY|}}
+==System Variables==
+* HPC_{{uc:{{#var:app}}}}_DIR - installation directory
+* HPC_CDHIT_BIN - executable directory.
+* HPC_CDHIT_DOC - documentation directory.
+<!--Additional-->
+{{#if: {{#var: exe}}|==Additional Information==
+OpenMP binaries have the same names as serial binaries, but have the "-omp" suffix added to their names.
+}}
 {{#if: {{#var: conf}}|==Configuration==
 See the [[{{PAGENAME}}_Configuration]] page for {{#var: app}} configuration details.|}}
 {{#if: {{#var: pbs}}|==PBS Script Examples==
 See the [[{{PAGENAME}}_PBS]] page for {{#var: app}} PBS script examples.|}}
-{{#if: {{#var: policy}}|==Usage policy==
+{{#if: {{#var: policy}}|==Usage Policy==
 WRITE USAGE POLICY HERE (perhaps templates for a couple of main licensing schemes can be used)|}}
 {{#if: {{#var: testing}}|==Performance==