Difference between revisions of "Uproc"

From UFRC
Line 1: Line 1:
[[Category:Software]][[Category:biology]][[Category:genomics]]
+
[[Category:Software]][[Category:Genomics]][[Category:Bioinformatics]]
 
{|<!--CONFIGURATION: REQUIRED-->
 
{|<!--CONFIGURATION: REQUIRED-->
 
|{{#vardefine:app|uproc}}
 
|{{#vardefine:app|uproc}}
Line 5: Line 5:
 
<!--CONFIGURATION: OPTIONAL (|1}} means it's ON)-->
 
<!--CONFIGURATION: OPTIONAL (|1}} means it's ON)-->
 
|{{#vardefine:conf|}}          <!--CONFIGURATION-->
 
|{{#vardefine:conf|}}          <!--CONFIGURATION-->
|{{#vardefine:exe|}}            <!--ADDITIONAL INFO-->
+
|{{#vardefine:exe|1}}            <!--ADDITIONAL INFO-->
|{{#vardefine:job|}}            <!--JOB SCRIPTS-->
+
|{{#vardefine:pbs|}}            <!--PBS SCRIPTS-->
 
|{{#vardefine:policy|}}        <!--POLICY-->
 
|{{#vardefine:policy|}}        <!--POLICY-->
 
|{{#vardefine:testing|}}      <!--PROFILING-->
 
|{{#vardefine:testing|}}      <!--PROFILING-->
 
|{{#vardefine:faq|}}            <!--FAQ-->
 
|{{#vardefine:faq|}}            <!--FAQ-->
|{{#vardefine:citation|}}      <!--CITATION-->
+
|{{#vardefine:citation|1}}      <!--CITATION-->
 
|{{#vardefine:installation|}} <!--INSTALLATION-->
 
|{{#vardefine:installation|}} <!--INSTALLATION-->
 
|}
 
|}
Line 18: Line 18:
 
{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
 
{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
  
With rapidly increasing volumes of biological sequence data, the functional analysis of new sequences in terms of similarities to known protein families challenges classical bioinformatics. The ultrafast protein classification (UProC) toolbox implements a novel algorithm ("Mosaic Matching") for large-scale sequence analysis and is now available as an open-source C library. UProC is up to three orders of magnitude faster than profile-based methods and achieved up to 80% higher sensitivity on unassembled short reads (100 bp) from simulated metagenomes. UProC does not depend on a multiple alignment of family-specific sequences. Therefore, in addition to protein domain classification according to the Pfam database, UProC can, in principle, also detect KEGG Orthologs. We provide a precompiled database for KEGG Ortholog classification, which we applied to the prediction of functional repertoires from short reads.
+
With rapidly increasing volumes of biological sequence data, the functional analysis of new sequences in terms of similarities to known protein families challenges classical bioinformatics. The ultrafast protein classification (UProC) toolbox implements a novel algorithm ("Mosaic Matching") for large-scale sequence analysis and is now available as an open-source C library. UProC is up to three orders of magnitude faster than profile-based methods and achieved up to 80% higher sensitivity on unassembled short reads (100 bp) from simulated metagenomes. UProC does not depend on a multiple alignment of family-specific sequences. Therefore, in addition to protein domain classification according to the Pfam database, UProC can, in principle, also detect KEGG Orthologs. We provide a precompiled database for KEGG Ortholog classification (see below), but we have not yet evaluated the classification performance for that database.
  
 
<!--Modules-->
 
<!--Modules-->
==Environment Modules==
+
==Required Modules==
Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application.
+
===Serial===
 +
* {{#var:app}}
 +
<!--
 +
===Parallel (OpenMP)===
 +
* intel
 +
* {{#var:app}}
 +
===Parallel (MPI)===
 +
* intel
 +
* openmpi
 +
* {{#var:app}}
 +
-->
 
==System Variables==
 
==System Variables==
 
* HPC_{{uc:{{#var:app}}}}_DIR - installation directory
 
* HPC_{{uc:{{#var:app}}}}_DIR - installation directory
Line 28: Line 38:
 
* HPC_{{uc:{{#var:app}}}}_LIB - library directory
 
* HPC_{{uc:{{#var:app}}}}_LIB - library directory
 
* HPC_{{uc:{{#var:app}}}}_INC - includes directory
 
* HPC_{{uc:{{#var:app}}}}_INC - includes directory
 
 
<!--Configuration-->
 
<!--Configuration-->
 
{{#if: {{#var: conf}}|==Configuration==
 
{{#if: {{#var: conf}}|==Configuration==
Line 36: Line 45:
 
{{#if: {{#var: exe}}|==Additional Information==
 
{{#if: {{#var: exe}}|==Additional Information==
  
WRITE_ADDITIONAL_INSTRUCTIONS_ON_RUNNING_THE_SOFTWARE_IF_NECESSARY
+
Imported UProC KEGG and Pfam databases are located at $DBDR (or $HPC_UPROC_DBDIR) and $MODELDIR (or $HPC_UPROC_MODELDIR), so these variables can be used on the command line.
  
 
|}}
 
|}}
<!--Job Scripts-->
+
<!--PBS scripts-->
{{#if: {{#var: job}}|==Job Script Examples==
+
{{#if: {{#var: pbs}}|==PBS Script Examples==
See the [[{{PAGENAME}}_Job_Scripts]] page for {{#var: app}} Job script examples.
+
See the [[{{PAGENAME}}_PBS]] page for {{#var: app}} PBS script examples.
 
|}}
 
|}}
 
<!--Policy-->
 
<!--Policy-->
Line 62: Line 71:
 
If you publish research that uses {{#var:app}} you have to cite it as follows:
 
If you publish research that uses {{#var:app}} you have to cite it as follows:
  
WRITE_CITATION_HERE
+
Meinicke, Peter. '''UProC: tools for ultra-fast protein domain classification'''. ''Bioinformatics'', 2014
  
 
|}}
 
|}}
Line 70: Line 79:
 
<!--Turn the Table of Contents and Edit paragraph links ON/OFF-->
 
<!--Turn the Table of Contents and Edit paragraph links ON/OFF-->
 
__NOTOC____NOEDITSECTION__
 
__NOTOC____NOEDITSECTION__
 +
=Validation=
 +
* Validated 4/5/2018

Revision as of 20:10, 27 May 2022

Description

uproc website  

With rapidly increasing volumes of biological sequence data, the functional analysis of new sequences in terms of similarities to known protein families challenges classical bioinformatics. The ultrafast protein classification (UProC) toolbox implements a novel algorithm ("Mosaic Matching") for large-scale sequence analysis and is now available as an open-source C library. UProC is up to three orders of magnitude faster than profile-based methods and achieved up to 80% higher sensitivity on unassembled short reads (100 bp) from simulated metagenomes. UProC does not depend on a multiple alignment of family-specific sequences. Therefore, in addition to protein domain classification according to the Pfam database, UProC can, in principle, also detect KEGG Orthologs. We provide a precompiled database for KEGG Ortholog classification (see below), but we have not yet evaluated the classification performance for that database.

Required Modules

Serial

  • uproc

System Variables

  • HPC_UPROC_DIR - installation directory
  • HPC_UPROC_BIN - executable directory
  • HPC_UPROC_LIB - library directory
  • HPC_UPROC_INC - includes directory

Additional Information

Imported UProC KEGG and Pfam databases are located at $DBDR (or $HPC_UPROC_DBDIR) and $MODELDIR (or $HPC_UPROC_MODELDIR), so these variables can be used on the command line.
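
As a rough illustration of using these variables on the command line, the sketch below assembles a uproc-dna invocation from them. It assumes the uproc module is already loaded, that uproc-dna takes its arguments in the order database directory, model directory, input file, and that $HPC_UPROC_DBDIR can be passed as the database directory directly. This page does not say whether that variable points at a single database or at a parent directory holding several, so the path may need a subdirectory appended. The helper name build_uproc_cmd and the file name reads.fastq are hypothetical.

```shell
# Hypothetical helper: print a uproc-dna command line built from the
# variables described above, so it can be inspected before running.
# Assumption: argument order is DBDIR MODELDIR INPUTFILE.
build_uproc_cmd() {
    reads="$1"   # placeholder input file, e.g. reads.fastq
    # ${VAR:?msg} stops with an error if the variable is unset or empty,
    # which usually means the uproc module has not been loaded.
    : "${HPC_UPROC_DBDIR:?load the uproc module first}"
    : "${HPC_UPROC_MODELDIR:?load the uproc module first}"
    printf 'uproc-dna %s %s %s\n' \
        "$HPC_UPROC_DBDIR" "$HPC_UPROC_MODELDIR" "$reads"
}
```

After loading the module, calling build_uproc_cmd reads.fastq prints the assembled command. The output is meant for inspection only; it is not quoted safely for paths that contain spaces.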



Citation

If you publish research that uses uproc you have to cite it as follows:

Meinicke, Peter. UProC: tools for ultra-fast protein domain classification. Bioinformatics, 2014


Validation

  • Validated 4/5/2018