Difference between revisions of "Ea-utils"

From UFRC
m (Text replacement - "#uppercase" to "uc")
Line 1: Line 1:
[[Category:Software]][[Category:Biology]]
+
[[Category:Software]][[Category:Bioinformatics]][[Category:NGS]]
 
{|<!--CONFIGURATION: REQUIRED-->
 
{|<!--CONFIGURATION: REQUIRED-->
 
|{{#vardefine:app|eautils}}
 
|{{#vardefine:app|eautils}}
Line 6: Line 6:
 
|{{#vardefine:conf|}}          <!--CONFIGURATION-->
 
|{{#vardefine:conf|}}          <!--CONFIGURATION-->
 
|{{#vardefine:exe|}}            <!--ADDITIONAL INFO-->
 
|{{#vardefine:exe|}}            <!--ADDITIONAL INFO-->
|{{#vardefine:job|}}            <!--JOB SCRIPTS-->
+
|{{#vardefine:pbs|}}            <!--PBS SCRIPTS-->
 
|{{#vardefine:policy|}}        <!--POLICY-->
 
|{{#vardefine:policy|}}        <!--POLICY-->
 
|{{#vardefine:testing|}}      <!--PROFILING-->
 
|{{#vardefine:testing|}}      <!--PROFILING-->
 
|{{#vardefine:faq|}}            <!--FAQ-->
 
|{{#vardefine:faq|}}            <!--FAQ-->
|{{#vardefine:citation|}}      <!--CITATION-->
+
|{{#vardefine:citation|1}}      <!--CITATION-->
 
|{{#vardefine:installation|}} <!--INSTALLATION-->
 
|{{#vardefine:installation|}} <!--INSTALLATION-->
 
|}
 
|}
Line 18: Line 18:
 
{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
 
{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
  
EAUtils are command-line tools for processing biological sequencing data. Barcode demultiplexing, adapter trimming, etc. They are primarily written to support an Illumina based pipeline - but should work with any FASTQs.
+
ea-utils are command-line tools for processing biological sequencing data. Barcode demultiplexing, adapter trimming, etc. They are primarily written to support Illumina based pipelines but should work with any FASTQs.
  
 +
;Overview:
 +
: fastq-mcf
 +
:: Scans a sequence file for adapters, and, based on a log-scaled threshold, determines a set of clipping parameters and performs clipping. Also does skewing detection and quality filtering.
 +
: fastq-multx
 +
:: Demultiplexes a fastq. Capable of auto-determining barcode id's based on a master set fields. Keeps multiple reads in-sync during demultiplexing. Can verify that the reads are in-sync as well, and fail if they're not.
 +
: fastq-join
 +
:: Similar to audy's stitch program, but in C, more efficient and supports some automatic benchmarking and tuning. It uses the same "squared distance for anchored alignment" as other tools.
 +
: varcall
 +
:: Takes a pileup and calculates variants in a more easily parameterized manner than some other tools.
 +
: sam-stats
 +
:: Basic sam/bam stats. Like other tools, but produces what I want to look at, in a format suitable for passing to other programs. (View source)
 +
: fastq-stats
 +
:: Basic fastq stats. Counts duplicates. Option for per-cycle stats, or not (irrelevant for many sequencers).
 
<!--Modules-->
 
<!--Modules-->
==Environment Modules==
+
==Required Modules==
Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application.
+
===Serial===
 +
* {{#var:app}}
 +
<!--
 +
===Parallel (OpenMP)===
 +
* intel
 +
* {{#var:app}}
 +
===Parallel (MPI)===
 +
* intel
 +
* openmpi
 +
* {{#var:app}}
 +
-->
 
==System Variables==
 
==System Variables==
* HPC_{{uc:{{#var:app}}}}_DIR - installation directory
+
* HPC_{{uc:{{#var:app}}}}_DIR
 
<!--Configuration-->
 
<!--Configuration-->
 
{{#if: {{#var: conf}}|==Configuration==
 
{{#if: {{#var: conf}}|==Configuration==
Line 31: Line 54:
 
<!--Run-->
 
<!--Run-->
 
{{#if: {{#var: exe}}|==Additional Information==
 
{{#if: {{#var: exe}}|==Additional Information==
 
 
WRITE_ADDITIONAL_INSTRUCTIONS_ON_RUNNING_THE_SOFTWARE_IF_NECESSARY
 
WRITE_ADDITIONAL_INSTRUCTIONS_ON_RUNNING_THE_SOFTWARE_IF_NECESSARY
 
 
|}}
 
|}}
<!--Job Scripts-->
+
<!--PBS scripts-->
{{#if: {{#var: job}}|==Job Script Examples==
+
{{#if: {{#var: pbs}}|==PBS Script Examples==
See the [[{{PAGENAME}}_Job_Scripts]] page for {{#var: app}} Job script examples.
+
See the [[{{PAGENAME}}_PBS]] page for {{#var: app}} PBS script examples.
 
|}}
 
|}}
 
<!--Policy-->
 
<!--Policy-->
 
{{#if: {{#var: policy}}|==Usage Policy==
 
{{#if: {{#var: policy}}|==Usage Policy==
 
 
WRITE USAGE POLICY HERE (Licensing, usage, access).
 
WRITE USAGE POLICY HERE (Licensing, usage, access).
 
 
|}}
 
|}}
 
<!--Performance-->
 
<!--Performance-->
 
{{#if: {{#var: testing}}|==Performance==
 
{{#if: {{#var: testing}}|==Performance==
 
 
WRITE_PERFORMANCE_TESTING_RESULTS_HERE
 
WRITE_PERFORMANCE_TESTING_RESULTS_HERE
 
 
|}}
 
|}}
 
<!--Faq-->
 
<!--Faq-->
Line 58: Line 75:
 
If you publish research that uses {{#var:app}} you have to cite it as follows:
 
If you publish research that uses {{#var:app}} you have to cite it as follows:
  
WRITE_CITATION_HERE
+
Erik Aronesty (2011). ea-utils : "Command-line tools for processing biological sequencing data"; http://code.google.com/p/ea-utils
  
 +
Erik Aronesty (2013). TOBioiJ : "Comparison of Sequencing Utility Programs", DOI:10.2174/1875036201307010001
 
|}}
 
|}}
 
<!--Installation-->
 
<!--Installation-->
Line 66: Line 84:
 
<!--Turn the Table of Contents and Edit paragraph links ON/OFF-->
 
<!--Turn the Table of Contents and Edit paragraph links ON/OFF-->
 
__NOTOC____NOEDITSECTION__
 
__NOTOC____NOEDITSECTION__
 +
=Validation=
 +
* Validated 4/5/2018

Revision as of 16:29, 27 May 2022

Description

eautils website  

ea-utils is a set of command-line tools for processing biological sequencing data, including barcode demultiplexing and adapter trimming. The tools are written primarily to support Illumina-based pipelines but should work with any FASTQ files.

Overview
fastq-mcf
Scans a sequence file for adapters and, based on a log-scaled threshold, determines a set of clipping parameters and performs clipping. Also performs skew detection and quality filtering.
fastq-multx
Demultiplexes a FASTQ file. Can auto-determine barcode IDs from a master set of fields. Keeps multiple reads in sync during demultiplexing, and can verify that the reads are in sync and fail if they are not.
fastq-join
Similar to audy's stitch program, but written in C, more efficient, and with support for some automatic benchmarking and tuning. It uses the same "squared distance for anchored alignment" as other tools.
varcall
Takes a pileup and calculates variants in a more easily parameterized manner than some other tools.
sam-stats
Basic SAM/BAM statistics. Like other tools, but reports the statistics the author found most useful, in a format suitable for passing to other programs.
fastq-stats
Basic FASTQ statistics. Counts duplicates. Per-cycle statistics are optional (they are irrelevant for many sequencers).
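
To make the fastq-multx description concrete, the following Python sketch illustrates the general idea of demultiplexing paired reads while keeping the mates in sync. It is not the ea-utils implementation and does not call the tool. The function names, the exact-prefix barcode match, and the "unmatched" bin are all assumptions made for illustration; the real program is written in C and is more flexible.

```python
from io import StringIO
from itertools import zip_longest


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def read_id(header):
    """Return the read name without the leading '@', comment, or /1 and /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def demultiplex(r1_handle, r2_handle, barcodes):
    """Bin read pairs by the barcode found at the start of read 1.

    barcodes maps a sample name to its barcode sequence (hypothetical layout).
    Raises ValueError if the two inputs are not in sync.
    """
    bins = {sample: [] for sample in barcodes}
    bins["unmatched"] = []
    pairs = zip_longest(read_fastq(r1_handle), read_fastq(r2_handle))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("inputs contain different numbers of reads")
        if read_id(rec1[0]) != read_id(rec2[0]):
            raise ValueError(f"reads out of sync: {rec1[0]} vs {rec2[0]}")
        for sample, barcode in barcodes.items():
            if rec1[1].startswith(barcode):  # exact match only (assumption)
                bins[sample].append((rec1, rec2))
                break
        else:
            bins["unmatched"].append((rec1, rec2))
    return bins
```

Both mates of a pair always land in the same bin, and a mismatch in read names stops the run instead of silently producing shifted output, which mirrors the "fail if they're not in sync" behavior described above.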

Required Modules

Serial

  • eautils

System Variables

  • HPC_EAUTILS_DIR
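
As a small illustration, a script can check whether the module is loaded by reading this variable. The sketch below assumes only that loading the eautils module sets HPC_EAUTILS_DIR (the earlier revision of this page describes it as the installation directory); the helper name is hypothetical.

```python
import os


def eautils_install_dir(environ=os.environ):
    """Return the value of HPC_EAUTILS_DIR, or None if it is unset or empty."""
    path = environ.get("HPC_EAUTILS_DIR")
    return path or None


if __name__ == "__main__":
    install_dir = eautils_install_dir()
    if install_dir is None:
        print("HPC_EAUTILS_DIR is not set; load the eautils module first")
    else:
        print(f"ea-utils is installed under {install_dir}")
```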




Citation

If you publish research that uses eautils, cite it as follows:

Erik Aronesty (2011). ea-utils : "Command-line tools for processing biological sequencing data"; http://code.google.com/p/ea-utils

Erik Aronesty (2013). TOBioiJ : "Comparison of Sequencing Utility Programs", DOI:10.2174/1875036201307010001


Validation

  • Validated 4/5/2018