Ea-utils
Moskalenko, minor edit: Text replacement - "#uppercase" to "uc"
Revision as of 16:29, 27 May 2022
Description
ea-utils is a set of command-line tools for processing biological sequencing data, covering tasks such as barcode demultiplexing and adapter trimming. The tools were written primarily to support Illumina-based pipelines but should work with any FASTQ files.
Overview
- fastq-mcf: Scans a sequence file for adapters and, based on a log-scaled threshold, determines a set of clipping parameters and performs clipping. Also performs skew detection and quality filtering.
- fastq-multx: Demultiplexes a FASTQ file. Can auto-determine barcode IDs based on a master set of fields. Keeps multiple reads in sync during demultiplexing, and can verify that the reads are in sync and fail if they are not.
- fastq-join: Similar to audy's stitch program, but written in C, more efficient, and with support for some automatic benchmarking and tuning. It uses the same "squared distance for anchored alignment" as other tools.
- varcall: Takes a pileup and calculates variants in a more easily parameterized manner than some other tools.
- sam-stats: Basic SAM/BAM stats. Similar to other tools, but reports the statistics its author found most useful, in a format suitable for passing to other programs.
- fastq-stats: Basic FASTQ stats. Counts duplicates. Per-cycle stats are optional (they are irrelevant for many sequencers).
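To make the kind of summary described for fastq-stats more concrete, the following is a minimal Python sketch of basic FASTQ statistics with duplicate counting. It is not the ea-utils implementation: the function names `read_fastq` and `basic_stats` are hypothetical, and treating a "duplicate" as a read whose sequence exactly matches an earlier read is an assumption made for illustration only.

```python
from collections import Counter


def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        header = next(it, None)
        if header is None:
            return
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None or not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        yield header[1:], seq.strip(), qual.strip()


def basic_stats(records):
    """Count reads, mean read length, and exact-sequence duplicates.

    Assumption: a duplicate is any read whose sequence is identical to
    a sequence already seen; the real tool may define this differently.
    """
    n = 0
    total_len = 0
    seq_counts = Counter()
    for _name, seq, _qual in records:
        n += 1
        total_len += len(seq)
        seq_counts[seq] += 1
    duplicates = sum(c - 1 for c in seq_counts.values() if c > 1)
    mean_len = total_len / n if n else 0.0
    return {"reads": n, "mean_length": mean_len, "duplicate_reads": duplicates}


# Small in-memory example; with a real file, pass an open file handle instead.
example = """@r1
ACGT
+
IIII
@r2
ACGT
+
IIII
@r3
GGCCA
+
IIIII
""".splitlines()

stats = basic_stats(read_fastq(example))
print(stats)
```

Holding every distinct sequence in memory is acceptable for a small illustration, but it would not scale to full sequencing runs.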
Required Modules
Serial
- eautils
System Variables
- HPC_EAUTILS_DIR
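After loading the eautils module, it can be useful to confirm that the environment is set up before starting a job. The Python sketch below reads HPC_EAUTILS_DIR and looks for the tools on PATH. The helper name `check_eautils_env` is hypothetical, and the assumption that each executable carries the same name as the tool listed in the overview, and that the module places them on PATH, has not been verified against this installation.

```python
import os
import shutil

# Assumed executable names, taken from the tool list in the overview.
TOOLS = ["fastq-mcf", "fastq-multx", "fastq-join",
         "varcall", "sam-stats", "fastq-stats"]


def check_eautils_env(env=None):
    """Return (install_dir, found) for the given environment mapping.

    install_dir is the value of HPC_EAUTILS_DIR, or None if it is unset.
    found maps each tool name to its resolved path, or None if the tool
    is not on PATH.
    """
    env = os.environ if env is None else env
    install_dir = env.get("HPC_EAUTILS_DIR")
    search_path = env.get("PATH", "")
    found = {tool: shutil.which(tool, path=search_path) for tool in TOOLS}
    return install_dir, found


if __name__ == "__main__":
    install_dir, found = check_eautils_env()
    print("HPC_EAUTILS_DIR:", install_dir or "not set (is the module loaded?)")
    for tool, location in found.items():
        print(f"{tool}: {location or 'not found on PATH'}")
```

If the variable is reported as not set, the module has most likely not been loaded in the current shell.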
Citation
If you publish research that uses eautils, you must cite it as follows:
Erik Aronesty (2011). ea-utils: "Command-line tools for processing biological sequencing data"; http://code.google.com/p/ea-utils
Erik Aronesty (2013). TOBioiJ: "Comparison of Sequencing Utility Programs", DOI:10.2174/1875036201307010001
Validation
- Validated 4/5/2018