Ea-utils
Latest revision as of 14:54, 15 August 2022
Description
ea-utils is a collection of command-line tools for processing biological sequencing data, covering tasks such as barcode demultiplexing and adapter trimming. The tools were written primarily to support Illumina-based pipelines but should work with any FASTQ files.
Overview
- fastq-mcf: Scans a sequence file for adapters and, based on a log-scaled threshold, determines a set of clipping parameters and performs the clipping. It also performs skew detection and quality filtering.
- fastq-multx: Demultiplexes a FASTQ file. It can automatically determine barcode IDs from a master set of fields, keeps multiple reads in sync during demultiplexing, and can verify that the reads are in sync, failing if they are not.
- fastq-join: Similar to audy's stitch program, but written in C, more efficient, and with support for some automatic benchmarking and tuning. It uses the same "squared distance for anchored alignment" as other tools.
- varcall: Takes a pileup and calculates variants in a more easily parameterized manner than some other tools.
- sam-stats: Basic SAM/BAM statistics. Similar to other tools, but reports the statistics its author wanted to see, in a format suitable for passing to other programs.
- fastq-stats: Basic FASTQ statistics. Counts duplicates and offers optional per-cycle statistics (which are irrelevant for many sequencers).
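To make the idea of keeping paired reads in sync during demultiplexing more concrete, here is a minimal Python sketch. It is not the fastq-multx implementation: the record representation, the function names `read_id` and `demultiplex_pairs`, the exact-prefix barcode matching, and the "unmatched" bin are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A FASTQ record as (header, sequence, quality). Hypothetical representation.
Record = Tuple[str, str, str]


def read_id(header: str) -> str:
    """Return the read name without the leading '@' or a trailing /1 or /2 suffix."""
    name = header.lstrip("@").split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def demultiplex_pairs(
    pairs: Iterable[Tuple[Record, Record]],
    barcodes: Dict[str, str],
) -> Dict[str, List[Tuple[Record, Record]]]:
    """Group read pairs by the barcode found at the start of read 1.

    `barcodes` maps sample name -> barcode sequence. Matching is an exact
    prefix match (an illustrative simplification). Pairs without a matching
    barcode go to the 'unmatched' bin. Raises ValueError when the two mates
    of a pair carry different read names, i.e. the inputs are out of sync.
    """
    bins: Dict[str, List[Tuple[Record, Record]]] = defaultdict(list)
    for r1, r2 in pairs:
        if read_id(r1[0]) != read_id(r2[0]):
            raise ValueError(f"reads out of sync: {r1[0]!r} vs {r2[0]!r}")
        sample = "unmatched"
        for name, barcode in barcodes.items():
            if r1[1].startswith(barcode):
                sample = name
                # Trim the barcode from the sequence and quality of read 1.
                r1 = (r1[0], r1[1][len(barcode):], r1[2][len(barcode):])
                break
        # Both mates are always stored together, so the bins stay in sync.
        bins[sample].append((r1, r2))
    return dict(bins)
```

Storing both mates as one unit is the design choice that keeps the output files in sync; the name check up front mirrors the "verify and fail" behaviour described above.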
Environment Modules
Run module spider eautils to find out what environment modules are available for this application.
System Variables
- HPC_EAUTILS_DIR - installation directory
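As an illustration of how a script might use this variable, the Python sketch below builds the path to one of the tools from HPC_EAUTILS_DIR. The helper name `eautils_tool_path` is hypothetical, and the assumption that executables live in a `bin` subdirectory of the installation directory is not confirmed by this page; check the actual layout after loading the module.

```python
import os
from pathlib import Path


def eautils_tool_path(tool: str) -> Path:
    """Return the expected path of an ea-utils executable.

    Assumes (unverified) that binaries are installed under
    $HPC_EAUTILS_DIR/bin. Raises RuntimeError if the variable is unset,
    which usually means the eautils module has not been loaded.
    """
    root = os.environ.get("HPC_EAUTILS_DIR")
    if not root:
        raise RuntimeError(
            "HPC_EAUTILS_DIR is not set; load the eautils module first"
        )
    return Path(root) / "bin" / tool
```

For example, `eautils_tool_path("fastq-mcf")` would return the assumed location of fastq-mcf once the module is loaded.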
Citation
If you publish research that uses eautils, cite it as follows:
- Erik Aronesty (2011). ea-utils: "Command-line tools for processing biological sequencing data"; http://code.google.com/p/ea-utils
- Erik Aronesty (2013). TOBioiJ: "Comparison of Sequencing Utility Programs", DOI:10.2174/1875036201307010001