Difference between revisions of "Ea-utils"
Revision as of 14:02, 13 June 2022
Description
ea-utils is a set of command-line tools for processing biological sequencing data, covering tasks such as barcode demultiplexing and adapter trimming. The tools are primarily written to support Illumina-based pipelines but should work with any FASTQ files.
Overview
- fastq-mcf: Scans a sequence file for adapters and, based on a log-scaled threshold, determines a set of clipping parameters and performs clipping. Also does skewing detection and quality filtering.
- fastq-multx: Demultiplexes a FASTQ file. Capable of auto-determining barcode IDs based on a master set of fields. Keeps multiple reads in sync during demultiplexing. Can also verify that the reads are in sync, and fail if they are not.
- fastq-join: Similar to audy's stitch program, but written in C, more efficient, and with support for some automatic benchmarking and tuning. It uses the same "squared distance for anchored alignment" as other tools.
- varcall: Takes a pileup and calculates variants in a more easily parameterized manner than some other tools.
- sam-stats: Basic SAM/BAM stats. Like other tools, but produces output in a format suitable for passing to other programs.
- fastq-stats: Basic FASTQ stats. Counts duplicates. Per-cycle stats are optional (they are irrelevant for many sequencers).
Environment Modules
Run module spider eautils to find out what environment modules are available for this application.
System Variables
- HPC_EAUTILS_DIR - installation directory
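The following is a minimal sketch of how the module and the HPC_EAUTILS_DIR variable might be used together in a shell session or job script. It assumes that the module can be loaded under the name eautils (inferred from the module spider command above, not confirmed here; a version suffix may be required on your system). The helper function eautils_dir is hypothetical and exists only for illustration.

```shell
# Load the module only if the `module` command is available in this shell.
# ASSUMPTION: the module name "eautils" is taken from `module spider eautils`;
# check the spider output for the exact name and version.
if command -v module >/dev/null 2>&1; then
    module load eautils || echo "could not load the eautils module" >&2
fi

# Hypothetical helper: print the installation directory, or fail with a
# message if HPC_EAUTILS_DIR has not been set by the module.
eautils_dir() {
    if [ -z "${HPC_EAUTILS_DIR:-}" ]; then
        echo "HPC_EAUTILS_DIR is not set; load the module first" >&2
        return 1
    fi
    printf '%s\n' "$HPC_EAUTILS_DIR"
}
```

After the module is loaded, calling eautils_dir should print the installation directory, which can then be inspected with ls to see what the installation provides.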
Citation
If you publish research that uses ea-utils, cite it as follows:
Erik Aronesty (2011). ea-utils : "Command-line tools for processing biological sequencing data"; http://code.google.com/p/ea-utils
Erik Aronesty (2013). TOBioiJ : "Comparison of Sequencing Utility Programs", DOI:10.2174/1875036201307010001