Difference between revisions of "Sailfish"
Moskalenko (created page)
Revision as of 18:49, 5 April 2018
Description
RNA-seq expression estimates need not take longer than a cup of coffee
The quantification of gene or isoform abundance is a fundamental step in many transcriptome analysis tasks, such as determining differential expression between biological samples. Yet estimating isoform abundance from a large set of RNA-seq reads remains computationally intensive, owing in large part to the necessity of read mapping. To address this problem directly, we developed Sailfish, a software tool that implements a novel, alignment-free algorithm for estimating isoform abundances directly from a set of reference sequences and RNA-seq reads. Rather than working at the read level, Sailfish uses the k-mer as its fundamental unit of transcript coverage. This alternative, lightweight approach allows Sailfish to dispense with many of the complexities of read mapping while remaining robust to sequencing errors. By replacing read mapping with intelligent k-mer indexing and counting, Sailfish is able to quantify isoform abundance orders of magnitude faster than existing tools. For example, it processes a set of 150 million reads in about 15 minutes, where existing tools take over 6 hours.
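The idea of k-mer indexing and counting can be illustrated with a toy Python sketch. This is not Sailfish's implementation: the function names, the sequences, and the choice of k=4 are all made up for illustration, and details a real tool has to handle (reverse complements, compact hashing, parallelism) are left out. It only shows how reads can be broken into k-mers and counted against an index built from the reference transcripts, without aligning any read.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(transcripts, k):
    """Map each reference k-mer to the set of transcripts containing it."""
    index = {}
    for name, seq in transcripts.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index

def count_read_kmers(reads, index, k):
    """Count read k-mers that occur in the reference index.

    K-mers absent from the index (for example, ones spanning a
    sequencing error) are skipped; the rest of the read still counts.
    """
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in index:
                counts[kmer] += 1
    return counts

# Hypothetical toy data, not from any real experiment.
K = 4
transcripts = {"tx1": "ACGTACGT", "tx2": "TTGACGTA"}
reads = ["ACGTAC", "TTGACG", "ACGNAC"]  # the last read has an ambiguous base

index = build_index(transcripts, K)
counts = count_read_kmers(reads, index, K)

for kmer in sorted(counts):
    print(kmer, counts[kmer], sorted(index[kmer]))
```

In this toy run the third read contributes nothing because none of its k-mers appear in the reference, while k-mers such as ACGT are shared by both transcripts and therefore need a later step to divide their counts between them.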
This increase in speed is obtained without sacrificing accuracy. Sailfish implements an efficient, accelerated expectation-maximization algorithm for quantifying isoform abundance that produces high-quality results, and is capable of correcting numerous types of systematic bias that are known to occur in RNA-seq experiments. In the paper, we demonstrate that, on both real and synthetic data, Sailfish is as accurate as existing read mapping-based tools such as eXpress and Cufflinks.
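As a rough illustration of how an expectation-maximization step can divide shared k-mer counts among isoforms, here is a plain (non-accelerated) EM sketch in Python. It is an assumption-laden toy, not the algorithm as implemented in Sailfish: it ignores transcript length, does no bias correction, and uses no acceleration scheme. The function name, k-mer labels, and counts are hypothetical.

```python
def em_abundance(kmer_counts, kmer_to_transcripts, n_iter=100):
    """Estimate relative transcript abundances from k-mer counts.

    kmer_counts: dict mapping k-mer -> observed count
    kmer_to_transcripts: dict mapping k-mer -> set of compatible transcripts
    Returns a dict mapping transcript -> estimated share of all counts.
    """
    names = sorted({t for ts in kmer_to_transcripts.values() for t in ts})
    # Start from a uniform guess.
    theta = {t: 1.0 / len(names) for t in names}

    for _ in range(n_iter):
        # E-step: split each k-mer's count among its compatible
        # transcripts in proportion to the current estimates.
        alloc = {t: 0.0 for t in names}
        for kmer, count in kmer_counts.items():
            compat = kmer_to_transcripts[kmer]
            denom = sum(theta[t] for t in compat)
            if denom == 0.0:
                continue  # no compatible transcript has any mass
            for t in compat:
                alloc[t] += count * theta[t] / denom

        # M-step: renormalize the allocated counts into proportions.
        total = sum(alloc.values())
        theta = {t: alloc[t] / total for t in names}
    return theta

# Hypothetical toy input: two k-mers unique to one transcript each,
# and one k-mer shared by both.
kmer_counts = {"uniq1": 30, "uniq2": 10, "shared": 20}
kmer_to_transcripts = {
    "uniq1": {"tx1"},
    "uniq2": {"tx2"},
    "shared": {"tx1", "tx2"},
}

theta = em_abundance(kmer_counts, kmer_to_transcripts)
print({t: round(v, 4) for t, v in theta.items()})
```

With these made-up counts the estimate settles at roughly three quarters of the mass on tx1 and one quarter on tx2, because the unique k-mers pull the shared counts toward the more abundant transcript.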
Required Modules
Serial
- sailfish
System Variables
- HPC_SAILFISH_DIR
Validation
- Validated 4/5/2018