Sailfish


Revision as of 18:49, 5 April 2018

Description

sailfish website  

RNA-seq expression estimates need not take longer than a cup of coffee

The quantification of gene or isoform abundance is a fundamental step in many transcriptome analysis tasks, such as determining differential expression between biological samples. Yet estimating isoform abundance from a large set of RNA-seq reads remains computationally intensive, owing in large part to the necessity of read mapping. To address this problem directly, we developed Sailfish, a software tool that implements a novel, alignment-free algorithm for estimating isoform abundances directly from a set of reference sequences and RNA-seq reads. Rather than working at the read level, Sailfish uses the k-mer as its fundamental unit of transcript coverage. This alternative, lightweight approach allows Sailfish to dispense with many of the complexities of read mapping while remaining robust to sequencing errors. By replacing read mapping with intelligent k-mer indexing and counting, Sailfish is able to quantify isoform abundance orders of magnitude faster than existing tools. For example, Sailfish processes a set of 150 million reads in about 15 minutes, where existing tools take over 6 hours.
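The k-mer indexing and counting idea described above can be illustrated with a minimal Python sketch. It is not Sailfish's actual implementation: the function names `count_kmers` and `index_transcripts` are hypothetical, and the sketch ignores reverse complements, sequencing errors, and the efficient data structures a real tool would need.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a collection of reads."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read.
        # Reads shorter than k contribute nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def index_transcripts(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it.

    `transcripts` is a dict of {name: sequence} (an assumed input format).
    """
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index
```

With an index like this, each k-mer observed in the reads can be looked up directly against the reference transcripts, with no alignment of whole reads.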

This increase in speed is obtained without sacrificing accuracy. Sailfish implements an efficient, accelerated expectation-maximization algorithm for quantifying isoform abundance that produces high-quality results and can correct numerous types of systematic bias known to occur in RNA-seq experiments. In the paper, we demonstrate that, on both real and synthetic data, Sailfish is as accurate as existing read-mapping-based tools such as eXpress and Cufflinks.
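To make the expectation-maximization idea concrete, here is a small Python sketch of plain (unaccelerated) EM over k-mer counts. It is an illustration under simplifying assumptions, not the algorithm Sailfish ships: the function name `em_abundances` and its input formats are hypothetical, and the sketch omits acceleration, transcript-length normalization, and bias correction.

```python
def em_abundances(kmer_counts, kmer_to_transcripts, transcripts, iterations=100):
    """Estimate relative transcript abundances from k-mer counts with plain EM.

    kmer_counts:         {kmer: observed count in the reads}
    kmer_to_transcripts: {kmer: set of transcript names containing it}
    transcripts:         list of transcript names
    """
    # Start from a uniform guess.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(iterations):
        assigned = {t: 0.0 for t in transcripts}
        # E-step: split each k-mer's count among its compatible transcripts
        # in proportion to the current abundance estimates.
        for kmer, count in kmer_counts.items():
            compatible = kmer_to_transcripts.get(kmer, ())
            total = sum(theta[t] for t in compatible)
            if total == 0:
                continue  # k-mer matches no transcript with nonzero abundance
            for t in compatible:
                assigned[t] += count * theta[t] / total
        # M-step: renormalize the assigned counts into abundances.
        norm = sum(assigned.values())
        if norm == 0:
            break
        theta = {t: assigned[t] / norm for t in transcripts}
    return theta
```

K-mers unique to one transcript anchor its estimate, while k-mers shared between transcripts are reapportioned on each iteration according to the current estimates.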

Required Modules

Serial

  • sailfish

System Variables

  • HPC_SAILFISH_DIR
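
As a small illustration, a script could read this variable once the sailfish module is loaded. The sketch below assumes the variable holds a directory path, which its name suggests but this page does not state; the helper name `sailfish_dir` is hypothetical.

```python
import os
from pathlib import Path


def sailfish_dir(env=os.environ):
    """Return the path stored in HPC_SAILFISH_DIR, or None if it is unset.

    Assumption: the variable is set when the sailfish module is loaded
    and contains a directory path.
    """
    value = env.get("HPC_SAILFISH_DIR")
    return Path(value) if value else None
```

Returning None rather than raising lets a calling script decide how to report that the module has not been loaded.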





Validation

  • Validated 4/5/2018