Metabin

From UFRC
Revision as of 09:37, 20 June 2013 by Magitz (talk | contribs)

Description

metabin website  

MetaBin: a program for accurate, fast and highly sensitive taxonomic assignments of metagenomic sequences.

For comprehensive taxonomic binning, the MetaBin authors developed the 'MetaBin' web server and standalone program for faster and more accurate taxonomic assignment of single and paired-end sequence reads of varying lengths (≥45 bp) obtained from both Sanger and next-generation sequencing platforms. They benchmarked it using both simulated reads (> 1 million) and real metagenomic datasets. MetaBin correctly assigns a higher number of reads to their expected taxonomic lineages, with a lower error frequency, than other methods. It displays high accuracy (positive predictive value (PPV) ≥99%) along with high sensitivity (≥94%) for various read lengths. In particular, for short Illumina reads (~45-75 bp) it makes about 4% more assignments than its closest competitors, with near-100% accuracy, when reference genomes are available.

By using Blat, a faster alignment method than Blastx (both options are available), the analysis time is reduced by a factor of 50 to 1000. This makes it comparable to, or faster than, composition-based methods, which are usually the faster approach. It is therefore practical to use the more accurate and sensitive homology-based approach for high-throughput analysis of large datasets, because the time needed to generate Blastx alignments is no longer the bottleneck. The MetaBin web server allows users to upload their own data, as sequence reads or Blastx output, to carry out taxonomic analysis. It provides several visualization options for constructing a taxonomic tree of the results, and for performing comparative analysis of the taxonomic profiles of multiple metagenomic datasets.

The standalone command-line version is installed.

Required Modules

Serial

  • metabin

System Variables

  • HPC_METABIN_DIR

Additional Information

Metabin uses Jim Kent's Blat application as an alignment method that is much faster than Blastx. Unfortunately, Blat has a bug that causes it to crash with large reference databases, such as nr (see the Blat wiki page). As a workaround, we suggest dividing large reference databases into multiple files. To do this, run the prepareinput step of Metabin with the "-b n" option, and then run the Blat analyses separately. An example PBS script is provided to demonstrate how to do this.
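
The database-splitting part of this workaround can be sketched in Python. This is only an illustration, assuming the reference database is a plain FASTA file. The function name split_fasta and the output naming scheme (prefix.partN.fa) are hypothetical, and the sketch does not call Metabin or Blat. It distributes records round-robin, so the chunks end up with similar record counts but not necessarily similar total sequence lengths.

```python
from pathlib import Path


def split_fasta(fasta_path, n_parts, out_prefix):
    """Distribute FASTA records round-robin across n_parts output files.

    Hypothetical helper for illustration; not part of Metabin or Blat.
    Returns the list of chunk paths that were written.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")

    # Assumed naming scheme: <out_prefix>.part1.fa, <out_prefix>.part2.fa, ...
    out_paths = [Path(f"{out_prefix}.part{i + 1}.fa") for i in range(n_parts)]
    handles = [p.open("w") for p in out_paths]
    try:
        current = -1  # index of the chunk receiving the current record
        with open(fasta_path) as src:
            for line in src:
                if line.startswith(">"):
                    # A header line starts a new record: move to the next chunk.
                    current = (current + 1) % n_parts
                if current >= 0:
                    # Lines before the first header are skipped.
                    handles[current].write(line)
    finally:
        for handle in handles:
            handle.close()
    return out_paths
```

For example, split_fasta("nr.fa", 4, "nr") would write nr.part1.fa through nr.part4.fa (file names assumed here). Each chunk could then be used as the reference for a separate Blat run, as described above.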

PBS Script Examples

See the Metabin_PBS page for metabin PBS script examples.


Citation

If you publish research that uses metabin, you must cite it as follows: Sharma, V.K., Kumar, N., Prakash, T., Taylor, T.D., 2012. Fast and Accurate Taxonomic Assignments of Metagenomic Sequences Using MetaBin. PLoS One 7.