MetaBin: a program for accurate, fast and highly sensitive taxonomic assignments of metagenomic sequences

For comprehensive taxonomic binning, we developed the ‘MetaBin’ web server and standalone program for faster and more accurate taxonomic assignment of single and paired-end sequence reads of varying lengths (≥45 bp) obtained from both Sanger and next-generation sequencing platforms. We benchmarked it using both simulated reads (>1 million) and real metagenomic datasets. MetaBin correctly assigns more reads to their expected taxonomic lineages, with a lower error frequency, than other methods. It displays high accuracy (positive predictive value (PPV) ≥99%) along with high sensitivity (≥94%) for various read lengths. In particular, for short Illumina reads (~45-75 bp) it makes about 4% more assignments than its closest competitors, with near 100% accuracy when reference genomes are available.
By using Blat, a faster alignment method, instead of Blastx (both options are available), the analysis time is reduced 50- to 1000-fold, which is comparable to or faster than the time taken by composition-based methods, which are usually the faster approach. This makes it practical to use a more accurate and sensitive homology-based approach for high-throughput analysis of large datasets, because it removes the bottleneck of the time required to generate alignments with Blastx. The MetaBin web server allows users to upload their own data, either as sequence reads or as Blastx output, to carry out taxonomic analysis. It provides several visualization options for constructing a taxonomic tree of the results and for performing comparative analysis of the taxonomic profiles of multiple metagenomic datasets.
The standalone command-line version is installed.
MetaBin uses Jim Kent's Blat application as its alignment method because it is much faster than Blastx. Unfortunately, Blat has a bug that causes it to crash with large reference databases such as nr (see the Blat wiki page). As a workaround, we suggest dividing large reference databases into multiple files. To do this, run the prepareinput step of MetaBin with the "-b n" option, and then run the Blat analyses separately. An example PBS script is provided to demonstrate how to do this.
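The idea of dividing a large reference database into multiple files can be illustrated with a minimal Python sketch. This is not part of MetaBin and does not reproduce what its prepareinput step does; the function name split_fasta, the output naming scheme, and the round-robin strategy are all assumptions made for illustration. It only shows one way to spread FASTA records over several files without cutting a record in two.

```python
from pathlib import Path


def split_fasta(fasta_path, n_parts, out_prefix):
    """Distribute FASTA records round-robin over n_parts files.

    Hypothetical helper, not part of MetaBin or Blat.
    A record starts at each line beginning with '>' and is never split.
    Returns the list of output paths.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")

    # Assumed naming scheme: <out_prefix>.000.fa, <out_prefix>.001.fa, ...
    out_paths = [Path(f"{out_prefix}.{i:03d}.fa") for i in range(n_parts)]
    handles = [path.open("w") for path in out_paths]
    try:
        current = None      # file receiving the record currently being read
        record_index = -1   # index of the current record, counted from 0
        with open(fasta_path) as source:
            for line in source:
                if line.startswith(">"):
                    # New record: move on to the next output file in turn.
                    record_index += 1
                    current = handles[record_index % n_parts]
                if current is not None:
                    # Lines before the first header are skipped.
                    current.write(line)
    finally:
        for handle in handles:
            handle.close()
    return out_paths
```

Calling split_fasta("nr.fa", 8, "nr_part") would, under these assumptions, write eight files that can each be used as the database for a separate Blat run. Round-robin assignment balances the number of records per file, not the number of residues, so the parts may differ somewhat in size.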
If you publish research that uses MetaBin, you must cite it as follows: Sharma, V.K., Kumar, N., Prakash, T., Taylor, T.D., 2012. Fast and Accurate Taxonomic Assignments of Metagenomic Sequences Using MetaBin. PLoS One 7.