PAUDA

From UFRC

Description

PAUDA website  

PAUDA ("Protein Alignment Using a DNA Aligner") is an approach to comparing DNA reads against a database of protein reference sequences that scales to very large datasets of hundreds of millions or billions of reads. It harnesses the high efficiency of DNA read aligners to compute BLASTX-like alignments between sequencing reads and a protein database in a small fraction of the time required by BLASTX. According to its authors, PAUDA can process DNA reads at a rate of millions of reads per CPU hour and is approximately 10,000 times faster than BLASTX.

Environment Modules

Run module spider PAUDA to find out what environment modules are available for this application.

System Variables

  • HPC_PAUDA_DIR - installation directory
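
As a minimal sketch of how the module and the variable above fit together, the following shell snippet looks up the available PAUDA modules, loads one, and prints the installation directory. The module name "pauda" passed to module load and the helper variable pauda_dir are assumptions made for illustration; use whatever name and version module spider PAUDA actually reports.

```shell
# Sketch only: assumes an Lmod-style `module` command is available in this shell.
if command -v module >/dev/null 2>&1; then
    module spider PAUDA    # list the PAUDA modules and versions that are installed
    module load pauda      # assumed module name; substitute the one reported by spider
fi

# HPC_PAUDA_DIR is set once the module is loaded; fall back to a notice otherwise.
# pauda_dir is a hypothetical helper variable used only in this example.
pauda_dir="${HPC_PAUDA_DIR:-not set (load the PAUDA module first)}"
echo "PAUDA installation directory: ${pauda_dir}"
```

If the directory is reported as not set, the module was not loaded in the current shell session.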




Citation

If you publish research that uses PAUDA, please cite it as follows:

Daniel H. Huson and Chao Xie, A poor man’s BLASTX - high-throughput metagenomic protein database search using PAUDA, submitted to HitSeq (2013).