DBG2OLC is a proof-of-concept implementation of a novel algorithm that allows efficient assembly of long, erroneous reads from mammalian-size genomes on modest computational resources. The algorithm converts the de novo genome assembly problem from the de Bruijn graph framework to the overlap-layout-consensus framework, so it only needs to consider overlaps between reads that are not contained within any contig built with a de Bruijn graph algorithm, rather than all the overlaps in the genome data set. For each read that spans several contigs, it compresses the regions that lie inside each de Bruijn graph contig, which greatly reduces the length of the reads and therefore the complexity of the assembly problem. The new algorithm transforms previously prohibitive tasks, such as pairwise alignment, into jobs that can be completed in a small amount of time. A compressed overlap graph that preserves all necessary information is constructed from the compressed reads to enable the final-stage assembly.
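The read-compression idea described above can be illustrated with a small Python sketch. This is not DBG2OLC's code: the function names (`index_contigs`, `compress_read`, `compressed_overlap`), the parameters `k` and `min_hits`, and the toy sequences are all hypothetical, and the sketch assumes exact k-mer matching, ignores reverse complements, and treats overlap as a simple suffix-prefix match on contig identifiers.

```python
from collections import Counter


def index_contigs(contigs, k):
    """Map each k-mer to the single contig containing it.

    k-mers that occur in more than one contig are dropped as ambiguous.
    """
    index = {}
    ambiguous = set()
    for cid, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in ambiguous:
                continue
            if kmer in index and index[kmer] != cid:
                del index[kmer]
                ambiguous.add(kmer)
            else:
                index[kmer] = cid
    return index


def compress_read(read, index, k, min_hits=2):
    """Replace a read by the ordered list of contigs it passes through."""
    hits = [index[read[i:i + k]]
            for i in range(len(read) - k + 1)
            if read[i:i + k] in index]
    counts = Counter(hits)
    compressed = []
    for cid in hits:
        # Keep contigs with enough k-mer support; collapse consecutive repeats.
        if counts[cid] >= min_hits and (not compressed or compressed[-1] != cid):
            compressed.append(cid)
    return compressed


def compressed_overlap(a, b):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


# Toy data (hypothetical): three short "contigs" and two reads spanning them.
contigs = {"c1": "ACGTACGGTCA", "c2": "TTGACCGATAG", "c3": "GGCATCCTAAC"}
k = 4
index = index_contigs(contigs, k)

read1 = contigs["c1"] + contigs["c2"]
read2 = contigs["c2"] + contigs["c3"]
noisy_read1 = "ACGTATGGTCATTGACCGATAG"  # read1 with one substitution

r1 = compress_read(read1, index, k)
r2 = compress_read(read2, index, k)
r1_noisy = compress_read(noisy_read1, index, k)
shared = compressed_overlap(r1, r2)
```

In this toy case each 22-base read shrinks to a list of two contig identifiers, and the overlap between the two reads is found by comparing those short lists instead of aligning the raw sequences. The read with a substitution compresses to the same list because enough error-free k-mers remain to identify each contig.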
- HPC_DBG2OLC_DIR - installation directory
- HPC_DBG2OLC_DOC - included documentation directory
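As a minimal sketch, a Python script could look up these variables as shown below. The helper name `dbg2olc_paths` is hypothetical, and the sketch assumes the variables are present in the process environment only once the DBG2OLC environment has been set up; otherwise it reports them as unset.

```python
import os


def dbg2olc_paths(env=None):
    """Return the DBG2OLC directories named by the environment, or None if unset."""
    env = os.environ if env is None else env
    names = ("HPC_DBG2OLC_DIR", "HPC_DBG2OLC_DOC")
    return {name: env.get(name) for name in names}


if __name__ == "__main__":
    for name, path in dbg2olc_paths().items():
        print(name, "=", path if path else "(not set)")
```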
If you publish research that uses DBG2OLC, you must cite it as follows:
DBG2OLC: Efficient Assembly of Large Genomes Using the Compressed Overlap Graph. Chengxi Ye, Chris Hill, Sergey Koren, Jue Ruan, Zhanshan (Sam) Ma, James A. Yorke, Aleksey Zimin. http://arxiv.org/abs/1410.2801
- Validated 4/5/2018