Wengan

Description

Wengan is a new genome assembler that, unlike most current long-read assemblers, entirely avoids the all-vs-all read comparison. The key idea behind Wengan is that long-read alignments can be inferred by building paths on a sequence graph. To achieve this, Wengan builds a new sequence graph called the Synthetic Scaffolding Graph (SSG). The SSG is built from a spectrum of synthetic mate-pair libraries extracted from raw long reads. Longer alignments are then built by performing a transitive reduction of the edges. Another distinctive feature of Wengan is that it performs self-validation by following the read information, which lets it identify misassemblies at different steps of the assembly process. For more information about the algorithmic ideas behind Wengan, please read the preprint available on bioRxiv.

Environment Modules

Run module spider wengan to find out what environment modules are available for this application.

System Variables

HPC_WENGAN_DIR - installation directory
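
As a minimal sketch of how this variable might be used, the function below reports where Wengan is installed once the module is loaded. The function name check_wengan_dir is hypothetical and is not part of Wengan or of the module system. The only assumption is that loading the module sets HPC_WENGAN_DIR as described above.

```shell
# Hypothetical helper (not part of Wengan): report the installation
# directory exported by the wengan environment module.
check_wengan_dir() {
    if [ -n "${HPC_WENGAN_DIR:-}" ] && [ -d "$HPC_WENGAN_DIR" ]; then
        echo "Wengan is installed in: $HPC_WENGAN_DIR"
    else
        # The variable is only defined after the module has been loaded.
        echo "HPC_WENGAN_DIR is not set; load the wengan module first" >&2
        return 1
    fi
}

# Do not abort the calling shell if the module has not been loaded yet.
check_wengan_dir || true
```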

Additional Information

NOTE: In our environment, the proper way to launch the Wengan script is to enter "wengan" (or "wengan.pl") followed by the parameters on the command line. The examples on the website tell you to use a path like "/wengan-v0.2-bin-Linux/wengan.pl", but that is not necessary with our setup.

For example, the command to run the WenganD demo would look like:

   $ module load wengan/0.2
   $ wengan -x ontraw -a D -s ecoli/reads/EC.50X.R1.fastq.gz,ecoli/reads/EC.50X.R2.fastq.gz -l ecoli/reads/EC.ONT.30X.fa.gz -p ec_Wd_or1 -t 10 -g 5
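
The same demo command can also be assembled from variables, which makes it easier to swap in your own input files. The following is only a sketch: the variable names and the DRY_RUN switch are hypothetical and introduced for illustration, while the file paths and option values are copied unchanged from the demo command above. With DRY_RUN left at its default of 1, the script only prints the command it would run.

```shell
# Inputs copied from the demo command; replace with your own files.
short_r1="ecoli/reads/EC.50X.R1.fastq.gz"
short_r2="ecoli/reads/EC.50X.R2.fastq.gz"
long_reads="ecoli/reads/EC.ONT.30X.fa.gz"
prefix="ec_Wd_or1"
threads=10

# Store the command in the positional parameters so that every
# argument stays a single word when it is executed.
set -- wengan -x ontraw -a D -s "${short_r1},${short_r2}" \
    -l "$long_reads" -p "$prefix" -t "$threads" -g 5

# Keep a printable copy of the command line.
cmd_line="$*"

# DRY_RUN is a hypothetical switch used only in this sketch.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd_line"
else
    module load wengan/0.2
    "$@"
fi
```

Setting DRY_RUN=0 before running the script loads the module and executes the command instead of printing it.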

Citation

If you publish research that uses Wengan, please cite it as follows:

Di Genova, A., Buena-Atienza, E., Ossowski, S. and Sagot, M.-F. Efficient hybrid de novo assembly of human genomes with WENGAN. Nature Biotechnology (2020), https://doi.org/10.1038/s41587-020-00747-w