A metagenomic sample is a set of sequencing reads from the microbial life living in a particular environment. Standard analysis involves estimating the species composition of the environment by aligning the reads against a reference database. In the age of pangenomics, alignment is preferentially done against a variation graph encompassing all variation within a species.
Themisto is a space-efficient tool for indexing such variation graphs. The Themisto index is a compressed colored de Bruijn graph of order k, where each node has a set of colors representing the reference sequences that contain the k-mer corresponding to the node. Reads are pseudoaligned to the index using a method similar to the one used by the tool Kallisto: all k-mers of the read are located in the de Bruijn graph, and the intersection of the color sets of those nodes is returned.
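The pseudoalignment idea described above can be illustrated with a minimal Python sketch. It is not Themisto's implementation: a plain dictionary stands in for the compressed de Bruijn graph, reverse complements are ignored, and a k-mer absent from the index is treated as having an empty color set (an assumption of this sketch). The function names build_colored_index and pseudoalign are hypothetical.

```python
from collections import defaultdict


def build_colored_index(references, k):
    """Map each k-mer to the set of colors (reference ids) that contain it.

    A plain dict replaces the compressed colored de Bruijn graph here.
    """
    index = defaultdict(set)
    for color, seq in enumerate(references):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(color)
    return index


def pseudoalign(read, index, k):
    """Return the intersection of the color sets of all k-mers in the read."""
    result = None
    for i in range(len(read) - k + 1):
        # Assumption of this sketch: an unknown k-mer has no colors.
        colors = index.get(read[i:i + k], set())
        result = set(colors) if result is None else result & colors
        if not result:
            break  # the intersection is already empty
    # A read shorter than k has no k-mers, so it matches nothing.
    return result if result is not None else set()


references = ["ACGTACGT", "ACGTTTTT"]  # color 0 and color 1
index = build_colored_index(references, k=4)
print(pseudoalign("ACGTA", index, k=4))  # k-mers ACGT and CGTA
print(pseudoalign("ACGT", index, k=4))   # shared by both references
```

In this toy example the read ACGTA is compatible only with reference 0, because its second k-mer CGTA does not occur in reference 1, while the read ACGT is compatible with both references.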
Run the command module spider themisto to find out what environment modules are available for this application.
- HPC_THEMISTO_DIR - installation directory
- HPC_THEMISTO_BIN - executable directory
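As a small illustration, a script can look up these locations at run time instead of hard-coding paths. The sketch below assumes the two variables listed above are present in the environment once the module has been loaded; the helper name themisto_paths is hypothetical.

```python
import os
from pathlib import Path


def themisto_paths(env=os.environ):
    """Return (installation dir, executable dir) as Paths, or None if unset."""
    install_dir = env.get("HPC_THEMISTO_DIR")
    bin_dir = env.get("HPC_THEMISTO_BIN")
    return (
        Path(install_dir) if install_dir else None,
        Path(bin_dir) if bin_dir else None,
    )


install_dir, bin_dir = themisto_paths()
if bin_dir is None:
    print("HPC_THEMISTO_BIN is not set; is the themisto module loaded?")
else:
    print(f"Themisto executables are expected in {bin_dir}")
```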
If you publish research that uses Themisto, cite it as follows:
Tommi Mäklin, Teemu Kallonen, Jarno Alanko, Veli Mäkinen, Jukka Corander, Antti Honkela. Genomic Epidemiology with Mixed Samples. Supplement: Pseudoalignment in the mGEMS pipeline.