khmer - Python scripts for k-mer counting, filtering, and graph traversal.
Available scripts: abundance-dist.py, count-median.py, do-partition.sh, filter-abund.py, find-knots.py, load-into-counting.py, merge-partitions.py, normalize-by-median.py, partition-graph.py, annotate-partitions.py, count-overlap.py, extract-partitions.py, filter-stoptags.py, load-graph.py, make-initial-stoptags.py, normalize-by-kadian.py, normalize-by-min.py
Use "import khmer" in your script or in an interactive Python session.
Run "module spider khmer" to find out which environment modules are available for this application.
- HPC_KHMER_DIR - installation directory
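As a quick sanity check after loading the module, the following Python sketch reports whether khmer can be imported and what HPC_KHMER_DIR is set to. It uses only the standard library. The helper name khmer_environment is made up for this example and is not part of khmer.

```python
import importlib.util
import os


def khmer_environment():
    """Report whether khmer is importable and what HPC_KHMER_DIR points to."""
    # find_spec returns None when the package cannot be found on sys.path.
    spec = importlib.util.find_spec("khmer")
    return {
        "importable": spec is not None,
        # None when the variable is unset, e.g. the module is not loaded.
        "install_dir": os.environ.get("HPC_KHMER_DIR"),
    }


if __name__ == "__main__":
    info = khmer_environment()
    print("khmer importable:", info["importable"])
    print("HPC_KHMER_DIR:", info["install_dir"] or "(not set; is the module loaded?)")
```

If "importable" is False or the variable is unset, the khmer environment module is probably not loaded in the current shell.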
If you use the khmer software, you must cite:
- Crusoe et al., The khmer software package: enabling efficient sequence analysis. 2014. doi: 10.6084/m9.figshare.979190
If you use any of khmer's published scientific methods, you should *also* cite the relevant paper(s), as directed below.
- Graph partitioning and/or compressible graph representation
- The load-graph.py, partition-graph.py, and find-knots.py scripts are part of the compressible graph representation and partitioning algorithms described in:
- Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
- Pell J, Hintze A, Canino-Koning R, Howe A, Tiedje JM, Brown CT
- Proc Natl Acad Sci U S A. 2012 Aug 14;109(33):13272-7
- doi: 10.1073/pnas.1121464109
- PMID: 22847406
- Digital normalization
- The normalize-by-median.py and count-median.py scripts are part of the digital normalization algorithm, described in:
- A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data
- Brown CT, Howe AC, Zhang Q, Pyrkosz AB, Brom TH
- arXiv:1203.4802 [q-bio.GN]
- K-mer counting
- The abundance-dist.py, filter-abund.py, and load-into-counting.py scripts implement the probabilistic k-mer counting described in:
- These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure
- Zhang Q, Pell J, Canino-Koning R, Howe AC, Brown CT
- arXiv:1309.2975 [q-bio.GN]
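To give a feel for what "probabilistic k-mer counting" means, here is a minimal pure-Python sketch of a Count-Min Sketch style counter, the kind of data structure discussed in the paper above. It is an illustration only, not khmer's implementation or API: the class name, the table sizes, and the BLAKE2-based hashing are assumptions chosen for readability, and reverse complements are ignored.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: several hash tables, report the minimum count."""

    def __init__(self, table_size=10007, n_tables=4):
        # Illustrative sizes only; real data needs far larger tables.
        self.table_size = table_size
        self.tables = [[0] * table_size for _ in range(n_tables)]

    def _slots(self, kmer):
        # One slot per table. Salting the hash with the table index
        # gives each table a different hash function.
        for i in range(len(self.tables)):
            digest = hashlib.blake2b(
                kmer.encode(), salt=bytes([i]), digest_size=8
            ).digest()
            yield i, int.from_bytes(digest, "big") % self.table_size

    def add(self, kmer):
        for i, slot in self._slots(kmer):
            self.tables[i][slot] += 1

    def count(self, kmer):
        # Collisions can only inflate a slot, so the minimum across tables
        # is the best estimate and never undercounts.
        return min(self.tables[i][slot] for i, slot in self._slots(kmer))


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


sketch = CountMinSketch()
for read in ["ATGGCATGGC", "GGCATGGCAT"]:
    for kmer in kmers(read, 4):
        sketch.add(kmer)

# "ATGG" occurs 3 times in the two reads; the estimate is at least that.
print(sketch.count("ATGG"))
```

The memory used is fixed by the table sizes rather than by the number of distinct k-mers, at the cost of occasional overestimates when k-mers collide in every table.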