Difference between revisions of "Khmer"

Revision as of 13:53, 15 August 2014

Description

khmer: Python scripts for k-mer counting, filtering, and graph traversal.

A khmer mailing list for getting help is hosted at librelist.com. To subscribe, send an email to 'khmer@librelist.com'; once subscribed, send your questions or comments to the same address.

IMPORTANT NOTE:

khmer is *pre-publication* and *research* software, so please keep in mind that (a) the code may have undiscovered bugs in it, (b) you should cite us, and (c) you should get in touch if you need to cite us, as we are writing up the project.

Available scripts: abundance-dist.py, count-median.py, do-partition.sh, filter-abund.py, find-knots.py, load-into-counting.py, merge-partitions.py, normalize-by-median.py, partition-graph.py, annotate-partitions.py, count-overlap.py, extract-partitions.py, filter-stoptags.py, load-graph.py, make-initial-stoptags.py, normalize-by-kadian.py, normalize-by-min.py

Use "import khmer" in your script or in an interactive Python session.
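As a minimal sketch, the following checks whether the khmer package is importable in the current environment before trying to use it. Only the module name `khmer` comes from the text above; the helper name `khmer_available` is hypothetical.

```python
import importlib.util


def khmer_available():
    """Return True if the khmer package can be found on the import path."""
    # find_spec returns None when no top-level module of that name exists.
    return importlib.util.find_spec("khmer") is not None


if __name__ == "__main__":
    if khmer_available():
        import khmer  # noqa: F401
        print("khmer imported")
    else:
        print("khmer not found; load the khmer module first")
```

If the check fails, the khmer environment module has most likely not been loaded in the current shell.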

Required Modules

modules documentation

Serial

khmer

System Variables

HPC_KHMER_DIR
HPC_KHMER_BIN
HPC_KHMER_LIB
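A script can read these variables from its environment. The sketch below assumes they follow the HPC_KHMER_* pattern listed above and are set only after the khmer module has been loaded; the function name `khmer_paths` is hypothetical.

```python
import os

# Variable names following the HPC_KHMER_* pattern from this page.
HPC_KHMER_VARS = ("HPC_KHMER_DIR", "HPC_KHMER_BIN", "HPC_KHMER_LIB")


def khmer_paths(environ=None):
    """Map each HPC_KHMER_* variable to its value, or None if it is unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in HPC_KHMER_VARS}


if __name__ == "__main__":
    for name, value in khmer_paths().items():
        print(f"{name} = {value if value is not None else '(not set)'}")
```

Passing a plain dictionary as `environ` makes the function easy to exercise without touching the real environment.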

Citation

If you use the khmer software, you must cite:

Crusoe et al., The khmer software package: enabling efficient sequence analysis. 2014. doi: 10.6084/m9.figshare.979190

If you use any of our published scientific methods, you should *also* cite the relevant paper(s), as directed below.

Graph partitioning and/or compressible graph representation

The load-graph.py, partition-graph.py, and find-knots.py scripts are part of the compressible graph representation and partitioning algorithms described in:

Pell J, Hintze A, Canino-Koning R, Howe A, Tiedje JM, Brown CT. Proc Natl Acad Sci U S A. 2012 Aug 14;109(33):13272-7. doi: 10.1073/pnas.1121464109. PMID: 22847406

Digital normalization

The normalize-by-median.py and count-median.py scripts are part of the digital normalization algorithm, described in:

A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data

Brown CT, Howe AC, Zhang Q, Pyrkosz AB, Brom TH

arXiv:1203.4802 [q-bio.GN]

http://arxiv.org/abs/1203.4802
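The core idea of digital normalization can be sketched in a few lines: a read is kept only while the median count of its k-mers, among the reads kept so far, is below a coverage cutoff. This is an illustration, not khmer's implementation. It uses an exact in-memory counter and ignores reverse complements, and the function names and the default values of `k` and `cutoff` are assumptions chosen for the example.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize_by_median(reads, k=20, cutoff=20):
    """Keep a read only if the median count of its k-mers is below cutoff."""
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read is shorter than k, so it has no k-mers
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept
```

Because counts are updated only for kept reads, redundant reads from high-coverage regions are discarded once the cutoff is reached, while reads from low-coverage regions pass through.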

K-mer counting

The abundance-dist.py, filter-abund.py, and load-into-counting.py scripts implement the probabilistic k-mer counting described in:

These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure

Zhang Q, Pell J, Canino-Koning R, Howe AC, Brown CT.

arXiv:1309.2975 [q-bio.GN]

http://arxiv.org/abs/1309.2975
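To illustrate what probabilistic k-mer counting means, the sketch below uses a small Count-Min-Sketch-style structure: several fixed-size tables are incremented for each k-mer, and the reported count is the minimum across tables, which can overestimate but never underestimate. It is a toy model, not khmer's code. The class name, table sizes, and the use of SHA-1 as the hash function are assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: several hash tables, report the minimum."""

    def __init__(self, table_sizes=(1009, 1013, 1019)):
        # Distinct prime sizes make the tables collide on different items.
        self.sizes = table_sizes
        self.tables = [[0] * size for size in table_sizes]

    def _slots(self, item):
        # One deterministic hash per item; each table applies its own modulus.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        return [h % size for size in self.sizes]

    def add(self, item):
        for table, slot in zip(self.tables, self._slots(item)):
            table[slot] += 1

    def count(self, item):
        # Collisions only inflate a slot, so the minimum is the best estimate.
        return min(t[s] for t, s in zip(self.tables, self._slots(item)))


def count_kmers(seq, k, sketch):
    """Add every overlapping k-mer of seq to the sketch."""
    for i in range(len(seq) - k + 1):
        sketch.add(seq[i:i + k])
```

Memory use is fixed by the table sizes rather than by the number of distinct k-mers, at the cost of occasional overcounts when k-mers collide in every table.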

@@ Line 4: / Line 4: @@
 {|<!--Main settings - REQUIRED-->
 |{{#vardefine:app|khmer}}
-|{{#vardefine:url|https://github.com/ctb/khmer}}
+|{{#vardefine:url|https://github.com/ged-lab/khmer}}
 |{{#vardefine:exe|}} <!--Present manual instructions for running the software -->
 |{{#vardefine:conf|}} <!--Enable config wiki page link - {{#vardefine:conf|1}} = ON/conf|}} = OFF-->
@@ Line 11: / Line 11: @@
 |{{#vardefine:testing|}} <!--Enable performance testing/profiling section -->
 |{{#vardefine:faq|}} <!--Enable FAQ section -->
-|{{#vardefine:citation|}} <!--Enable Reference/Citation section -->
+|{{#vardefine:citation|1}} <!--Enable Reference/Citation section -->
 |}
 <!-- ########  Template Body ######## -->
@@ Line 51: / Line 51: @@
 *'''Q:''' **'''A:'''|}}
 {{#if: {{#var: citation}}|==Citation==
-If you publish research that uses {{{app}}} you have to cite it as follows:
+If you use the khmer software, you must cite:
-WRITE CITATION HERE
+: Crusoe et al., The khmer software package: enabling efficient sequence analysis. 2014. doi: 10.6084/m9.figshare.979190
+If you use any of our published scientific methods, you should *also* cite the relevant paper(s), as directed below.
+* Graph partitioning and/or compressible graph representation
+: The load-graph.py, partition-graph.py, and find-knots.py scripts are part of the compressible graph representation and partitioning algorithms described in:
+:: Pell J, Hintze A, Canino-Koning R, Howe A, Tiedje JM, Brown CT. Proc Natl Acad Sci U S A. 2012 Aug 14;109(33):13272-7. doi: 10.1073/pnas.1121464109. PMID: 22847406
+* Digital normalization
+: The normalize-by-median.py and count-median.py scripts are part of the digital normalization algorithm, described in:
+:: A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data
+:: Brown CT, Howe AC, Zhang Q, Pyrkosz AB, Brom TH
+:: arXiv:1203.4802 [q-bio.GN]
+:: http://arxiv.org/abs/1203.4802
+* K-mer counting
+: The abundance-dist.py, filter-abund.py, and load-into-counting.py scripts implement the probabilistic k-mer counting described in:
+:: These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure
+:: Zhang Q, Pell J, Canino-Koning R, Howe AC, Brown CT.
+:: arXiv:1309.2975 [q-bio.GN]
+:: http://arxiv.org/abs/1309.2975
 |}}