Difference between revisions of "Kraken"

Revision as of 14:03, 13 June 2022

Description

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.

In its fastest mode of operation, for a simulated metagenome of 100 bp reads, Kraken processed over 4 million reads per minute on a single core, over 900 times faster than Megablast and over 11 times faster than the abundance estimation program MetaPhlAn. Kraken's accuracy is comparable with Megablast, with slightly lower sensitivity and very high precision.

Environment Modules

Run module spider kraken to find out what environment modules are available for this application.

System Variables

HPC_KRAKEN_DIR - installation directory
HPC_KRAKEN_BIN - executable directory
KRAKEN2_DB_PATH - directory containing the hosted database builds
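
A quick way to confirm that the module set these variables is to print them after loading it. The sketch below is only an illustration: the function name show_kraken_env is hypothetical, and it assumes the module exports the three variables named in the list above.

```shell
# Print the three variables listed above, marking any that are not exported.
# The function name is hypothetical; the variable names come from the list.
show_kraken_env() {
    for v in HPC_KRAKEN_DIR HPC_KRAKEN_BIN KRAKEN2_DB_PATH; do
        if val=$(printenv "$v"); then
            printf '%s=%s\n' "$v" "$val"
        else
            printf '%s is not set\n' "$v"
        fi
    done
}

show_kraken_env
```

If a variable is reported as not set, the module is most likely not loaded in the current shell.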

Additional Information

After loading the kraken/2 module, you can list the available databases with:

$ ls $KRAKEN2_DB_PATH

To use any of the standard or custom databases we host, pass the '-db DBNAME' argument. For example:

$ module load kraken/2
$ kraken2-inspect -db fungi
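
In a job script it can help to check that a database name is actually present before calling Kraken. The following is a minimal sketch, not part of Kraken 2 or the module: the helper names kraken_db_exists and inspect_db are hypothetical, and it assumes each hosted database is a subdirectory of $KRAKEN2_DB_PATH.

```shell
# Hypothetical helper: succeed only if a database named $1 is present.
# Assumption: each hosted database is a subdirectory of $KRAKEN2_DB_PATH.
kraken_db_exists() {
    [ -n "${KRAKEN2_DB_PATH:-}" ] && [ -d "$KRAKEN2_DB_PATH/$1" ]
}

# Hypothetical wrapper: run the inspect command from the example above
# only when the requested database was found.
inspect_db() {
    if kraken_db_exists "$1"; then
        kraken2-inspect -db "$1"
    else
        echo "database '$1' not found under \$KRAKEN2_DB_PATH" >&2
        return 1
    fi
}
```

With the module loaded, calling inspect_db fungi would run the same command as the example above, while a misspelled name fails early with a message instead of a Kraken error.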

If you need a custom database built, let us know and provide the FASTA file with taxonomic IDs. We'd be happy to build and host a commonly used database.

If you get the error "Loading database information...classify: Error reading in hash table", allocate more memory to the job. Keep increasing the memory request until the job log shows a status message like

Loading database information... done.
Processed 2488462 sequences (226450042 bp) ...
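
Checking the job log for these two messages can be scripted. The sketch below is an assumption-laden illustration: the function name kraken_log_status, its printed labels, and its return codes are hypothetical; only the two message strings are taken from the text above.

```shell
# Hypothetical helper: classify a job log by the messages quoted above.
# Prints "needs-more-memory", "loaded", or "unknown"; the labels and
# return codes are choices made for this sketch.
kraken_log_status() {
    if grep -qF "Error reading in hash table" "$1"; then
        echo "needs-more-memory"   # raise the memory request and resubmit
        return 1
    elif grep -qF "Loading database information... done." "$1"; then
        echo "loaded"
        return 0
    else
        echo "unknown"
        return 2
    fi
}
```

A resubmission loop could call this on the finished job's log and raise the memory request only when it reports needs-more-memory.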

@@ Line 5: / Line 5: @@
 <!--CONFIGURATION: OPTIONAL (|1}} means it's ON)-->
 |{{#vardefine:conf|}}           <!--CONFIGURATION-->
-|{{#vardefine:exe|}}            <!--ADDITIONAL INFO-->
+|{{#vardefine:exe|1}}            <!--ADDITIONAL INFO-->
 |{{#vardefine:pbs|}}            <!--PBS SCRIPTS-->
 |{{#vardefine:policy|}}         <!--POLICY-->
@@ Line 22: / Line 22: @@
 In its fastest mode of operation, for a simulated metagenome of 100 bp reads, Kraken processed over 4 million reads per minute on a single core, over 900 times faster than Megablast and over 11 times faster than the abundance estimation program MetaPhlAn. Kraken's accuracy is comparable with Megablast, with slightly lower sensitivity and very high precision.
 <!--Modules-->
-==Required Modules==
+==Environment Modules==
-===Serial===
+Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application.
-* {{#var:app}}
-<!--
-===Parallel (OpenMP)===
-* intel
-* {{#var:app}}
-===Parallel (MPI)===
-* intel
-* openmpi
-* {{#var:app}}
--->
 ==System Variables==
 * HPC_{{uc:{{#var:app}}}}_DIR - installation directory
 * HPC_{{uc:{{#var:app}}}}_BIN - executable directory
+* KRAKEN2_DB_PATH - directory with 'db' builds.
 <!--Configuration-->
 {{#if: {{#var: conf}}|==Configuration==
@@ Line 43: / Line 34: @@
 <!--Run-->
 {{#if: {{#var: exe}}|==Additional Information==
-WRITE_ADDITIONAL_INSTRUCTIONS_ON_RUNNING_THE_SOFTWARE_IF_NECESSARY
+After loading the kraken/2 module you can check the available databases with
+ $ ls $KRAKEN2_DB_PATH
+Use any of the standard or custom databases we host with the '-db DBNAME' argument. E.g.
+ $ module load kraken/2
+ $ kraken2-inspect -db fungi
+If you need a custom database [https://github.com/DerrickWood/kraken2/wiki/Manual#custom-databases built], let us know and provide the FASTA file with taxonomic IDs. We'd be happy to build and host a commonly used database.
+If you get the ''Loading database information...classify: Error reading in hash table'' error, allocate more memory to the job. Keep increasing the memory request until the job log shows a status message like
+ Loading database information... done.
+ Processed 2488462 sequences (226450042 bp) ...
 |}}
 <!--PBS scripts-->
@@ Line 70: / Line 75: @@
 <!--Turn the Table of Contents and Edit paragraph links ON/OFF-->
 __NOTOC____NOEDITSECTION__
-=Validation=
-* Validated 4/5/2018