Difference between revisions of "GATK"

From UFRC
 
(18 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
__NOTOC__
 
__NOTOC__
 
__NOEDITSECTION__
 
__NOEDITSECTION__
[[Category:Software]]
+
[[Category:Software]][[Category:Biology]][[Category:NGS]]
<!-- ########  Template Configuration ######## -->
+
{|<!--Main settings - REQUIRED-->
<!--Edit definitions of the variables used in template calls
 
Required variables:
 
app - lowercase name of the application e.g. "amber"
 
url - url of the software page (project, company product, etc) - e.g. "http://ambermd.org/"
 
Optional variables:
 
INTEL - Version of the Intel Compiler e.g. "11.1"
 
MPI - MPI Implementation and version e.g. "openmpi/1.3.4"
 
-->
 
{|
 
<!--Main settings - REQUIRED-->
 
 
|{{#vardefine:app|gatk}}
 
|{{#vardefine:app|gatk}}
 
|{{#vardefine:url|http://www.broadinstitute.org/gsa/wiki/index.php/GATK}}
 
|{{#vardefine:url|http://www.broadinstitute.org/gsa/wiki/index.php/GATK}}
<!--Compiler and MPI settings - OPTIONAL -->
 
|{{#vardefine:intel|}} <!-- E.g. "11.1" -->
 
|{{#vardefine:mpi|}} <!-- E.g. "openmpi/1.3.4" -->
 
<!--Choose sections to enable - OPTIONAL-->
 
|{{#vardefine:mod|1}} <!--Present instructions for running the software with modules -->
 
 
|{{#vardefine:exe|1}} <!--Present manual instructions for running the software -->
 
|{{#vardefine:exe|1}} <!--Present manual instructions for running the software -->
 
|{{#vardefine:conf|}} <!--Enable config wiki page link - {{#vardefine:conf|1}} = ON/conf|}} = OFF-->
 
|{{#vardefine:conf|}} <!--Enable config wiki page link - {{#vardefine:conf|1}} = ON/conf|}} = OFF-->
Line 31: Line 16:
 
<!--Description-->
 
<!--Description-->
 
{{#if: {{#var: url}}|
 
{{#if: {{#var: url}}|
{{App_Description|app={{#var:app}}|url={{#var:url}}}}|}}
+
{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
 +
 
 
The GATK is a structured software library that makes writing efficient analysis tools using next-generation sequencing data very easy, and second it's a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include things like a depth of coverage analyzers, a quality score recalibrator, a SNP/indel caller and a local realigner.
 
The GATK is a structured software library that makes writing efficient analysis tools using next-generation sequencing data very easy, and second it's a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include things like a depth of coverage analyzers, a quality score recalibrator, a SNP/indel caller and a local realigner.
  
 
We aim to work well with both samtools and Picard by providing complementary tools to those available in those two packages. Our SNP calling pipeline (Q score recalibration -> multiple sequence realignment -> snp/index calling) is a particular area of focus, and have been pushing to make these capabilities as general-purpose and powerful as possible. My group's mandate is to ensure the success of the human medical resequencing projects we've undertaken at the Broad over the next 2-3 years, which involves providing a robust, production-quality development library that underlies tools for common analysis problems (like SNP calling) as well as enabling exploratory research on NGS data.  
 
We aim to work well with both samtools and Picard by providing complementary tools to those available in those two packages. Our SNP calling pipeline (Q score recalibration -> multiple sequence realignment -> snp/index calling) is a particular area of focus, and have been pushing to make these capabilities as general-purpose and powerful as possible. My group's mandate is to ensure the success of the human medical resequencing projects we've undertaken at the Broad over the next 2-3 years, which involves providing a robust, production-quality development library that underlies tools for common analysis problems (like SNP calling) as well as enabling exploratory research on NGS data.  
 
[http://www.broadinstitute.org/gsa/wiki/index.php/GATK#Using_the_GATK Upstream documentation] for {{#var:app}}.
 
[http://www.broadinstitute.org/gsa/wiki/index.php/GATK#Using_the_GATK Upstream documentation] for {{#var:app}}.
<!--Location-->
+
<!--Modules-->
{{App_Location|app={{#var:app}}|{{#var:ver}}}}
+
==Environment Modules==
==Available versions==
+
Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application.
* 1.4.30
+
==System Variables==
* 1.6.9
+
* HPC_{{uc:{{#var:app}}}}_DIR - installation directory
<!-- -->
+
<!--Additional-->
{{#if: {{#var: mod}}|==Running the application using modules==
+
{{#if: {{#var: exe}}|==Additional Information==
{{App_Module|app={{#var:app}}|intel={{#var:intel}}|mpi={{#var:mpi}}}}|}}
+
We provide a wrapper script GenomeAnalysisTK for gatk module versions before 4.0 that is equivalent to running
{{#if: {{#var: exe}}|==How To Run==
+
    mkdir -p tmp
We provide two wrapper scripts AnalyzeCovariates and GenomeAnalysisTK that are equivalent to running
+
    export TMPDIR=$(pwd)/tmp
java -jar $HPC_GATK_DIR/GenomeAnalysisTK.jar
+
    java -Djava.io.tmpdir=$TMPDIR -cp /apps/gatk/jexl/2.1.1/commons-jexl-2.1.1.jar -jar $HPC_GATK_DIR/GenomeAnalysisTK.jar
and
+
 
java -jar $HPC_GATK_DIR/AnalyzeCovariates.jar
+
If you do not use the wrapper you '''must''' make sure to create and use a local ''TMPDIR'' in your /blue space with GenomeAnalysisTK.jar. Otherwise /tmp will be used by default leading to filled up /tmp partitions on compute nodes and node failure.
 +
 
 +
Starting with GATK4 new upstream wrappers are available, so we no longer include our own wrapper. GenomeAnalysisTK and gatk. Running GenomeAnalysisTK will show you how to run 'gatk' tools. To get a full list of tools run 'gatk list'.
 +
 
 
|}}
 
|}}
 
{{#if: {{#var: conf}}|==Configuration==
 
{{#if: {{#var: conf}}|==Configuration==
Line 54: Line 43:
 
{{#if: {{#var: pbs}}|==PBS Script Examples==
 
{{#if: {{#var: pbs}}|==PBS Script Examples==
 
See the [[{{PAGENAME}}_PBS]] page for {{#var: app}} PBS script examples.|}}
 
See the [[{{PAGENAME}}_PBS]] page for {{#var: app}} PBS script examples.|}}
{{#if: {{#var: policy}}|==Usage policy==
+
{{#if: {{#var: policy}}|==Usage Policy==
 
WRITE USAGE POLICY HERE (perhaps templates for a couple of main licensing schemes can be used)|}}
 
WRITE USAGE POLICY HERE (perhaps templates for a couple of main licensing schemes can be used)|}}
 
{{#if: {{#var: testing}}|==Performance==
 
{{#if: {{#var: testing}}|==Performance==

Latest revision as of 16:53, 15 August 2022

Description

gatk website  

The GATK is, first, a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data, and second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth of coverage analyzers, a quality score recalibrator, a SNP/indel caller, and a local realigner.

We aim to work well with both samtools and Picard by providing tools that complement those two packages. Our SNP calling pipeline (Q score recalibration -> multiple sequence realignment -> SNP/indel calling) is a particular area of focus, and we have been pushing to make these capabilities as general-purpose and powerful as possible. My group's mandate is to ensure the success of the human medical resequencing projects we've undertaken at the Broad over the next 2-3 years, which involves providing a robust, production-quality development library that underlies tools for common analysis problems (like SNP calling) as well as enabling exploratory research on NGS data.

Upstream documentation for gatk.

Environment Modules

Run module spider gatk to find out what environment modules are available for this application.

System Variables

  • HPC_GATK_DIR - installation directory
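
A typical session might look like the following minimal sketch. It assumes the bare module name gatk loads a site default version (an assumption; check the output of module spider for the versions actually installed). The variable gatk_dir_status is a hypothetical helper used only for illustration.

```shell
# Sketch only: guarded so it also runs on machines without environment modules.
if command -v module >/dev/null 2>&1; then
    module spider gatk    # list the gatk modules that are available
    module load gatk      # assumption: loads the site default version
fi

# After a successful load, HPC_GATK_DIR should hold the installation directory.
if [ -n "${HPC_GATK_DIR:-}" ]; then
    gatk_dir_status="set"
    echo "GATK is installed in: $HPC_GATK_DIR"
else
    gatk_dir_status="unset"
    echo "HPC_GATK_DIR is not set; is a gatk module loaded?" >&2
fi
```

If the variable is reported as unset, load a specific version listed by module spider and try again.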

Additional Information

For gatk module versions before 4.0, we provide a wrapper script, GenomeAnalysisTK, that is equivalent to running:

   mkdir -p tmp
   export TMPDIR=$(pwd)/tmp
   java -Djava.io.tmpdir=$TMPDIR -cp /apps/gatk/jexl/2.1.1/commons-jexl-2.1.1.jar -jar $HPC_GATK_DIR/GenomeAnalysisTK.jar

If you do not use the wrapper, you must create a local TMPDIR in your /blue space and point GenomeAnalysisTK.jar at it. Otherwise Java writes its temporary files to /tmp by default, which can fill the /tmp partition on compute nodes and cause node failures.
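
As a rough sketch, a job script that bypasses the wrapper could set up the temporary directory as follows. It mirrors the wrapper commands shown above; the fallback path /path/to/gatk and the variable name jar are placeholders introduced here for illustration, and the script is assumed to be started from a directory in your /blue space.

```shell
# Create a job-local temporary directory under the current working directory.
TMPDIR="$(pwd)/tmp"
mkdir -p "$TMPDIR"
export TMPDIR

# HPC_GATK_DIR is set by the gatk module; the fallback is a placeholder only.
jar="${HPC_GATK_DIR:-/path/to/gatk}/GenomeAnalysisTK.jar"

if command -v java >/dev/null 2>&1 && [ -f "$jar" ]; then
    # -Djava.io.tmpdir redirects Java temporary files away from /tmp.
    # "$@" forwards any GATK arguments passed to this script.
    java -Djava.io.tmpdir="$TMPDIR" \
         -cp /apps/gatk/jexl/2.1.1/commons-jexl-2.1.1.jar \
         -jar "$jar" "$@"
else
    echo "java or $jar not found; load a pre-4.0 gatk module first" >&2
fi
```

Remove the tmp directory once the job has finished if you no longer need its contents.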

Starting with GATK4, new upstream wrappers are available, so we no longer include our own wrapper. Two commands are available: GenomeAnalysisTK and gatk. Running GenomeAnalysisTK will show you how to run 'gatk' tools. To get a full list of tools, run 'gatk --list'.
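
With a GATK4 module loaded, exploring the upstream wrapper might look like this sketch. The --list and --help options come from the upstream gatk launcher; HaplotypeCaller is used purely as an example tool name, and gatk_found is a hypothetical helper variable.

```shell
# Sketch only: guarded so it does nothing harmful when gatk is not on PATH.
if command -v gatk >/dev/null 2>&1; then
    gatk_found="yes"
    gatk --list                    # print the names of all available tools
    gatk HaplotypeCaller --help    # show the arguments of one tool (example name)
else
    gatk_found="no"
    echo "gatk not found; load a GATK4 module first" >&2
fi
```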