Picard-tools

From UFRC

Latest revision as of 15:25, 27 January 2023

Description

picard website: http://picard.sourceforge.net/

SAM (Sequence Alignment/Map) is a generic format for storing large nucleotide sequence alignments. It is described on the SAMtools project page.

Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported.

Picard does not have its own mailing lists. Please use the SAMtools mailing lists for Picard-related correspondence.

Environment Modules

Run module spider picard to find out what environment modules are available for this application.

System Variables

  • HPC_PICARD_DIR - installation directory
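
As a minimal sketch of how a job script might rely on this variable, the helper below checks that HPC_PICARD_DIR is set and that it contains picard.jar before anything is run. The function name picard_jar is hypothetical, and 'module load picard' is assumed to be the command that loads the module on this system.

```shell
# Hypothetical helper: print the path to picard.jar, or fail with a message
# if the picard module does not appear to be loaded.
picard_jar() {
    if [ -z "${HPC_PICARD_DIR:-}" ]; then
        echo "HPC_PICARD_DIR is not set; run 'module load picard' first" >&2
        return 1
    fi
    if [ ! -f "$HPC_PICARD_DIR/picard.jar" ]; then
        echo "no picard.jar found in $HPC_PICARD_DIR" >&2
        return 1
    fi
    printf '%s\n' "$HPC_PICARD_DIR/picard.jar"
}
```

Failing early in this way keeps a job from starting Java with an empty path when the module was not loaded.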

Additional Information

See the samtools HPC wiki page for more details on Samtools.

In addition to the $HPC_PICARD_DIR variable set by the picard module, there is a convenience symlink to the latest picard directory at

/apps/picard/bin

Picard jars can therefore be run as

java -Xms1g -Xmx8g -jar $HPC_PICARD_DIR/picard.jar <command> <options>

To run Picard tools with non-default Java memory settings, set the initial (-Xms) and maximum (-Xmx) heap sizes on the command line. For example:

java -Xms1g -Xmx8g -jar $HPC_PICARD_DIR/picard.jar ReorderSam INPUT=input.bam OUTPUT=output.bam VALIDATION_STRINGENCY=LENIENT REFERENCE=ref.fasta CREATE_INDEX=true
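
When several Picard steps need different heap sizes, the command above can be wrapped in a small function. This is only a sketch: the name run_picard and the DRY_RUN switch are hypothetical and not part of Picard or of the picard module.

```shell
# Hypothetical helper: run a Picard tool with an explicit heap range.
# Usage: run_picard <initial_heap> <max_heap> <tool> [tool options...]
# With DRY_RUN=1 the command line is printed instead of executed.
run_picard() {
    local min_heap=$1 max_heap=$2
    shift 2
    # Abort with a message if the picard module has not set HPC_PICARD_DIR.
    local jar="${HPC_PICARD_DIR:?HPC_PICARD_DIR is not set; load the picard module first}/picard.jar"
    # JVM options go before -jar; everything after picard.jar is passed to Picard.
    local cmd=(java "-Xms${min_heap}" "-Xmx${max_heap}" -jar "$jar" "$@")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, DRY_RUN=1 run_picard 2g 16g ReorderSam INPUT=input.bam OUTPUT=output.bam prints the assembled command so that the heap settings can be checked before the job is submitted.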

Starting with picard/1.137 there is a convenient shell wrapper 'picard', so you can use the following command:

picard CreateSequenceDictionary R=example.fa O=example.dict

Picard Java Garbage Collection thread use

As a Java-based application, Picard uses extra threads for garbage collection (GC), the JVM's automatic memory management. We have observed that these threads can use a significant share of CPU resources, bringing Picard's total CPU utilization to as many as 6 CPUs. This issue is discussed in the Picard FAQ (http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page#Q:_Why_does_a_Picard_program_use_so_many_threads.3F). In a scheduled environment it is critical that all resources used by the application are requested at the time of submission. Please account for this when formulating the job resource request, and use Java's GC management flag (-XX:ParallelGCThreads=<number of threads>) to limit GC appropriately. As a starting point, we recommend a resource request and GC flag along the lines of:

  • #PBS -l nodes=1:ppn=6
  • java -Xms1g -Xmx8g -XX:ParallelGCThreads=5 -jar $HPC_PICARD_DIR/picard.jar <command> <options>

Resource use during a job can be monitored with showq -r -u <username>. The "EFFIC" column shows job efficiency. Numbers over 100 indicate that your job is using more resources than requested. If efficiency is consistently well below 100, you can scale back both the resource request and the -XX:ParallelGCThreads value.
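
The recommendation above pairs ppn=6 with -XX:ParallelGCThreads=5. As a sketch, assuming this generalises to "one GC thread fewer than the requested cores" (an assumption, not a documented rule), the flag can be derived from the core count so the two values stay in step. The function name gc_threads_for is hypothetical.

```shell
# Hypothetical helper: choose a ParallelGCThreads value for a given core request,
# following the pattern above (6 cores -> 5 GC threads).
gc_threads_for() {
    local cores=$1
    local threads=$(( cores - 1 ))
    # Never go below one GC thread.
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    printf '%s\n' "$threads"
}

cores=6   # keep this equal to ppn in the '#PBS -l nodes=1:ppn=6' request
echo "-XX:ParallelGCThreads=$(gc_threads_for "$cores")"
# prints: -XX:ParallelGCThreads=5
```

If the EFFIC column stays well below 100, lowering cores here and ppn in the resource request together keeps the two consistent.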

Disabling Java Garbage Collector Overhead Limit

The Java garbage collector throws a 'java.lang.OutOfMemoryError: GC overhead limit exceeded' error when it spends too much time in garbage collection without making much progress, for example when over 98% of processor time is spent on garbage collection and less than 2% of the heap is recovered.

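This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. See Oracle's garbage collection tuning guide (http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#par_gc.oom) for reference.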

You can disable the garbage collector overhead limit with the JVM argument '-XX:-UseGCOverheadLimit'.
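
The flag is a JVM option, so it belongs before -jar on the command line. The sketch below assembles such a command and prints it. The CreateSequenceDictionary call is the example used earlier on this page, and /path/to/picard is a placeholder that is only used when HPC_PICARD_DIR is not set.

```shell
# Assemble the command as an array so each option stays a separate word.
cmd=(
    java
    -Xms1g -Xmx8g
    -XX:-UseGCOverheadLimit      # JVM option: must appear before -jar
    -jar "${HPC_PICARD_DIR:-/path/to/picard}/picard.jar"
    CreateSequenceDictionary R=example.fa O=example.dict   # Picard tool and its options
)
# Print the command for inspection.
printf '%s ' "${cmd[@]}"; echo
# To execute it instead of printing: "${cmd[@]}"
```

Anything placed after picard.jar is handed to Picard rather than to the JVM, so the flag would not take effect there.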

Using an Alternate Collector

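For some workloads the default Java garbage collector can lead to execution errors. A possible workaround is to use an alternate garbage collector, e.g. the concurrent low-pause collector. Note that the flag below was deprecated in JDK 9 and removed in JDK 14, so it only applies to older Java versions: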

java -Xms1g -Xmx8g -XX:+UseConcMarkSweepGC -jar $HPC_PICARD_DIR/picard.jar <command> <options>

TMP Directory and Out of Space Error

If you encounter a 'disk space exceeded' error, the Picard command you are running has most likely filled its temporary directory, which defaults to /tmp. HiPerGator2 nodes have no local disk, so a large analysis cannot use /tmp. Add the following argument to the command line to avoid the error:

TMP_DIR="$(pwd)/tmp"
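
A short sketch of how this might look in a job script, using the picard wrapper mentioned above. Creating the directory beforehand is a precaution based on the assumption that it may not be created automatically, and the tool and file names are examples only.

```shell
# Create a temporary directory under the current working directory.
tmp_dir="$(pwd)/tmp"
mkdir -p "$tmp_dir"

# Append TMP_DIR to the Picard options (tool and file names are examples).
picard_args=(ReorderSam INPUT=input.bam OUTPUT=output.bam REFERENCE=ref.fasta
             TMP_DIR="$tmp_dir")

# Print the command for inspection.
echo "picard ${picard_args[*]}"
# To run it for real: picard "${picard_args[@]}"
# To clean up afterwards: rm -rf "$tmp_dir"
```

Run the job from a directory on a file system with enough free space, since the temporary files will now be written there.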