Difference between revisions of "Picard"

From UFRC

Revision as of 20:29, 10 July 2016

Description

picard website  

SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. SAM is described in the SAMtools project page.

Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported.

Picard does not have its own mailing lists. Please use the SAMtools mailing lists for Picard-related correspondence.

Required Modules

modules documentation

Serial

  • picard

System Variables

  • HPC_PICARD_DIR - installation directory

Additional Information

See the samtools HPC wiki page for more details on Samtools.

In addition to the $HPC_PICARD_DIR variable set by the picard module, there is a convenience symlink to the latest picard directory at:

/apps/picard/bin

So picard jars can be run as

java -Xms1g -Xmx8g -jar $HPC_PICARD_DIR/ReorderSam.jar
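Because the jars are reachable through both the module variable and the symlink, a script can choose whichever is available. The following is a minimal sketch, not a site-provided tool: the variable names picard_dir and jar are made up for illustration, and it assumes that $HPC_PICARD_DIR is unset when the picard module has not been loaded.

```shell
#!/bin/sh
# Prefer the directory exported by the picard module; if the variable is
# unset or empty, fall back to the convenience symlink described above.
picard_dir="${HPC_PICARD_DIR:-/apps/picard/bin}"
jar="${picard_dir}/ReorderSam.jar"

# Report which jar would be used, or warn if it cannot be found.
if [ -f "$jar" ]; then
    echo "Using $jar"
else
    echo "Jar not found at $jar; is the picard module loaded?" >&2
fi
```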

To run picard tools with non-default java memory settings you can set the minimum and the maximum heap memory in the command. For example:

java -Xms1g -Xmx8g -jar /apps/picard/bin/ReorderSam.jar INPUT=input.bam OUTPUT=output.bam VALIDATION_STRINGENCY=LENIENT REFERENCE=ref.fasta CREATE_INDEX=true

Starting with picard/1.137 there is a convenient shell wrapper 'picard', so you can use the following command:

picard CreateSequenceDictionary R=example.fa O=example.dict

Picard Java Garbage Collection thread use

As a Java-based application, Picard uses extra threads for garbage collection (GC), Java's automatic memory management. We have observed that these threads can consume a significant share of CPU resources, bringing Picard's total CPU utilization to as high as 6 CPUs. This issue is discussed in the Picard FAQ. In a scheduled environment it is critical that all resources used by the application are requested at the time of submission. Please account for this when formulating the job resource request, and use Java's GC management flag (-XX:ParallelGCThreads=<number of threads>) to limit GC threads accordingly. As a starting point, we recommend a resource request and GC flag along the lines of:

#PBS -l nodes=1:ppn=6

and

java -Xms1g -Xmx8g -XX:ParallelGCThreads=5 -jar $HPC_PICARD_DIR/ReorderSam.jar

Resource use during a job can be monitored with the showq -r -u <username> command. The "EFFIC" column shows job efficiency. Numbers over 100 indicate that your job is using more resources than requested. If efficiency is consistently well below 100, you can scale back the resource request and the -XX:ParallelGCThreads value.
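The resource request and the GC flag can be kept in step inside one job script. The sketch below is illustrative only: gc_threads_for is a hypothetical helper, the "cores minus one" rule is an assumption generalized from the ppn=6 / ParallelGCThreads=5 pairing above, and PBS_NUM_PPN is assumed to be exported by the scheduler (the script falls back to 6 when it is not set).

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=6

# Hypothetical helper: GC thread count for a given number of requested cores.
# Returns cores minus one, but never fewer than one GC thread.
gc_threads_for() {
    if [ "$1" -gt 1 ]; then
        echo $(( $1 - 1 ))
    else
        echo 1
    fi
}

# Cores requested per node; assumed to come from the scheduler, else 6.
cores="${PBS_NUM_PPN:-6}"
gc_threads=$(gc_threads_for "$cores")

# Print the command for inspection; remove "echo" to run it inside a job.
echo java -Xms1g -Xmx8g "-XX:ParallelGCThreads=${gc_threads}" \
    -jar "${HPC_PICARD_DIR}/ReorderSam.jar" \
    INPUT=input.bam OUTPUT=output.bam REFERENCE=ref.fasta
```

If the EFFIC column stays well below 100, lowering ppn in the request lowers the GC thread count with it, so the two values cannot drift apart.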

TMP Directory and Out of Space Error

If you encounter a 'disk space exceeded' error, the Picard command you are running is most likely filling its temporary directory, which defaults to /tmp. HiPerGator2 nodes have no local disk, so a large analysis cannot use /tmp. Add the following argument to the command line to avoid the error:

TMP_DIR="$(pwd)/tmp"
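As a rough sketch of how the argument fits into a full command, the snippet below appends TMP_DIR to the ReorderSam example used earlier on this page. Creating the directory beforehand is a precaution rather than a documented requirement, and the variable name tmp_dir is only illustrative.

```shell
#!/bin/sh
# Put Picard's temporary files under the current working directory
# instead of /tmp.
tmp_dir="$(pwd)/tmp"

# Precaution (assumption): make sure the directory exists before Picard starts.
mkdir -p "$tmp_dir"

# Print the command for inspection; remove "echo" to run it inside a job.
echo java -Xms1g -Xmx8g -jar "${HPC_PICARD_DIR}/ReorderSam.jar" \
    INPUT=input.bam OUTPUT=output.bam REFERENCE=ref.fasta \
    TMP_DIR="$tmp_dir"
```

Run the job from a directory on a filesystem with enough free space, since $(pwd) is resolved at the moment the script runs.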