Difference between revisions of "Trinity"

From UFRC
==Installation==
 
Trinity is meant to be installed "in-tree". To install, download Trinity, unpack the archive, and type 'make'. ALLPATHS-LG, a WGS (Whole Genome Shotgun) assembler, is optional software that Trinity can use if it is available and the ALLPATHSLG_BASEDIR environment variable is set. Currently, we do not provide ALLPATHS-LG at the UF HPC.
 

Latest revision as of 17:10, 22 August 2022

Description

Trinity website: http://trinityrnaseq.sourceforge.net/

Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works like so:

Inchworm assembles the RNA-seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.

Chrysalis groups the Inchworm contigs into clusters and constructs a complete de Bruijn graph for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or set of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.

Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes.

Environment Modules

Run module spider trinity to find out what environment modules are available for this application.

System Variables

  • HPC_TRINITY_DIR - installation directory
  • ALLPATHSLG_BASEDIR - Allpaths-LG installation directory

Additional Information

To run Trinity after you load the module, use the "Trinity" command.
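As a minimal sketch of the load-then-run sequence, the snippet below loads the module and checks that the "Trinity" command is reachable. The module name "trinity" is assumed from the module spider command above (the exact name and version on your system may differ), and the variable trinity_status is a hypothetical name used only for illustration.

```shell
#!/bin/bash
# Load the module only where an environment-modules system is present,
# so the sketch can also be tried on a machine without modules.
if command -v module >/dev/null 2>&1; then
    module load trinity   # assumed module name; confirm with 'module spider trinity'
fi

# After a successful load, the Trinity command should be on PATH and
# HPC_TRINITY_DIR should point at the installation directory.
if command -v Trinity >/dev/null 2>&1; then
    trinity_status="available"
    echo "Install directory: ${HPC_TRINITY_DIR:-unknown}"
else
    trinity_status="missing"
    echo "Trinity not found; run 'module spider trinity' to list module names." >&2
fi
```

The guard around the command lookup is a design choice for the sketch; in a real job script you would usually just load the module and call Trinity directly.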


General Performance

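See the Trinity Performance Page (http://trinityrnaseq.github.io/performance/index.html) for an overview of Trinity performance, and the specific pages for CPU (http://trinityrnaseq.github.io/performance/cpu.html) and memory (http://trinityrnaseq.github.io/performance/mem.html) usage.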

Local Drive Use

Caution!

You must write Trinity output to a local scratch directory on the compute node and copy only the results back to your /blue/ directory tree. SLURM provides a local scratch directory for each job via the $TMPDIR variable. Put the output directory under $TMPDIR and then copy that directory to your /blue space at the end of the job. If you do not use $TMPDIR for staging out the Trinity output directory, your job(s) could be cancelled without warning. See Temporary Directories for more information.
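The staging pattern described above could be sketched in a job script roughly as follows. RESULTS_DIR, OUT_DIR, and the stage_out helper are hypothetical names, the fallback to mktemp exists only so the sketch can be tried outside SLURM, and the commented Trinity command line is an assumption about typical options rather than a recommended invocation.

```shell
#!/bin/bash
# Sketch: keep Trinity output on node-local scratch, copy it back at the end.

# $TMPDIR is provided by SLURM inside a job; fall back to a throwaway
# directory so the sketch also works outside a job.
SCRATCH="${TMPDIR:-$(mktemp -d)}"
OUT_DIR="$SCRATCH/trinity_out"

# Hypothetical destination; replace with a path under your /blue directory tree.
RESULTS_DIR="${RESULTS_DIR:-$PWD/trinity_results}"

mkdir -p "$OUT_DIR"

# Run Trinity here with its output directory set to "$OUT_DIR", for example
# (options are an assumption and depend on your data and Trinity version):
#   Trinity --seqType fq --left reads_1.fq --right reads_2.fq --output "$OUT_DIR"

# Copy the finished output directory into the destination directory.
stage_out() {
    mkdir -p "$2"
    cp -r "$1" "$2"/
}

# Stage out only once, at the end of the job.
stage_out "$OUT_DIR" "$RESULTS_DIR"
```

Copying once at the end, rather than writing to /blue during the run, is the point of the pattern: the many small intermediate files stay on the local disk.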


Java Heap Memory

If the run produces an error stating that Java could not create a virtual machine due to insufficient heap memory, you can set the Java heap size with a command like

export _JAVA_OPTIONS="-Xmx2g"

either at the command line (for an interactive run on a test node) or in the job script. Make sure that the "-Xmx" value is less than the amount of memory you requested from the batch system.

The default Butterfly memory setting in the Trinity script is '-Xmx20G', so plan your job resource request accordingly.
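One way to keep the heap setting below the job's memory request is to derive it from the request itself. The sketch below assumes the job was submitted with a per-node memory request, so that SLURM sets SLURM_MEM_PER_NODE (in megabytes); the helper name heap_mb_for_job, the 2048 MB of headroom, and the 4096 MB fallback are illustrative assumptions, not site recommendations.

```shell
#!/bin/bash
# Sketch: choose a Java heap size smaller than the job's memory request.

# Print a heap size in MB for a request given in MB.
#   $1 - requested memory in MB
#   $2 - headroom to leave for everything else, in MB (assumed default: 2048)
heap_mb_for_job() {
    local requested_mb="$1"
    local reserve_mb="${2:-2048}"
    local heap_mb=$(( requested_mb - reserve_mb ))
    # For very small requests, fall back to half of the request.
    if [ "$heap_mb" -le 0 ]; then
        heap_mb=$(( requested_mb / 2 ))
    fi
    echo "$heap_mb"
}

# Use the SLURM request when available; 4096 MB is an assumed fallback
# for trying the sketch outside a job.
export _JAVA_OPTIONS="-Xmx$(heap_mb_for_job "${SLURM_MEM_PER_NODE:-4096}")m"
echo "Using $_JAVA_OPTIONS"
```

For example, a 32768 MB request with the default headroom yields -Xmx30720m, which leaves room for the 20G Butterfly default mentioned above.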