Appreci8

From UFRC
Revision as of 14:25, 12 August 2022 by Israel.herrera (talk | contribs)

Description

Appreci8 website  

Appreci8 is a variant calling pipeline for detecting single nucleotide variants (SNVs) and short indels (up to 30 bp) in next-generation sequencing (NGS) data. It integrates the output of eight individual variant calling tools and filters it on the basis of an artifact score and a polymorphism score. This lets Appreci8 call variants with high sensitivity and positive predictive value, even at variant allele frequencies of 1 percent.

Environment Modules

Run module spider Appreci8 to find out what environment modules are available for this application.

System Variables

  • HPC_APPRECI8_DIR - installation directory
  • HPC_APPRECI8_BIN - executable directory
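
As a quick way to confirm that the module has set these variables, a small helper along the following lines can be used. This is only a sketch: the function name show_appreci8_env is hypothetical and not part of the module or of Appreci8; only the two variable names come from the list above.

```shell
# Hypothetical helper (not provided by the module): print each variable,
# or "(unset)" if it is not present in the environment.
show_appreci8_env() {
    for var in HPC_APPRECI8_DIR HPC_APPRECI8_BIN; do
        # printenv prints nothing and fails for a variable that is not exported
        printf '%s=%s\n' "$var" "$(printenv "$var" || echo '(unset)')"
    done
}
```

Calling show_appreci8_env after loading the module should print both paths; seeing "(unset)" suggests the module is not loaded in the current shell.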

Additional Information

HiPerGator provides a "wrapper" that calls the underlying container image. A typical invocation looks like this:

  $ module load appreci8/20180530
  $ appreci8 /path/to/your/data/

where /path/to/your/data/ is a directory set up as follows (from the description at https://hub.docker.com/r/wwuimi/appreci8/):

The data you wish to analyze has to be prepared in the following way (compare folder Example contained in the appreci8 folder):
   SampleNames.txt: The names of the samples you wish to analyze (without file extension, one name per line)
   vcf_header.txt: Standard vcf file header (available in the appreci8 folder)
   Folder alignment: Containing the bam- and bai files of the samples you wish to analyze (format: sample1.bam, sample1.bai etc.)
   Folder snpEff_ann:
       Hotspots.txt: A list containing known hotspot mutations, covering Gene, Mutation (change on amino acid level, one-letter-code), Min_VAF (minimum allelic frequency at which you expect these mutations); an empty list can be passed, containing the header and three NA's (available in the appreci8 folder)
       transcripts.txt: A list containing the genes and the corresponding Ensembl transcript-IDs to be analyzed (without header; gene and transcript-ID separated by a tab, e.g. NRAS<TAB>ENST00000369535; for an example see file in the Example folder)
   Folder targetRegions:
       targetRegions.bed: Bed file containing the target regions to be analyzed (no header, no information except for chr, start, end; 1 instead of chr1 etc.; for an example see file in the Example folder)
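
The layout above can be checked before starting a run. The following is a minimal sketch, not part of Appreci8 or the HiPerGator wrapper: the function names write_sample_names and check_layout are hypothetical, and the only things taken from the description are the file and folder names and the sample1.bam / sample1.bai naming. It assumes that every BAM file in the alignment folder should be analyzed.

```shell
# Hypothetical helpers; the names are illustrative and not part of appreci8.

# Write SampleNames.txt (one name per line, no extension) from the
# BAM files found in <dir>/alignment.
write_sample_names() {
    local dir="$1" bam
    : > "$dir/SampleNames.txt"
    for bam in "$dir"/alignment/*.bam; do
        [ -e "$bam" ] || continue      # no BAM files: leave the list empty
        basename "$bam" .bam >> "$dir/SampleNames.txt"
    done
}

# Print one "missing: <path>" line per absent file and return non-zero
# if anything is missing. Only existence is checked, not file contents.
check_layout() {
    local dir="$1" missing=0 f name
    for f in SampleNames.txt vcf_header.txt \
             snpEff_ann/Hotspots.txt snpEff_ann/transcripts.txt \
             targetRegions/targetRegions.bed; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    if [ -f "$dir/SampleNames.txt" ]; then
        # Each listed sample needs <name>.bam and <name>.bai in alignment/
        while IFS= read -r name || [ -n "$name" ]; do
            [ -n "$name" ] || continue
            for f in "alignment/$name.bam" "alignment/$name.bai"; do
                if [ ! -f "$dir/$f" ]; then
                    echo "missing: $f"
                    missing=1
                fi
            done
        done < "$dir/SampleNames.txt"
    fi
    return "$missing"
}
```

With these functions defined, running check_layout /path/to/your/data/ before appreci8 /path/to/your/data/ lists any file the description expects but that is not yet in place.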

An example of how to set up the analysis data folder can be found here