Difference between revisions of "GSNP"

From UFRC
m (Text replacement - "#uppercase" to "uc")
 
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
[[Category:Software]][[Category:Bioinformatics]][[Category:NGS]][[Category:SNP]]
+
[[Category:Software]][[Category:Biology]][[Category:NGS]]
 
{|<!--CONFIGURATION: REQUIRED-->
 
{|<!--CONFIGURATION: REQUIRED-->
 
|{{#vardefine:app|gsnp}}
 
|{{#vardefine:app|gsnp}}
Line 26: Line 26:
 
With these optimizations, GSNP can get more than 40X speedup compared with SOAPsnp. Originally, SOAPsnp needs 3 days to process the whole human genome while GSNP only uses 2 hours to complete the same work.
 
With these optimizations, GSNP can get more than 40X speedup compared with SOAPsnp. Originally, SOAPsnp needs 3 days to process the whole human genome while GSNP only uses 2 hours to complete the same work.
 
<!--Modules-->
 
<!--Modules-->
==Required Modules==
+
==Environment Modules==
===Serial===
+
Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application.
* {{#var:app}}
 
<!--
 
===Parallel (OpenMP)===
 
* intel
 
* {{#var:app}}
 
===Parallel (MPI)===
 
* intel
 
* openmpi
 
* {{#var:app}}
 
-->
 
 
==System Variables==
 
==System Variables==
* HPC_{{uc:{{#var:app}}}}_DIR
+
* HPC_{{uc:{{#var:app}}}}_DIR - installation directory
 
<!--Configuration-->
 
<!--Configuration-->
 
{{#if: {{#var: conf}}|==Configuration==
 
{{#if: {{#var: conf}}|==Configuration==
Line 73: Line 63:
 
<!--Turn the Table of Contents and Edit paragraph links ON/OFF-->
 
<!--Turn the Table of Contents and Edit paragraph links ON/OFF-->
 
__NOTOC____NOEDITSECTION__
 
__NOTOC____NOEDITSECTION__
=Validation=
 
* Validated 4/5/2018
 

Latest revision as of 17:27, 15 August 2022

Description

gsnp website  

With the development of second-generation sequencing technology (next-generation sequencing, NGS), genetic variation can be studied by sequencing the whole genome of an individual or by re-sequencing a target region. Single nucleotide polymorphism (SNP) is the most common form of genetic variation. SNP detection identifies new polymorphic sites, as well as known polymorphic alleles, in the target region.

Many SNP detection tools work on NGS data, and SOAPsnp is one of the most widely used. It accounts for sequencing data quality and for alignment and experimental errors, and uses a Bayesian model to calculate a quality score for each base. These quality scores serve as the basis for calling the consensus sequence. Combined with the prior probabilities of dbSNP alleles, this gives a low error rate for low-depth sequencing.
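The idea of Bayesian consensus calling from base qualities can be illustrated with a small sketch. This is a deliberately simplified model and not SOAPsnp's actual algorithm: it assumes a uniform prior by default (a dbSNP-informed prior could be passed in instead), treats reads as independent, and spreads sequencing errors evenly over the other three bases. The function names `genotype_posteriors` and `call_consensus` are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, ...
GENOTYPES = ["".join(g) for g in combinations_with_replacement(BASES, 2)]

def base_likelihood(observed, allele, error_prob):
    """P(observed base | true allele), errors spread evenly over 3 other bases."""
    return 1.0 - error_prob if observed == allele else error_prob / 3.0

def genotype_posteriors(reads, prior=None):
    """reads: list of (base, phred_quality). prior: genotype -> positive probability.

    Returns a dict mapping each genotype to its posterior probability.
    """
    if prior is None:
        # Assumption: uniform prior; a dbSNP-based prior would replace this.
        prior = {g: 1.0 / len(GENOTYPES) for g in GENOTYPES}
    log_post = {}
    for g in GENOTYPES:
        lp = math.log(prior[g])
        for base, qual in reads:
            # Phred quality -> error probability, capped so likelihoods stay > 0
            err = min(10.0 ** (-qual / 10.0), 0.75)
            # A read samples either allele of the genotype with probability 1/2
            p = 0.5 * base_likelihood(base, g[0], err) \
                + 0.5 * base_likelihood(base, g[1], err)
            lp += math.log(p)
        log_post[g] = lp
    # Normalize in log space for numerical stability
    m = max(log_post.values())
    unnorm = {g: math.exp(lp - m) for g, lp in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}

def call_consensus(reads, prior=None):
    """Return (best genotype, Phred-scaled quality of that call)."""
    post = genotype_posteriors(reads, prior)
    best = max(post, key=post.get)
    p_wrong = max(1.0 - post[best], 1e-10)  # avoid log10(0)
    return best, -10.0 * math.log10(p_wrong)
```

For example, `call_consensus([("A", 30)] * 6)` picks a homozygous `AA` call under this model, while four `A` and four `G` reads at quality 30 favor the heterozygous `AG` genotype.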

GSNP is a GPU implementation of SOAPsnp. It assigns one GPU thread to each independent site and optimizes the program in two ways: (1) it uses a sparse data structure to store the aligned bases, which reduces memory overhead; and (2) it uses customized compression algorithms to reduce I/O overhead.
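The memory argument for sparse storage can be sketched in a few lines of Python. This is only an illustration of the general idea, not GSNP's actual data layout: the table dimensions, the `pileup` dictionary, and the helper names are assumptions made for the example. A dense table reserves a cell for every base and quality combination at a site, while the sparse form keeps only the combinations that were observed. Each site is processed without reference to any other site, which is the property that lets GSNP give each site its own GPU thread.

```python
from collections import Counter

N_BASES = 4    # A, C, G, T encoded as 0..3
N_QUALS = 64   # assumed number of distinct quality values

def dense_site(observations):
    """Full base x quality count table: N_BASES * N_QUALS cells, mostly zero."""
    table = [[0] * N_QUALS for _ in range(N_BASES)]
    for base, qual in observations:
        table[base][qual] += 1
    return table

def sparse_site(observations):
    """Keep counts only for the (base, quality) pairs actually observed."""
    return Counter(observations)

def site_depth(sparse):
    """Example per-site computation; it depends on no other site."""
    return sum(sparse.values())

# Hypothetical pileup: site position -> list of (base index, quality)
pileup = {
    100: [(0, 30), (0, 30), (2, 20)],
    101: [(1, 35)],
}

sparse = {pos: sparse_site(obs) for pos, obs in pileup.items()}
depths = {pos: site_depth(s) for pos, s in sparse.items()}
```

Here site 100 needs two stored entries in the sparse form, compared with 256 cells in the dense table, and the gap widens as the table gains more dimensions.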

With these optimizations, GSNP achieves a speedup of more than 40X over SOAPsnp. SOAPsnp needs 3 days to process the whole human genome, while GSNP completes the same work in 2 hours.

Environment Modules

Run module spider gsnp to find out what environment modules are available for this application.

System Variables

  • HPC_GSNP_DIR - installation directory