GSNP
Moskalenko (minor edit: Text replacement - "#uppercase" to "uc")
Revision as of 14:01, 13 June 2022
Description
With the development of second-generation, or next-generation, sequencing (NGS) technology, genetic variation can be studied by sequencing the whole genome of an individual or by re-sequencing targeted regions. Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation. SNP detection aims to find new polymorphic sites and to identify known polymorphic alleles in the target region.
Many SNP detection tools work on NGS data, and one of the most widely used is SOAPsnp. SOAPsnp uses a Bayesian SNP detection model that accounts for sequencing data quality as well as alignment and experimental errors to calculate a quality score for each base. These quality scores serve as the criterion for consensus sequence calling. Combined with the prior probabilities of known dbSNP alleles, this approach achieves a low error rate even for low-depth sequencing.
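The Bayesian idea behind this kind of consensus calling can be illustrated with a minimal sketch. It is a simplified model assumed for illustration, not SOAPsnp's actual implementation: each Phred base quality is converted to an error probability, errors are assumed to be spread uniformly over the other three bases, the likelihood of every diploid genotype is multiplied by a prior, and the consensus quality is the Phred-scaled probability that the best genotype is wrong. The prior weights and the `known_alleles` parameter (standing in for dbSNP information) are hypothetical values, and refinements of SOAPsnp's real model are omitted.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# All 10 unordered diploid genotypes: ("A", "A"), ("A", "C"), ..., ("T", "T")
GENOTYPES = list(combinations_with_replacement(BASES, 2))


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def base_likelihood(observed, allele, q):
    """P(observed base | true allele); errors assumed uniform over the other 3 bases."""
    e = phred_to_error(q)
    return 1 - e if observed == allele else e / 3


def make_priors(ref, known_alleles=(), novel=1e-4, known=1e-1):
    """Hypothetical priors: favour homozygous reference, and boost genotypes built
    only from alleles catalogued for this site (a stand-in for dbSNP)."""
    weights = {}
    for g in GENOTYPES:
        if g == (ref, ref):
            weights[g] = 1.0
        elif set(g) <= {ref, *known_alleles}:
            weights[g] = known
        else:
            weights[g] = novel
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def genotype_posteriors(observations, priors):
    """Posterior P(genotype | observations) by Bayes' rule.

    observations: list of (base, phred_quality) pairs at one site.
    """
    log_post = {}
    for a1, a2 in GENOTYPES:
        ll = 0.0
        for base, q in observations:
            # Each read comes from either chromosome copy with probability 1/2.
            p = 0.5 * base_likelihood(base, a1, q) + 0.5 * base_likelihood(base, a2, q)
            ll += math.log(p)
        log_post[(a1, a2)] = ll + math.log(priors[(a1, a2)])
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    total = sum(math.exp(v - m) for v in log_post.values())
    return {g: math.exp(v - m) / total for g, v in log_post.items()}


def call_consensus(observations, priors):
    """Return the most probable genotype and its Phred-scaled quality (capped at 99)."""
    post = genotype_posteriors(observations, priors)
    best = max(post, key=post.get)
    p_wrong = max(1 - post[best], 1e-10)
    quality = min(99, int(round(-10 * math.log10(p_wrong))))
    return best, quality
```

With only two low-quality reads (one `A`, one `G`) at a site whose reference base is `A`, `call_consensus([("A", 20), ("G", 20)], make_priors("A"))` calls the homozygous reference genotype, while passing `make_priors("A", known_alleles=("G",))` calls the heterozygote `("A", "G")`. This shows how a known-allele prior can change the call at low depth.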
GSNP is a GPU implementation of SOAPsnp. It assigns one GPU thread to each independent site and optimizes the program in two ways: (1) it stores the aligned bases in a sparse data structure to reduce memory overhead, and (2) it uses customized compression algorithms to reduce I/O overhead.
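The sparse-storage and one-thread-per-site ideas can be sketched in plain Python. This sketch is built on stated assumptions and is not GSNP's actual data layout or CUDA code. It assumes a hypothetical dense per-site table indexed by base (A, C, G, T encoded as 0 to 3), quality, read position, and strand, with dimensions 4 × 64 × 256 × 2 chosen only for illustration. Because a site is covered by relatively few reads, most cells of such a table would be zero, so the sketch packs each observation into an integer key and keeps only the non-zero counts. The names `build_sparse`, `process_site`, and `process_all` are hypothetical. `process_site` is a stand-in for per-site work. The loop in `process_all` marks where a GPU would launch one thread per site, since the sites are independent.

```python
from collections import Counter

# Hypothetical per-site dimensions (assumptions for illustration only).
N_BASE, N_QUAL, N_POS, N_STRAND = 4, 64, 256, 2
DENSE_CELLS = N_BASE * N_QUAL * N_POS * N_STRAND  # cells per site if stored densely


def encode(base, qual, pos, strand):
    """Pack one aligned-base observation into a single integer key."""
    return ((base * N_QUAL + qual) * N_POS + pos) * N_STRAND + strand


def decode(key):
    """Inverse of encode: recover (base, qual, pos, strand)."""
    key, strand = divmod(key, N_STRAND)
    key, pos = divmod(key, N_POS)
    base, qual = divmod(key, N_QUAL)
    return base, qual, pos, strand


def build_sparse(observations):
    """Keep only non-zero counts for one site, as a list of (key, count) sorted by key.

    observations: iterable of (base, qual, pos, strand) tuples.
    """
    counts = Counter(encode(*obs) for obs in observations)
    return sorted(counts.items())


def process_site(sparse_site):
    """Stand-in for per-site work: total the observations of each base."""
    totals = [0] * N_BASE
    for key, count in sparse_site:
        base = decode(key)[0]
        totals[base] += count
    return totals


def process_all(sites):
    """Sites are independent, so each iteration could run on its own GPU thread."""
    return [process_site(site) for site in sites]
```

For a site covered by eight reads that produce two distinct observation types, the sparse form holds two `(key, count)` entries, while the hypothetical dense table would need 131,072 cells.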
With these optimizations, GSNP achieves a speedup of more than 40× over SOAPsnp. SOAPsnp originally needed about 3 days to process a whole human genome, while GSNP completes the same work in about 2 hours.
Environment Modules
Run module spider gsnp to find out what environment modules are available for this application.
System Variables
- HPC_GSNP_DIR - installation directory