SHAPEIT4

From UFRC
Latest revision as of 19:22, 14 December 2022

Description

SHAPEIT4 website  

SHAPEIT4 is a fast and accurate method for estimating haplotypes (also known as phasing) from SNP array and high-coverage sequencing data. Version 4 is a refactored and improved version of the SHAPEIT algorithm with several key additional features:

  • It uses an approach based on the Positional Burrows-Wheeler Transform (PBWT) to quickly select a small set of informative conditioning haplotypes for use when updating the phase of an individual.
  • We have changed the way in which phase information from sequencing reads is input into the model. We now recommend using the WhatsHap tool as a pre-processing step to extract phase information from a BAM file.
  • It accounts for sets of pre-phased genotypes (i.e., a haplotype scaffold). The scaffold can be derived either from family data or from large reference panels.
  • It reads and writes files in VCF or BCF format using HTSlib for better I/O performance.
  • The genotype graph and HMM routines have been re-implemented for better hardware usage and performance.
  • The source code is available on GitHub under the open-source MIT license.
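The read-based workflow described above (WhatsHap first, then SHAPEIT4) can be sketched as a short shell script. This is a minimal sketch, not an official pipeline. The file names (`sample.vcf.gz`, `sample.bam`, `ref.fa`), the helper functions `run` and `phase_with_reads`, and the assumption that `whatshap`, `bgzip`, and `tabix` are already on your `PATH` are all illustrative. The SHAPEIT4 options `--use-PS` and `--sequencing` come from the SHAPEIT4 documentation; confirm them with `shapeit4 --help` for the module version you load. By default the script only prints the commands (`DRY_RUN=1`). Export `DRY_RUN=0` to execute them.

```shell
#!/bin/bash
# Sketch: read-aware phasing with WhatsHap + SHAPEIT4 (all file names are hypothetical)
set -euo pipefail

# Print commands instead of running them unless DRY_RUN=0 is exported
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper:
# phase_with_reads SAMPLE_VCF BAM REFERENCE_FASTA MAP REGION OUT_PREFIX THREADS
phase_with_reads() {
    local vcf=$1 bam=$2 ref=$3 map=$4 region=$5 out=$6 threads=${7:-1}

    # 1. WhatsHap writes read-backed phase sets (PS field) into a new VCF
    run whatshap phase --reference "$ref" -o "${out}.whatshap.vcf" "$vcf" "$bam"

    # 2. SHAPEIT4 reads through HTSlib and needs a compressed, indexed input
    run bgzip -f "${out}.whatshap.vcf"
    run tabix -f -p vcf "${out}.whatshap.vcf.gz"

    # 3. Statistical phasing informed by the PS field from step 1
    run shapeit4 --input "${out}.whatshap.vcf.gz" --map "$map" --region "$region" \
        --use-PS 0.0001 --sequencing --output "${out}.phased.vcf.gz" --thread "$threads"
}

phase_with_reads sample.vcf.gz sample.bam ref.fa chr20.b37.gmap.gz 20 sample 4
```

Printing the commands first makes it easy to check paths before committing to a long job. The value `0.0001` for `--use-PS` is the one suggested in the SHAPEIT4 documentation for read-based phase sets, but treat it as a starting point rather than a tuned setting.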


Environment Modules

Run module spider SHAPEIT4 to find out what environment modules are available for this application.

System Variables

  • HPC_SHAPEIT4_DIR - installation directory
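A quick way to confirm that the module is loaded and that `HPC_SHAPEIT4_DIR` is set is sketched below. It assumes the module name `shapeit4` used in the job script on this page. The helper `check_shapeit4_env` is hypothetical and not something the module provides. The script also assumes that `shapeit4 --help` prints the option list for the loaded version.

```shell
#!/bin/bash
# Sketch: load SHAPEIT4 through the module system and confirm where it is installed.
# 'module' is a shell function defined by Lmod on cluster nodes; skip loading elsewhere.
if type module >/dev/null 2>&1; then
    module load shapeit4
fi

# Hypothetical helper: report HPC_SHAPEIT4_DIR (set by the module), return 1 if missing
check_shapeit4_env() {
    if [ -z "${HPC_SHAPEIT4_DIR:-}" ]; then
        echo "HPC_SHAPEIT4_DIR is unset; run 'module load shapeit4' first" >&2
        return 1
    fi
    echo "SHAPEIT4 installation directory: ${HPC_SHAPEIT4_DIR}"
}

if check_shapeit4_env && command -v shapeit4 >/dev/null 2>&1; then
    # Print the options supported by the loaded version
    shapeit4 --help
fi
```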


Job Script Examples


#!/bin/bash
#SBATCH --job-name=shapeit4_test
#SBATCH --mail-type=NONE
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=4gb
#SBATCH --time=24:00:00
#SBATCH --output=shapeit4_test.log

echo "Setting up test environment..."
TEST_PWD=/data/apps/tests/shapeit4
TEST_SAMPLEDIR=${TEST_PWD}/example_data
TEST_WORKDIR=${TEST_PWD}/output

cd ${TEST_PWD} || exit 1
module load shapeit4

# Remove any previous test results and create a fresh working directory
if [ -d ${TEST_WORKDIR} ]; then rm -rf ${TEST_WORKDIR}/; fi
mkdir -p ${TEST_WORKDIR}

echo "Starting test run at $(date) on $(hostname)..."

shapeit4 \
    --input ${TEST_SAMPLEDIR}/unphased.vcf.gz \
    --map ${TEST_SAMPLEDIR}/chr20.b37.gmap.gz \
    --region 20 \
    --output ${TEST_WORKDIR}/phased.vcf.gz \
    --thread ${SLURM_CPUS_PER_TASK:-1}

# Repeat the test with BCF input and output files
shapeit4 \
    --input ${TEST_SAMPLEDIR}/unphased.bcf \
    --map ${TEST_SAMPLEDIR}/chr20.b37.gmap.gz \
    --region 20 \
    --output ${TEST_WORKDIR}/phased.bcf \
    --thread ${SLURM_CPUS_PER_TASK:-1}

# There should be some files in the work directory
echo "There should be some results listed below:"
find ${TEST_WORKDIR} -type f ! -empty -ls

echo "Test complete at $(date)."
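To phase every autosome instead of only chromosome 20, the test script can be turned into a Slurm job array. The sketch below assumes a hypothetical layout in `SAMPLEDIR` with one indexed input file per chromosome (`chr<N>.unphased.vcf.gz`) and genetic maps named like the test data (`chr<N>.b37.gmap.gz`). The `/path/to/...` defaults, the `shapeit4_args` helper, and the resource requests are placeholders to adapt. `--region` must match the contig names in your VCF: the test data uses b37-style `20`, not `chr20`. When run outside Slurm, or on a host without `shapeit4`, the script only prints the command it would run.

```shell
#!/bin/bash
#SBATCH --job-name=shapeit4_array
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4gb
#SBATCH --time=24:00:00
#SBATCH --array=1-22
#SBATCH --output=shapeit4_chr%a.log

# Hypothetical layout: one unphased file and one genetic map per chromosome
SAMPLEDIR=${SAMPLEDIR:-/path/to/example_data}
WORKDIR=${WORKDIR:-/path/to/output}

# Load the module when the Lmod 'module' function is available
if type module >/dev/null 2>&1; then
    module load shapeit4
fi

# Hypothetical helper: build the SHAPEIT4 argument list for one chromosome
shapeit4_args() {
    local chr=$1 threads=$2
    ARGS=(--input "${SAMPLEDIR}/chr${chr}.unphased.vcf.gz"
          --map "${SAMPLEDIR}/chr${chr}.b37.gmap.gz"
          --region "${chr}"
          --output "${WORKDIR}/chr${chr}.phased.vcf.gz"
          --thread "${threads}")
}

# Each array task phases the chromosome matching its index; default to 20 outside Slurm
CHR=${SLURM_ARRAY_TASK_ID:-20}
shapeit4_args "$CHR" "${SLURM_CPUS_PER_TASK:-1}"

if command -v shapeit4 >/dev/null 2>&1; then
    mkdir -p "$WORKDIR"
    shapeit4 "${ARGS[@]}"
else
    # No shapeit4 on this host: just show what would run
    echo "shapeit4 ${ARGS[*]}"
fi
```

Submit the script with `sbatch`. Each array task writes its own log, `shapeit4_chr%a.log`, where Slurm replaces `%a` with the array index. Because the chromosomes are independent, splitting them across array tasks lets them run concurrently instead of sequentially inside one long job.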


Citation

If you publish research that uses SHAPEIT4, you must cite it as follows:

Olivier Delaneau, Jean-Francois Zagury, Matthew R Robinson, Jonathan Marchini, Emmanouil Dermitzakis. Accurate, scalable and integrative haplotype estimation. Nat. Comm. 2019.