SHAPEIT4
Description
SHAPEIT4 is a fast and accurate method for estimating haplotypes (also known as phasing) from SNP array and high-coverage sequencing data. Version 4 is a refactored and improved version of the SHAPEIT algorithm with several key new features:
- It uses a Positional Burrows-Wheeler Transform (PBWT) to quickly select a small set of informative conditioning haplotypes when updating the phase of an individual.
- The way phase information from sequencing reads is fed into the model has changed. The WhatsHap tool is now recommended as a pre-processing step to extract phase information from a BAM file.
- It accounts for sets of pre-phased genotypes (i.e. a haplotype scaffold), which can be derived either from family data or from large reference panels.
- It reads and writes VCF and BCF files through HTSlib for better I/O performance.
- The genotype graph and HMM routines have been re-implemented for better hardware usage and performance.
- The source code is open source (MIT licence) and available on GitHub.
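The WhatsHap pre-processing step described above can be sketched as a small Bash function. This is a minimal, hypothetical sketch, not an official workflow: it assumes `whatshap`, `bcftools` and `shapeit4` are all on your PATH (a WhatsHap module on the cluster is an assumption), that WhatsHap writes bgzip-compressed output when the file name ends in `.gz`, and it uses `0.0001` only as an illustrative phase-set error rate for `--use-PS`. The function names `run` and `readaware_phase` and the intermediate file name are made up for illustration; confirm the options with `shapeit4 --help` for your installed version.

```bash
#!/bin/bash
# Hypothetical sketch of read-aware phasing for one sample.
# Assumes whatshap, bcftools and shapeit4 are on PATH.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    # Echo the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

readaware_phase() {
    local vcf="$1" bam="$2" ref="$3" map="$4" region="$5" out="$6"
    # Intermediate VCF carrying WhatsHap phase sets (PS tags); name is hypothetical.
    local ps_vcf="${out}.whatshap.vcf.gz"

    # 1. Read-based phasing: WhatsHap adds PS tags using reads from the BAM.
    run whatshap phase -o "$ps_vcf" --reference "$ref" "$vcf" "$bam" || return 1
    # 2. Index the compressed intermediate so shapeit4 can query the region.
    run bcftools index -f "$ps_vcf" || return 1
    # 3. Statistical phasing that also uses the PS tags
    #    (0.0001 is an illustrative phase-set error rate).
    run shapeit4 --input "$ps_vcf" --map "$map" --region "$region" \
        --use-PS 0.0001 --output "$out" \
        --thread "${SLURM_CPUS_PER_TASK:-1}"
}
```

For example, `DRY_RUN=1 readaware_phase sample.vcf.gz sample.bam ref.fa chr20.b37.gmap.gz 20 phased.bcf` prints the three commands without running anything, which is a cheap way to check file names before submitting a job.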
Environment Modules
Run module spider SHAPEIT4 to find out what environment modules are available for this application.
System Variables
- HPC_SHAPEIT4_DIR - installation directory
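Because the module sets HPC_SHAPEIT4_DIR, a job script can check for it before doing any work. The following is a minimal sketch; the function name `require_shapeit4` is hypothetical, and it only assumes that loading the module sets HPC_SHAPEIT4_DIR and puts `shapeit4` on the PATH.

```bash
#!/bin/bash
# Hypothetical helper: fail fast if SHAPEIT4 has not been set up.
# Assumes "module load shapeit4" sets HPC_SHAPEIT4_DIR and adds shapeit4 to PATH.
require_shapeit4() {
    if [ -z "${HPC_SHAPEIT4_DIR:-}" ]; then
        echo "HPC_SHAPEIT4_DIR is not set; run 'module load shapeit4' first" >&2
        return 1
    fi
    if ! command -v shapeit4 >/dev/null 2>&1; then
        echo "shapeit4 was not found on PATH" >&2
        return 1
    fi
    echo "Using $(command -v shapeit4) (HPC_SHAPEIT4_DIR=${HPC_SHAPEIT4_DIR})"
}
```

In a job script, call it right after `module load shapeit4` as `require_shapeit4 || exit 1`, so a missing module stops the job immediately instead of failing later with a less obvious error.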
Job Script Examples
#!/bin/bash
#SBATCH --job-name=shapeit4_test
#SBATCH --mail-type=NONE
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=4gb
#SBATCH --time=24:00:00
#SBATCH --output=shapeit4_test.log

echo "Setting up test environment..."
TEST_PWD=/data/apps/tests/shapeit4
TEST_SAMPLEDIR=${TEST_PWD}/example_data
TEST_WORKDIR=${TEST_PWD}/output
cd "${TEST_PWD}" || exit 1

module load shapeit4

# Remove any previous test results and create a fresh working directory
if [ -d "${TEST_WORKDIR}" ]; then rm -rf "${TEST_WORKDIR}"; fi
mkdir -p "${TEST_WORKDIR}"

echo "Starting test run at $(date) on $(hostname)..."

# Test with bgzipped VCF input and output
shapeit4 \
    --input "${TEST_SAMPLEDIR}/unphased.vcf.gz" \
    --map "${TEST_SAMPLEDIR}/chr20.b37.gmap.gz" \
    --region 20 \
    --output "${TEST_WORKDIR}/phased.vcf.gz" \
    --thread "${SLURM_CPUS_PER_TASK:-1}"

# Test with BCF input and output
shapeit4 \
    --input "${TEST_SAMPLEDIR}/unphased.bcf" \
    --map "${TEST_SAMPLEDIR}/chr20.b37.gmap.gz" \
    --region 20 \
    --output "${TEST_WORKDIR}/phased.bcf" \
    --thread "${SLURM_CPUS_PER_TASK:-1}"

# There should be some non-empty files in the work directory
echo "There should be some results listed below:"
find "${TEST_WORKDIR}" -type f ! -empty -ls
echo "Test complete at $(date)."
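To phase several chromosomes, one option is a Slurm job array with one task per chromosome. The script below is a hypothetical sketch, not a tested site recipe: the directory layout (input/chrN.unphased.bcf, maps/chrN.b37.gmap.gz, output/), the function name `phase_chr` and the resource requests are all assumptions to adapt. The value passed to `--region` must match the contig names in your VCF/BCF (for example `20` versus `chr20`).

```bash
#!/bin/bash
#SBATCH --job-name=shapeit4_chr
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4gb
#SBATCH --time=24:00:00
#SBATCH --array=1-22
#SBATCH --output=shapeit4_chr%a.log
# Hypothetical sketch: phase one autosome per array task.
# File layout below is an assumption; set DRY_RUN=1 to print instead of run.

phase_chr() {
    local chr="$1"
    set -- shapeit4 \
        --input "input/chr${chr}.unphased.bcf" \
        --map "maps/chr${chr}.b37.gmap.gz" \
        --region "$chr" \
        --output "output/chr${chr}.phased.bcf" \
        --thread "${SLURM_CPUS_PER_TASK:-1}"
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi
}

# Only do real work inside a Slurm array task (%a in the log name is the task ID).
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    module load shapeit4
    mkdir -p output
    phase_chr "$SLURM_ARRAY_TASK_ID"
fi
```

Submitting the script once with sbatch starts 22 independent tasks; each task reads its chromosome number from SLURM_ARRAY_TASK_ID, so a failed chromosome can be rerun on its own with `sbatch --array=N`.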
Citation
If you publish research that uses SHAPEIT4, you have to cite it as follows:
Olivier Delaneau, Jean-Francois Zagury, Matthew R Robinson, Jonathan Marchini, Emmanouil Dermitzakis. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 2019.