SHAPEIT4 website  

SHAPEIT4 is a fast and accurate method for estimating haplotypes (also known as phasing) from SNP array and high-coverage sequencing data. Version 4 is a refactored and improved version of the SHAPEIT algorithm with several key additional features:

  • It includes a Positional Burrows-Wheeler Transform (PBWT) based approach to quickly select a small set of informative conditioning haplotypes to use when updating the phase of an individual.
  • The way in which phase information from sequencing reads is input into the model has changed. The authors now recommend using the WhatsHap tool as a pre-processing step to extract phase information from a BAM file.
  • It accounts for sets of pre-phased genotypes (i.e. a haplotype scaffold). The scaffold can be derived either from family data or from large reference panels.
  • It reads and writes files in either VCF or BCF format using HTSlib for better I/O performance.
  • The genotype graph and HMM routines have been re-implemented for better hardware usage and performance.
  • The source code is available under the MIT licence on GitHub.

Environment Modules

Run module spider SHAPEIT4 to find out what environment modules are available for this application.

System Variables

  • HPC_SHAPEIT4_DIR - installation directory
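
As a minimal sketch of how the module and the variable above might be used together, the snippet below lists the available SHAPEIT4 modules, loads the default one, and inspects the installation directory. The helper variable shapeit4_dir is introduced here for illustration only, and the guard around module is an assumption so that the snippet also exits cleanly on machines without an environment-module system.

```shell
# List the available SHAPEIT4 modules and load the default one.
# The guard keeps the snippet harmless where no module system is installed.
if command -v module >/dev/null 2>&1; then
    module spider shapeit4
    module load shapeit4
fi

# HPC_SHAPEIT4_DIR is set by the module; shapeit4_dir is an illustrative helper.
shapeit4_dir="${HPC_SHAPEIT4_DIR:-}"
if [ -n "${shapeit4_dir}" ]; then
    ls "${shapeit4_dir}"
else
    echo "HPC_SHAPEIT4_DIR is not set; load the shapeit4 module first."
fi
```

If the variable is empty after loading the module, the module name or version may differ on your system; the output of module spider shows the exact names.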

Job Script Examples


#!/bin/bash
#SBATCH --job-name=shapeit4_test
#SBATCH --mail-type=NONE
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=4gb
#SBATCH --time=24:00:00
#SBATCH --output=shapeit4_test.log

# TEST_PWD, TEST_SAMPLEDIR and TEST_WORKDIR are not defined in this script;
# they must be set in the environment before the job is submitted.

echo "Setting up test environment..."

cd "${TEST_PWD}" || exit 1
module load shapeit4

# Remove any previous test results and create a fresh working directory.
# Input files are read directly from TEST_SAMPLEDIR.
if [ -d "${TEST_WORKDIR}" ]; then rm -rf "${TEST_WORKDIR}"; fi
mkdir -p "${TEST_WORKDIR}"

echo "Starting test run at $(date) on $(hostname)..."

# Phase chromosome 20 from a compressed VCF, one thread per allocated CPU
shapeit4 \
    --input "${TEST_SAMPLEDIR}/unphased.vcf.gz" \
    --map "${TEST_SAMPLEDIR}/chr20.b37.gmap.gz" \
    --region 20 \
    --output "${TEST_WORKDIR}/phased.vcf.gz" \
    --thread "${SLURM_CPUS_PER_TASK:-1}"

# Test with BCF files...
shapeit4 \
    --input "${TEST_SAMPLEDIR}/unphased.bcf" \
    --map "${TEST_SAMPLEDIR}/chr20.b37.gmap.gz" \
    --region 20 \
    --output "${TEST_WORKDIR}/phased.bcf" \
    --thread "${SLURM_CPUS_PER_TASK:-1}"

# There should be some files in the work directory
echo "There should be some results listed below:"
find "${TEST_WORKDIR}" -type f ! -empty -ls

echo "Test complete at $(date)."
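
The job script above reads TEST_PWD, TEST_SAMPLEDIR and TEST_WORKDIR from the environment without defining them. One possible way to supply them at submission time is sketched below using the --export option of sbatch. The directory paths and the script filename shapeit4_test.sh are hypothetical placeholders, not values from this page.

```shell
# Hypothetical locations; replace them with your own paths.
TEST_PWD="${HOME}/shapeit4_test"
TEST_SAMPLEDIR="${TEST_PWD}/sample_data"
TEST_WORKDIR="${TEST_PWD}/work"

# Pass the current environment plus the three variables to the job.
export_list="ALL,TEST_PWD=${TEST_PWD},TEST_SAMPLEDIR=${TEST_SAMPLEDIR},TEST_WORKDIR=${TEST_WORKDIR}"

# shapeit4_test.sh is an assumed filename for the job script.
if command -v sbatch >/dev/null 2>&1; then
    sbatch --export="${export_list}" shapeit4_test.sh
else
    echo "sbatch not found; would run: sbatch --export=${export_list} shapeit4_test.sh"
fi
```

Because the job writes its log to shapeit4_test.log, checking that file after the job finishes is a quick way to confirm that both phasing runs produced non-empty output.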


If you publish research that uses SHAPEIT4, you must cite it as follows:

Olivier Delaneau, Jean-Francois Zagury, Matthew R Robinson, Jonathan Marchini, Emmanouil Dermitzakis. Accurate, scalable and integrative haplotype estimation. Nat. Comm. 2019.