SHAPEIT5

SHAPEIT5 (Segmented HAPlotype Estimation and Imputation Tools) is a collection of tools that estimates haplotypes in large datasets, with a special focus on rare variants.

SHAPEIT5 is available as a module on Apocrita.

Usage

To run the default installed version of SHAPEIT5, simply load the shapeit5 module:

module load shapeit5

The binaries available as part of shapeit5 are:

ligate
phase_common
phase_rare
simulate
switch
xcftools

Run the tool you require by prefixing it with shapeit5. For example, to run phase_common:

$ shapeit5 phase_common --help

[SHAPEIT5] phase_common (jointly phase multiple common markers)
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : olivier.delaneau@gmail.com
  * Version       : 5.1.1 / commit = 990ed0d / release = 2023-05-08
  * Run date      : 18/07/2023 - 10:22:18
(etc.)

Add any arguments after the binary being run, for example:

shapeit5 ligate --input input.txt --output output.bcf --thread ${NSLOTS} --index

Core Utilisation

Use the --thread ${NSLOTS} flag as described above to ensure you use the correct number of cores for your job.
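
When you try a command outside a scheduled job, NSLOTS may not be set, which would leave --thread without a value. The following is a minimal sketch, assuming you want to fall back to a single thread in that case. The THREADS and CMD variables and the fallback value are illustrative choices, not part of SHAPEIT5 or the module, and the command is only printed, not run:

```shell
#!/bin/bash
# NSLOTS is set by the scheduler inside a job; assume a fallback of 1 elsewhere.
THREADS="${NSLOTS:-1}"

# Assemble the ligate example from above with the resolved thread count.
CMD="shapeit5 ligate --input input.txt --output output.bcf --thread ${THREADS} --index"

# Print the command for inspection instead of running it.
echo "${CMD}"
```

Inside a job script NSLOTS is already defined, so the fallback never takes effect there.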

The module also bundles its own copy of bcftools, which you run with the same shapeit5 prefix:

$ shapeit5 bcftools --version
bcftools 1.15
Using htslib 1.15
Copyright (C) 2022 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Bcftools Core Utilisation

Use the --threads ${NSLOTS} flag when running bcftools to ensure you use the correct number of cores for your job (see example job below).

Example job

Serial job

Here is an example job running on 8 cores with 16GB of memory in total (2GB per core), based on the SHAPEIT5 tutorial for simulated WGS data. Clone the SHAPEIT5 repository and create your job script in its test folder so that the relative paths used in the script resolve correctly.

#!/bin/bash
#$ -cwd
#$ -pe smp 8
#$ -l h_rt=24:0:0
#$ -l h_vmem=2G
#$ -j y
#$ -N shapeit5-test

module load shapeit5

# Adapted from https://odelaneau.github.io/shapeit5/docs/tutorials/simulated/

# STEP1: Phasing common variants

shapeit5 phase_common \
    --input wgs/target.unrelated.bcf \
    --filter-maf 0.001 \
    --region 1:1-6000000 \
    --map info/chr1.gmap.gz \
    --output tmp/target.scaffold.chunk0.bcf \
    --thread ${NSLOTS}

shapeit5 phase_common \
    --input wgs/target.unrelated.bcf \
    --filter-maf 0.001 \
    --region 1:4000001-10000000 \
    --map info/chr1.gmap.gz \
    --output tmp/target.scaffold.chunk1.bcf \
    --thread ${NSLOTS}

ls -1v tmp/target.scaffold.chunk*.bcf > tmp/files.txt

shapeit5 ligate \
    --input tmp/files.txt \
    --output tmp/target.scaffold.bcf \
    --thread ${NSLOTS} \
    --index

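# STEP2: Phasing rare variants in small regions

# Each line of the chunk file provides the chunk ID (column 1), the
# scaffold region (column 3) and the input region (column 4)
while read LINE; do
    CHK=$(echo $LINE | awk '{ print $1; }')
    SRG=$(echo $LINE | awk '{ print $3; }')
    IRG=$(echo $LINE | awk '{ print $4; }')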
    shapeit5 phase_rare \
        --input wgs/target.unrelated.bcf \
        --scaffold tmp/target.scaffold.bcf \
        --map info/chr1.gmap.gz \
        --input-region $IRG \
        --scaffold-region $SRG \
        --output tmp/target.phased.chunk$CHK\.bcf \
        --thread ${NSLOTS}
done < info/chunks.coordinates.txt

# STEP3: Obtaining chromosome-wide phased data

# List every phased chunk in natural order
ls -1v tmp/target.phased.chunk*.bcf > tmp/files.txt

shapeit5 bcftools concat \
    --naive \
    -f tmp/files.txt \
    -o target.phased.bcf \
    --threads ${NSLOTS}

shapeit5 bcftools index \
    -f target.phased.bcf
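
The loop in STEP2 takes the chunk ID and the two regions from columns 1, 3 and 4 of info/chunks.coordinates.txt. The same fields can be split by read itself, without calling awk three times per line. The sketch below is a dry run under stated assumptions: the two sample rows are made-up placeholders rather than the contents of the real file, COL2 and REST are hypothetical names for the columns the loop does not use, and the loop only prints the arguments it would pass to phase_rare:

```shell
#!/bin/bash
# Hypothetical sample rows for illustration only:
# chunk ID, unused column, scaffold region, input region.
SAMPLE=$(mktemp)
printf '%s\n' \
    '0 1 1:1-3000000 1:1-2500000' \
    '1 1 1:2000001-6000000 1:2500001-5500000' > "${SAMPLE}"

ARGS=""
# read splits each line on whitespace: column 1 -> CHK, column 3 -> SRG,
# column 4 -> IRG; COL2 and REST absorb the columns that are not needed.
while read -r CHK COL2 SRG IRG REST; do
    LINE="--input-region ${IRG} --scaffold-region ${SRG} --output tmp/target.phased.chunk${CHK}.bcf"
    echo "${LINE}"
    ARGS="${ARGS}${LINE};"
done < "${SAMPLE}"

rm -f "${SAMPLE}"
```

Printing the arguments first is a cheap way to confirm that the regions and output names look right before submitting a 24-hour job.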

References