SHAPEIT5
SHAPEIT5 (Segmented HAPlotype Estimation and Imputation Tools) is a collection of tools that estimate haplotypes in large datasets, with a special focus on rare variants.
SHAPEIT5 is available as a module on Apocrita.
Usage
To run the default installed version of SHAPEIT5, load the shapeit5 module:
module load shapeit5
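If the module fails to load, every later command in a job script fails with a less obvious error. As a minimal sketch, a job script could check that the shapeit5 command is on the PATH before doing any work. The require_cmd function below is a hypothetical helper built on the shell builtin command -v. It is not part of the module.

```shell
#!/bin/bash
# Hypothetical helper: print an error and return non-zero if a command
# cannot be found on the PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Error: '$1' not found; did 'module load shapeit5' succeed?" >&2
        return 1
    fi
}

# Typical use near the top of a job script:
#   module load shapeit5
#   require_cmd shapeit5 || exit 1
```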
The binaries available as part of the shapeit5 module are:
ligate
phase_common
phase_rare
simulate
switch
xcftools
Run the tool you require by prefixing it with shapeit5. For example, to run phase_common:
$ shapeit5 phase_common --help
[SHAPEIT5] phase_common (jointly phase multiple common markers)
* Author : Olivier DELANEAU, University of Lausanne
* Contact : olivier.delaneau@gmail.com
* Version : 5.1.1 / commit = 990ed0d / release = 2023-05-08
* Run date : 18/07/2023 - 10:22:18
(etc.)
Add any arguments after the binary being run, for example:
shapeit5 ligate --input input.txt --output output.bcf --thread ${NSLOTS} --index
Core Utilisation
Use the --thread ${NSLOTS} flag, as in the example above, to ensure that the tool uses the correct number of cores for your job.
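Inside a job, the scheduler sets NSLOTS to the number of cores you requested. A minimal sketch, assuming you may also try a command outside a batch job where NSLOTS is unset, is to fall back to a single thread. THREADS is a hypothetical variable name used only for illustration.

```shell
#!/bin/bash
# NSLOTS is set by the scheduler inside a job; if it is unset or empty
# (e.g. outside a batch job), fall back to 1 thread.
THREADS="${NSLOTS:-1}"
echo "Running with ${THREADS} thread(s)"

# Then pass it to the tool, for example:
#   shapeit5 ligate --input input.txt --output output.bcf --thread "${THREADS}" --index
```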
The module also includes its own internal copy of bcftools, which you can run in the same way:
$ shapeit5 bcftools --version
bcftools 1.15
Using htslib 1.15
Copyright (C) 2022 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Bcftools Core Utilisation
Use the --threads ${NSLOTS} flag when running bcftools to ensure you use the correct number of cores for your job (see the example job below). Note that bcftools spells this flag --threads, whereas the SHAPEIT5 binaries use --thread.
Example job
Serial job
Here is an example job running on 8 cores and 16GB of memory (h_vmem is requested per core, so 8 × 2G = 16GB), based on the tutorial for Simulated WGS data.
You need to clone the SHAPEIT5 repository and then create your job script in its test folder so that the relative paths in the script are correct.
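One way to set this up is sketched below. It assumes the repository is hosted at github.com/odelaneau/shapeit5 and that its test folder contains the wgs/ and info/ inputs used by the example job. The check_inputs function is a hypothetical helper that lets the job stop early if any of those files is missing.

```shell
#!/bin/bash
# One-off setup, run once before submitting the job (assumed repository URL):
#   git clone https://github.com/odelaneau/shapeit5.git
#   cd shapeit5/test
#
# Hypothetical pre-flight check: report every missing input and return
# non-zero if any of the tutorial files is not found relative to the
# current directory.
check_inputs() {
    local missing=0 f
    for f in wgs/target.unrelated.bcf info/chr1.gmap.gz info/chunks.coordinates.txt; do
        if [[ ! -e "$f" ]]; then
            echo "Missing input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# In the job script, before STEP1:
#   check_inputs || exit 1
```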
#!/bin/bash
#$ -cwd
#$ -pe smp 8
#$ -l h_rt=24:0:0
#$ -l h_vmem=2G
#$ -j y
#$ -N shapeit5-test
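# Adapted from https://odelaneau.github.io/shapeit5/docs/tutorials/simulated/
module load shapeit5
# Create the directory that holds intermediate and per-chunk output
mkdir -p tmp
# STEP1: Phase common variants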
shapeit5 phase_common \
--input wgs/target.unrelated.bcf \
--filter-maf 0.001 \
--region 1:1-6000000 \
--map info/chr1.gmap.gz \
--output tmp/target.scaffold.chunk0.bcf \
--thread ${NSLOTS}
shapeit5 phase_common \
--input wgs/target.unrelated.bcf \
--filter-maf 0.001 \
--region 1:4000001-10000000 \
--map info/chr1.gmap.gz \
--output tmp/target.scaffold.chunk1.bcf \
--thread ${NSLOTS}
ls -1v tmp/target.scaffold.chunk*.bcf > tmp/files.txt
shapeit5 ligate \
--input tmp/files.txt \
--output tmp/target.scaffold.bcf \
--thread ${NSLOTS} \
--index
# STEP2: Phase rare variants in small regions
# Each line of the chunks file gives the chunk ID (column 1),
# scaffold region (column 3) and input region (column 4)
while read -r CHK _ SRG IRG _; do
shapeit5 phase_rare \
--input wgs/target.unrelated.bcf \
--scaffold tmp/target.scaffold.bcf \
--map info/chr1.gmap.gz \
--input-region "$IRG" \
--scaffold-region "$SRG" \
--output "tmp/target.phased.chunk${CHK}.bcf" \
--thread ${NSLOTS}
done < info/chunks.coordinates.txt
# STEP3: Obtain chromosome-wide phased data
ls -1v tmp/target.phased.chunk*.bcf > tmp/files.txt
shapeit5 bcftools concat \
--naive \
-f tmp/files.txt \
-o target.phased.bcf \
--threads ${NSLOTS}
shapeit5 bcftools index \
--threads ${NSLOTS} \
-f target.phased.bcf
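The two STEP1 phase_common commands hard-code 6Mb regions that overlap by 2Mb. If you want to phase a longer stretch, a sketch like the following could generate such regions instead. The make_regions function is a hypothetical helper, and the call in the comments assumes the data span 1:1-10000000, as the two hard-coded regions suggest.

```shell
#!/bin/bash
# Hypothetical helper: print overlapping "chr:start-end" regions covering
# positions first..last, each at most `size` bp long, with consecutive
# regions sharing `overlap` bp.
make_regions() {
    local chr=$1 first=$2 last=$3 size=$4 overlap=$5
    local start=$first end
    # Without this guard the start position would never advance
    if (( overlap >= size )); then
        echo "overlap must be smaller than size" >&2
        return 1
    fi
    while (( start <= last )); do
        end=$(( start + size - 1 ))
        if (( end > last )); then end=$last; fi
        echo "${chr}:${start}-${end}"
        if (( end == last )); then break; fi
        start=$(( end - overlap + 1 ))
    done
}

# Possible use in place of the two hard-coded STEP1 commands:
#   i=0
#   for REG in $(make_regions 1 1 10000000 6000000 2000000); do
#       shapeit5 phase_common --input wgs/target.unrelated.bcf --filter-maf 0.001 \
#           --region "$REG" --map info/chr1.gmap.gz \
#           --output "tmp/target.scaffold.chunk${i}.bcf" --thread "${NSLOTS}"
#       i=$(( i + 1 ))
#   done
```

With these arguments, make_regions prints 1:1-6000000 and 1:4000001-10000000, the same two regions used in the example job.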