Taiyaki¶
Taiyaki is research software for training models for basecalling Oxford Nanopore reads.
Taiyaki is available from GitHub. You can install Taiyaki in your home directory, scratch, or lab storage.
Installation¶
Installation should be carried out on a node with GPU hardware.
Firstly, request an interactive session on a GPU node:
qlogin -pe smp 12 -l h_vmem=7.5G -l gpu=1
Change to the directory you want to install Taiyaki into.
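For example, to install under the scratch location used in the rest of this page (the exact path will depend on your username and storage allocation):
cd /data/scratch/abc123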
Next, load modules for CUDA and Python 3:
module load cuda && module load python
And using git, clone the Taiyaki repository:
git clone https://github.com/nanoporetech/taiyaki.git
Finally, change into the cloned directory and run the install target:
cd taiyaki && make install
You should see scrolling text indicating that installation is in progress and eventually you should see:
To activate your new environment: source venv/bin/activate
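For example, from inside the taiyaki directory you can activate the new environment and check that the interpreter now comes from the venv (the which check is only an illustration):
# Activate the freshly built virtual environment
source venv/bin/activate
# Should report a path ending in taiyaki/venv/bin/python
which python
# Leave the environment when you are finished
deactivate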
Usage¶
Since Taiyaki is written in Python 3, you will need to load the Python 3 module to use it. You will also need to load the CUDA module you used to build Taiyaki, and you'll probably want to ensure you are running on the same GPU type. Assuming your username is "abc123" and you have installed Taiyaki in scratch, you would use something like the following to activate the virtual environment:
module load python/3.6.3
module load cuda
source /data/scratch/abc123/taiyaki/venv/bin/activate
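Since the note above suggests running on the same GPU type you built against, a quick way to check which card the session has been allocated (assuming nvidia-smi is available on the GPU nodes) is:
nvidia-smi --query-gpu=name --format=csv,noheader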
Taiyaki doesn't appear to have online help, but the following examples may be useful:
Training a Modified Base Model.
Use resources responsibly!
For commands that accept --jobs, make sure that you pass the number of cores that you have requested. The best way to do this is to use the ${NSLOTS} environment variable. Also note that the commands which accept --jobs do not make use of GPUs; you should only use commands which accept --device on GPU nodes. See the examples below.
Licensing¶
Taiyaki is licensed under the Oxford Nanopore Technologies Public License.
Example jobs¶
The following examples are taken from here. The serial job runs the scripts which do not make use of GPU resources; the GPU job does the actual machine learning. You may wish to investigate Job Holds so you can submit both jobs at once and the scheduler will hold the GPU job until the first job has completed, as sketched below.
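For example, using a name-based hold (the script names here are hypothetical placeholders for the two jobs below):
# Submit the serial preparation job with an explicit name
qsub -N modbase_prep serial_job.sh
# Submit the GPU training job, held until the preparation job has finished
qsub -hold_jid modbase_prep gpu_job.sh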
Serial job¶
Here is an example job running on 1 core and 1GB of memory:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
# Load Python module
module load python
# Activate Taiyaki venv
source /data/scratch/abc123/taiyaki/venv/bin/activate
# Generate Parameters
generate_per_read_params.py --jobs ${NSLOTS} reads > modbase.tsv
# Prepare Reads
prepare_mapped_reads.py --jobs ${NSLOTS} --mod Z C 5mC --mod Y A 6mA reads modbase.tsv modbase.hdf5 r941_dna_minion.checkpoint modbase_references.fasta
GPU job¶
Here is an example job running on 1 GPU:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 12
#$ -l h_rt=240:0:0
#$ -l h_vmem=7.5G
#$ -l gpu=1
# Load Python & CUDA modules
module load python
module load cuda
# Activate Taiyaki venv
source /data/scratch/abc123/taiyaki/venv/bin/activate
# Train modified base model
train_mod_flipflop.py --device 0 --mod_factor 0.01 --outdir training mGru_cat_mod_flipflop.py modbase.hdf5
train_mod_flipflop.py --device 0 --mod_factor 1.0 --outdir training2 training/model_final.checkpoint modbase.hdf5
# Basecall
basecall.py --device 0 --modified_base_output basecalls.hdf5 reads training2/model_final.checkpoint > basecalls.fa