
Nextflow

Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of data-driven, computational pipelines written in the most common scripting languages.

Nextflow is available as a module on Apocrita.

Usage

To run the latest installed version of Nextflow, simply load the nextflow module:

$ module load nextflow
$ nextflow help

Usage: nextflow [options] COMMAND [arg...]

For usage documentation, run nextflow help.

Submitting Processes as Serial Jobs

Recommended for serial jobs only

This section is recommended for serial jobs only. For parallel jobs, please see the Parallel Jobs section below.

Nextflow supports the ability to submit pipeline scripts as separate cluster jobs using the SGE executor.

To enable the SGE executor, set the process.executor property to sge in a configuration file named nextflow.config in the job working directory. The resources requested by each job submission are defined in the process.clusterOptions property, which accepts all Univa scheduler resource options.

For example, to run all pipeline jobs with 2 cores and 2GB total memory (1GB per core) for 1 hour, create the following configuration file:

process.executor='sge'
process.clusterOptions='-pe smp 2 -l h_vmem=1G,h_rt=1:0:0'
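The configuration above can also be generated from a script, which makes the relationship between cores and memory explicit. The following is a minimal sketch, assuming (as the example implies) that h_vmem is requested per core, so the total memory is the number of cores multiplied by the per-core value. The variable names cores, mem_per_core_gb and runtime are hypothetical and chosen for illustration only.

```shell
#!/bin/bash
# Hypothetical values for illustration; adjust to the needs of the pipeline.
cores=2            # slots requested via the smp parallel environment
mem_per_core_gb=1  # assumption: h_vmem is requested per core
runtime="1:0:0"    # h_rt in hours:minutes:seconds

# Total memory available to each job is cores x per-core memory.
total_gb=$((cores * mem_per_core_gb))
echo "Each job requests ${cores} cores and ${total_gb}G total memory"

# Write the configuration into the current (job working) directory.
cat > nextflow.config <<EOF
process.executor='sge'
process.clusterOptions='-pe smp ${cores} -l h_vmem=${mem_per_core_gb}G,h_rt=${runtime}'
EOF
```

With the values shown, the generated file matches the two-line configuration given above.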

Setting the memory limit for serial jobs

Add the -DXmx option to limit the amount of memory Nextflow can use in serial jobs. For more information regarding the Java VM memory allocation, see here.

Parallel Jobs

Parallel jobs use the built-in Apache Ignite clustering platform. Execution is performed over MPI on the nodes requested in the job submission, rather than by submitting a new job for each pipeline process.

Do not use the SGE executor in parallel jobs

Using the SGE executor for parallel jobs causes the master job to hang until it is killed by the scheduler for exceeding its walltime. This is because Apache Ignite cannot communicate with pipeline scripts that were submitted as separate jobs.

To ensure parallel jobs use Apache Ignite, add the following to the configuration file (or omit the process.executor setting):

process.executor='ignite'
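Because a leftover SGE setting makes a parallel job hang, it may be worth checking the configuration file before launching. The following is a rough sketch of such a check, not part of Nextflow itself: the function name config_ok_for_parallel is hypothetical, and the pattern only recognises the single-line process.executor='sge' form used on this page, not other ways of writing the same setting.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if the given config is safe for a
# parallel job, and 1 if it selects the SGE executor.
config_ok_for_parallel() {
    local config="$1"
    # A missing config file means the executor is not overridden.
    [ -f "$config" ] || return 0
    # Look for a line of the form: process.executor='sge'
    if grep -Eq "^[[:space:]]*process\.executor[[:space:]]*=[[:space:]]*['\"]sge['\"]" "$config"; then
        return 1
    fi
    return 0
}

if config_ok_for_parallel nextflow.config; then
    echo "nextflow.config does not select the SGE executor"
else
    echo "nextflow.config selects the SGE executor; do not use it for parallel jobs" >&2
fi
```

A check like this could be placed in the parallel job script before the mpirun line, so that a misconfigured job stops early instead of waiting for the walltime limit.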

Example jobs

Serial job

Here is an example job taken from the Nextflow website to submit each process in the input.nf file as a new cluster job. Ensure the cumulative runtime across all processes does not exceed the runtime requested in the master job.

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G

module load nextflow

nextflow -DXmx=1G \
         -C nextflow.config \
         run input.nf
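The requirement that the cumulative runtime of all processes fits within the master job's h_rt can be checked with simple arithmetic. The sketch below assumes runtimes are written in the H:M:S form used on this page. The function hrt_to_seconds and the per-process estimates are hypothetical and serve only to illustrate the comparison.

```shell
#!/bin/bash
# Convert an h_rt value of the form H:M:S into seconds.
hrt_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so values such as "08" are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

master_limit=$(hrt_to_seconds "1:0:0")   # h_rt requested by the master job

# Hypothetical per-process runtime estimates, for illustration only.
estimates=("0:20:0" "0:15:0" "0:10:0")

total=0
for e in "${estimates[@]}"; do
    total=$(( total + $(hrt_to_seconds "$e") ))
done

if [ "$total" -gt "$master_limit" ]; then
    echo "Estimated runtime (${total}s) exceeds the master job limit (${master_limit}s)"
else
    echo "Estimated runtime (${total}s) fits within the master job limit (${master_limit}s)"
fi
```

If the estimate exceeds the limit, increase h_rt in the master job script rather than in nextflow.config, since the master job must stay alive until every process has finished.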

Parallel job

Here is an example MPI job taken from the Nextflow website to run each process in the input.nf file using Apache Ignite.

#!/bin/sh
#$ -cwd
#$ -j y
#$ -pe parallel 64
#$ -l infiniband=nxv
#$ -l h_rt=1:0:0

module load nextflow openmpi

mpirun --pernode \
       nextflow run input.nf \
       -with-mpi
