Nextflow¶
Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of data-driven, computational pipelines written in the most common scripting languages.
Nextflow is available as a module on Apocrita.
Usage¶
To run the default installed version of Nextflow, simply load the nextflow
module:
$ module load nextflow
$ nextflow help
Usage: nextflow [options] COMMAND [arg...]
For usage documentation, run nextflow help.
Submitting processes as serial jobs¶
Recommended for serial jobs only
This section is recommended for serial jobs only. For parallel jobs, please see the Parallel jobs section below.
Nextflow can submit pipeline scripts as separate cluster jobs using the SGE executor.
To enable the SGE executor, set the process.executor property to sge in a configuration file named nextflow.config in the job working directory.
The resources requested by each job submission are defined in the cluster options setting (process.clusterOptions), where all Univa scheduler resources are supported.
For example, to run all pipeline jobs with 2 cores and 2GB of memory in total (1GB per core) for 1 hour, create the following configuration file:
process.executor='sge'
process.clusterOptions='-pe smp 2 -l h_vmem=1G,h_rt=1:0:0'
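As a minimal sketch, the configuration above could be created from the command line with a here-document. It assumes the current directory is the job working directory, and it overwrites any existing nextflow.config there, so check before running it.

```shell
#!/bin/bash
# Sketch: write the example configuration into the current directory.
# Assumption: the current directory is the job working directory.
# Warning: this overwrites an existing nextflow.config.
cat > nextflow.config <<'EOF'
process.executor='sge'
process.clusterOptions='-pe smp 2 -l h_vmem=1G,h_rt=1:0:0'
EOF

# Print the file so the settings can be checked before submitting.
cat nextflow.config
```

The quoted 'EOF' delimiter stops the shell from expanding anything inside the here-document, so the single quotes and option strings are written exactly as shown.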
Setting the memory limit for serial jobs
Add the -DXmx
option to limit the amount of memory Nextflow can use
in serial jobs. For more information regarding the Java VM memory
allocation, see here.
Parallel jobs¶
Parallel jobs use the built-in Apache Ignite clustering platform. Execution is performed over MPI on the nodes requested in the job submission, rather than by submitting new jobs for each pipeline.
Do not use the SGE executor in parallel jobs
Using the SGE executor for parallel jobs causes the master job to hang until it is killed by the scheduler for exceeding walltime. This happens because Apache Ignite cannot communicate with other pipeline scripts submitted as separate jobs.
To ensure parallel jobs use Apache Ignite, add the following to the
configuration file (or omit the process.executor
setting):
process.executor='ignite'
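One way to avoid accidentally combining the SGE executor with a parallel job is to check the configuration file before launching. The sketch below is illustrative only: uses_sge_executor is a hypothetical helper name, and it only recognises the one-line process.executor='sge' form shown on this page, not other ways of writing the same setting.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) if the given config file
# sets process.executor to 'sge' using the one-line dotted form.
uses_sge_executor() {
    local config="$1"
    [ -f "$config" ] && grep -Eq \
        "^[[:space:]]*process\.executor[[:space:]]*=[[:space:]]*['\"]sge['\"]" \
        "$config"
}

# Warn if the config in the current directory selects the SGE executor.
if uses_sge_executor nextflow.config; then
    echo "nextflow.config selects the SGE executor; do not use it in a parallel job" >&2
fi
```

A missing configuration file is treated as safe here, since omitting the process.executor setting is also acceptable for parallel jobs.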
Example jobs¶
Serial job¶
The following example job, taken from the Nextflow website, submits each process in the input.nf file as a new cluster job with 1 core and 1GB of memory. Ensure that the cumulative runtime across all processes does not exceed the runtime requested in the master job:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G
module load nextflow
nextflow -DXmx=1G \
    -C nextflow.config \
    run input.nf
Parallel job¶
The following example job, taken from the Nextflow website, runs each process in the input.nf file using 96 cores across 2 ddy nodes with Apache Ignite:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe parallel 96
#$ -l infiniband=ddy-i
#$ -l h_rt=240:0:0
module load nextflow openmpi
mpirun --pernode \
    nextflow run input.nf \
    -with-mpi
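The slot count in the parallel request is the node count multiplied by the cores per node. The sketch below reproduces that arithmetic. It assumes 48 cores per ddy node, a figure inferred from "96 cores across 2 ddy nodes" and not stated explicitly on this page; the variable names are illustrative.

```shell
#!/bin/bash
# Assumption: each ddy node provides 48 cores (inferred from 96 cores / 2 nodes).
nodes=2
cores_per_node=48

# Total slots to request from the parallel environment.
slots=$((nodes * cores_per_node))

# Print the matching scheduler directive for the job script.
echo "#\$ -pe parallel ${slots}"
```

Requesting whole nodes in this way matches the mpirun --pernode option in the example job, which starts one Nextflow instance on each allocated node.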