RepeatMasker

RepeatMasker screens DNA sequences for interspersed repeats and low complexity DNA sequences.

RepeatMasker is available to install from the Bioconda Anaconda channel.

Installation

Load the default Miniforge module:

module load miniforge

If required, create a new Conda environment:

mamba create -n rm_env

Activate your Conda environment:

mamba activate rm_env

In your activated environment, install RepeatMasker from the Bioconda channel, also specifying the conda-forge channel to supply any required dependencies:

mamba install -c bioconda -c conda-forge repeatmasker
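
To confirm that the install step worked, one option is to check that the RepeatMasker executable is visible on the PATH of the activated environment. The following is a minimal sketch; the check_tool function is a hypothetical helper written for illustration and is not part of RepeatMasker, Conda or Miniforge.

```shell
#!/bin/bash
# Hypothetical helper: report whether a command is on the PATH of this shell.
check_tool() {
    local tool="$1"
    if command -v "${tool}" > /dev/null 2>&1; then
        echo "${tool} found at $(command -v "${tool}")"
    else
        echo "${tool} not found; is the Conda environment activated?"
        return 1
    fi
}

# "|| true" keeps the shell going if the tool is missing
check_tool RepeatMasker || true
```

If the tool is reported as not found, activating the environment with mamba activate rm_env and running the check again is a reasonable first step.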

Usage

To run the installed version of RepeatMasker, simply load the miniforge module and activate your Conda environment:

module load miniforge
mamba activate rm_env

For usage documentation, run RepeatMasker with no arguments:

(rm_env) $ RepeatMasker
RepeatMasker version X.Y.Z
No query sequence file indicated

NAME
    RepeatMasker - Mask repetitive DNA

SYNOPSIS
      RepeatMasker [-options] <seqfiles(s) in fasta format>

A detailed help document can also be viewed by running RepeatMasker -help.

Example jobs

Serial jobs

Here is an example job running on 1 core and 1GB of memory:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 1
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G

module load miniforge
mamba activate rm_env

RepeatMasker seqfiles.fa
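
Before submitting a job like the one above, it may be worth checking that the input file exists and looks like FASTA, so that the job does not wait in the queue only to fail immediately. This is a sketch under assumptions: count_fasta_records is a hypothetical helper, and the file name seqfiles.fa is taken from the example job.

```shell
#!/bin/bash
# Hypothetical helper: count FASTA records by counting header lines.
count_fasta_records() {
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
    grep -c '^>' "$1" || true
}

INPUT="seqfiles.fa"
if [ ! -s "${INPUT}" ]; then
    echo "Input ${INPUT} is missing or empty" >&2
elif [ "$(count_fasta_records "${INPUT}")" -eq 0 ]; then
    echo "Input ${INPUT} has no FASTA header lines" >&2
else
    echo "Input ${INPUT} contains $(count_fasta_records "${INPUT}") sequence(s)"
fi
```

Inside a job script, the two error branches would typically be followed by exit 1 so that RepeatMasker is not started on bad input.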

Here is an example job running on 4 cores and 4GB of memory (1GB per core):

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 4
#$ -l h_rt=1:0:0
#$ -l h_vmem=1G

module load miniforge
mamba activate rm_env

# RepeatMasker starts 2 threads for every parallel job requested with -pa,
# so passing the full slot count would badly overload the job. To avoid
# this, set a variable equal to half of the allocated slots
REPCORES=$((NSLOTS / 2))

RepeatMasker -pa ${REPCORES} seqfiles.fa

To request a different number of slots, change only the core request (smp) value; no other changes are required. Request an even number of slots so that REPCORES is set correctly: the integer division rounds an odd value down, and a single slot gives 0.
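
The halving step can be made more defensive with a guard that never lets the value fall below 1. The following is a sketch only: repcores is a hypothetical helper, and the fallback to 1 when NSLOTS is unset is an assumption added so that the snippet also runs outside a scheduler job. It prints the command rather than running it.

```shell
#!/bin/bash
# Hypothetical helper: halve the slot count, but never return less than 1.
repcores() {
    local slots="$1"
    local half=$(( slots / 2 ))   # integer division rounds down
    if [ "${half}" -lt 1 ]; then
        half=1
    fi
    echo "${half}"
}

# NSLOTS is set by the scheduler inside a job; default to 1 elsewhere.
REPCORES=$(repcores "${NSLOTS:-1}")
echo "RepeatMasker -pa ${REPCORES} seqfiles.fa"
```

With a single slot the guard still passes 1 to -pa, which by the reasoning in the job script comment would start 2 threads on 1 slot, so requesting at least 2 slots remains the safer choice for parallel runs.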

References