Public / Shared Data available on Apocrita

To avoid duplicating data and to save valuable research time, we provide local copies of some widely used public datasets.

QMUL staff can contact us to request corrections, updates or the addition of new datasets to this repository.

Datasets available

Name and Location on Apocrita / Description

Blast databases
Standard set of databases for BLAST (Basic Local Alignment Search Tool)

The Conserved Domain Database is a resource for the annotation of functional units in proteins

GATK Bundle
Standard files for working with human resequencing data with the GATK

Galaxy hg datasets
Reference genomes for use with Galaxy

Illumina Genomes
Ready-To-Use Reference Sequences and Annotations

Whole Genome Shotgun projects are genome assemblies of incomplete genomes

NR Protein sequences
Non-redundant protein sequences from GenPept, Swissprot, PIR, PRF, PDB, and NCBI RefSeq

Protein data for a subset of commonly used model organisms, downloaded from NCBI

The UniProt Reference Clusters (UniRef) provide clustered sets of sequences from the UniProt knowledgebase

Database of protein sequence and functional

Annotated image database for Machine Learning

Combined Annotation Dependent Depletion
CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome.