Public / Shared Data available on Apocrita

To prevent duplication of data and save valuable research time, we provide local copies of some widely used public datasets.

QMUL staff can contact us to request corrections, updates or the addition of new datasets to this repository.

Datasets available

| Name | Location on Apocrita | Description |
| --- | --- | --- |
| Blast databases | /data/PublicDataSets/blast/db | Standard set of databases for BLAST (Basic Local Alignment Search Tool) |
| CDD | /data/PublicDataSets/CDD | The Conserved Domain Database is a resource for the annotation of functional units in proteins |
| GATK Bundle | /data/PublicDataSets/GATKbundle | Standard files for working with human resequencing data with the GATK |
| Galaxy hg datasets | /data/PublicDataSets/galaxy | Reference genomes for use with Galaxy |
| Illumina Genomes | /data/PublicDataSets/genomes | Ready-to-use reference sequences and annotations |
| NCBI WGS | /data/PublicDataSets/NCBI | Whole Genome Shotgun projects are genome assemblies of incomplete genomes |
| NR Protein sequences | /data/PublicDataSets/shared_dbs/nr | Non-redundant protein sequences from GenPept, Swiss-Prot, PIR, PRF, PDB, and NCBI RefSeq |
| Prot_RefSeq | /data/PublicDataSets/shared_dbs/prot_refseq | Protein data for a subset of commonly used model organisms, downloaded from NCBI |
| UniRef50 | /data/PublicDataSets/shared_dbs/uniref50 | The UniProt Reference Clusters (UniRef) provide clustered sets of sequences from the UniProt knowledgebase |
| UniProt | /data/PublicDataSets/shared_dbs/uniprot | Database of protein sequence and functional information |
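
Because the datasets live at fixed locations, a job script can point tools at the shared copy instead of downloading its own. The following is a minimal Python sketch of that idea, not an official helper: it records the locations listed above, reports any that are not visible from the current node, and builds (without running) a `blastp` command line against the shared BLAST directory. The function names, the database name `nr` and the file names `proteins.fasta` and `hits.txt` are hypothetical; it also assumes that `blastp` is already on your `PATH` when you come to run the command.

```python
from pathlib import Path, PurePosixPath

# Locations copied from the dataset list above.
DATASETS = {
    "Blast databases": "/data/PublicDataSets/blast/db",
    "CDD": "/data/PublicDataSets/CDD",
    "GATK Bundle": "/data/PublicDataSets/GATKbundle",
    "Galaxy hg datasets": "/data/PublicDataSets/galaxy",
    "Illumina Genomes": "/data/PublicDataSets/genomes",
    "NCBI WGS": "/data/PublicDataSets/NCBI",
    "NR Protein sequences": "/data/PublicDataSets/shared_dbs/nr",
    "Prot_RefSeq": "/data/PublicDataSets/shared_dbs/prot_refseq",
    "UniRef50": "/data/PublicDataSets/shared_dbs/uniref50",
    "UniProt": "/data/PublicDataSets/shared_dbs/uniprot",
}


def missing_datasets(datasets=DATASETS):
    """Return the names whose directory is not visible from this machine."""
    return [name for name, path in datasets.items() if not Path(path).is_dir()]


def blastp_command(query, db_name, out, db_dir=DATASETS["Blast databases"]):
    """Build (but do not run) a blastp command that uses the shared copy.

    db_name is the database name inside db_dir; which names exist there
    is an assumption you should check by listing the directory.
    """
    db_path = PurePosixPath(db_dir) / db_name
    return ["blastp", "-query", str(query), "-db", str(db_path), "-out", str(out)]


if __name__ == "__main__":
    for name in missing_datasets():
        print(f"not found: {name}")
    # Hypothetical query file, database name and output file.
    print(" ".join(blastp_command("proteins.fasta", "nr", "hits.txt")))
```

The command is returned as a list of arguments so that it can be passed directly to `subprocess.run` without shell quoting issues.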