AlphaFold 3¶
AlphaFold 3 is an application for predicting the 3D structures of proteins and their complexes with other biomolecules.
AlphaFold 2
AlphaFold 2 documentation can be found here.
AlphaFold 3 is available as an Apptainer container on Apocrita.
Usage¶
AlphaFold 3 requires a suite of supporting tools to be installed, so for reproducibility, we provide all of the tools in a container along with AlphaFold 3.
To run the default version of AlphaFold 3, simply load the alphafold3 module:
module load alphafold3
The default module only supports Ampere or Hopper cards
The default alphafold3 module is only compatible with Ampere A100 or Hopper H100 cards. If you wish to use an older Volta V100 card, you must specify the module suffixed _volta (e.g. alphafold3/3.0.1_volta) and add the additional argument --flash_attention_implementation=xla to the alphafold command in your job script.
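The difference between the two setups can be captured in a small shell helper. This is a minimal sketch rather than anything shipped with the module: the function name select_af3_setup and the variables AF3_MODULE and AF3_EXTRA_ARGS are hypothetical, and the GPU type name volta is assumed by analogy with the ampere and hopper names used in the example job.

```shell
#!/bin/bash
# Hypothetical helper (not part of the alphafold3 module): choose the module
# name and any extra alphafold arguments for a given GPU type.
select_af3_setup() {
    case "$1" in
        ampere|hopper)
            AF3_MODULE="alphafold3"      # default module
            AF3_EXTRA_ARGS=""            # no extra arguments needed
            ;;
        volta)
            AF3_MODULE="alphafold3/3.0.1_volta"
            AF3_EXTRA_ARGS="--flash_attention_implementation=xla"
            ;;
        *)
            echo "Unsupported GPU type: $1" >&2
            return 1
            ;;
    esac
}

# Example: show what a Volta job would load and append.
select_af3_setup volta
echo "module load ${AF3_MODULE}"
echo "extra alphafold arguments: ${AF3_EXTRA_ARGS}"
```

In a job script, the module named in AF3_MODULE would then be loaded and the contents of AF3_EXTRA_ARGS appended to the alphafold command.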
Example job¶
You must acquire your own Model Parameters
AlphaFold 3 has a strict license for Model Parameters (see here for more information). You must request access from Google DeepMind using this form. ITSR will not provide these centrally and you must agree to and adhere to the terms of use at all times. ITSR will not be held responsible for any breaches of the terms of use.
AlphaFold 3 does not require a separate run.sh script
Unlike AlphaFold 2, AlphaFold 3 does not require a separate run.sh script. All runtime arguments should be contained within your job script as per the example below.
GPU job¶
To run AlphaFold 3 from the container, we prepare a job script called alpha.qsub:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8 # 8 cores per GPU
#$ -l h_rt=240:0:0 # 240 hours runtime
#$ -l h_vmem=11G # 11G RAM per core
#$ -l gpu=1 # AlphaFold 3 only uses 1 GPU
#$ -l gpu_type=ampere|hopper # The default AlphaFold 3 module only runs on Ampere or Hopper GPUs
# Approved DERI users can include the following line
#$ -l cluster=andrena
# Specify an output destination.
export OUTPUT=/data/scratch/${USER}/alphafold_out
# Set the location of the AlphaFold dataset
export DOWNLOAD_DIR=/data/DERI-DataSets/AlphaFold
module load alphafold3
# Options for AlphaFold
alphafold \
    --json_path=${HOME}/alphafold_input/fold_input.json \
    --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2022_05.fa \
    --model_dir=/path/to/model/parameters \
    --ntrna_database_path=${DOWNLOAD_DIR}/ntrna/nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta \
    --pdb_database_path=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
    --rfam_database_path=${DOWNLOAD_DIR}/rfam/rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta \
    --rna_central_database_path=${DOWNLOAD_DIR}/rna/rnacentral_active_seq_id_90_cov_80_linclust.fasta \
    --seqres_database_path=${DOWNLOAD_DIR}/pdb_seqres/pdb_seqres_2022_09_28.fasta \
    --small_bfd_database_path=${DOWNLOAD_DIR}/small_bfd/bfd-first_non_consensus_sequences.fasta \
    --uniprot_cluster_annot_database_path=${DOWNLOAD_DIR}/uniprot/uniprot_all_2021_04.fa \
    --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90_2022_05.fa \
    --output_dir=${OUTPUT}
The example writes its output to scratch storage; shared project storage can be used instead if you have access to it.
The command-line flags for AlphaFold 3 tend to change between releases, so check the changelog in the GitHub repository when moving to a new version. If a flag has been renamed or removed, amend your run command accordingly.
The ${DOWNLOAD_DIR} path will not need to change. This is a 2 TB dataset required to make AlphaFold work.
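Because a job may wait in the queue for some time, it can be worth confirming that every path it depends on exists before the alphafold command runs. The following is a minimal sketch of such a pre-flight check; the function name check_af3_paths is hypothetical and not part of the module.

```shell
#!/bin/bash
# Hypothetical pre-flight helper: report every path that does not exist.
# Returns non-zero if at least one of the given paths is missing.
check_af3_paths() {
    af3_missing=0
    for p in "$@"; do
        if [ ! -e "${p}" ]; then
            echo "Missing: ${p}" >&2
            af3_missing=1
        fi
    done
    return "${af3_missing}"
}

# Example usage inside the job script, before the alphafold command:
#   check_af3_paths "${DOWNLOAD_DIR}/mgnify/mgy_clusters_2022_05.fa" \
#                   "${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files" \
#                   "${HOME}/alphafold_input/fold_input.json" || exit 1
```

Failing early in this way means a mistyped model or input path is reported before the prediction itself starts.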
Input data¶
You will need some input data for the job. In the above example, fold_input.json is taken from the Installation and Running Your First Prediction guide of the official AlphaFold 3 repository and stored in the ${HOME}/alphafold_input directory.
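For your own predictions, the input file follows the same JSON layout. The sketch below writes a minimal single-chain input file. The function name write_fold_input, the job name example_job and the short sequence are all placeholders (this is not the 2PV7 input from the official guide), and the field names reflect the AlphaFold 3 input format as we understand it, so check them against the input documentation for the release you are running.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal single-protein AlphaFold 3 input file
# into the directory given as the first argument.
# The job name and sequence are placeholders, not the 2PV7 example.
write_fold_input() {
    input_dir="$1"
    mkdir -p "${input_dir}"
    cat > "${input_dir}/fold_input.json" <<'EOF'
{
  "name": "example_job",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}

# Example usage:
#   write_fold_input "${HOME}/alphafold_input"
```

As far as we understand, the name field is also used to name the output subdirectory, so choose something that identifies the job.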
Running the job¶
Check that the job script correctly specifies the output location and any input files, then submit the job. In the above example, this is done with qsub alpha.qsub.
The AlphaFold processing workflow will run tasks on the CPU, followed by bursts of intensive GPU activity. For the above example, running on either an A100 or H100 GPU completes in around 10-15 minutes and running on a Volta V100 GPU completes in around 15-20 minutes.
Results¶
The example job produces the following output in /data/scratch/${USER}/alphafold_out/2pv7:
$ ls -1 /data/scratch/${USER}/alphafold_out/2pv7
2pv7_confidences.json
2pv7_data.json
2pv7_model.cif
2pv7_summary_confidences.json
ranking_scores.csv
seed-1_sample-0
seed-1_sample-1
seed-1_sample-2
seed-1_sample-3
seed-1_sample-4
TERMS_OF_USE.md
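To see which of the sample directories scored highest, ranking_scores.csv can be scanned with awk. This is a sketch under an assumption about the file layout: it expects a header row followed by comma-separated seed, sample and ranking score columns, which you should confirm against a file from your own run. The function name best_af3_sample is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the seed/sample directory name with the highest
# ranking score. Assumes a header row and columns: seed,sample,ranking_score.
best_af3_sample() {
    awk -F, 'NR > 1 && NF >= 3 && (!seen || $3 + 0 > best) {
                 best = $3 + 0; seed = $1; sample = $2; seen = 1
             }
             END { if (seen) printf "seed-%s_sample-%s\n", seed, sample }' "$1"
}

# Example usage:
#   best_af3_sample /data/scratch/${USER}/alphafold_out/2pv7/ranking_scores.csv
```

The printed name matches the seed-N_sample-M directory naming shown in the listing above.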