AlphaFold 3

AlphaFold 3 is an application for predicting the structures of proteins and their complexes with other molecules.

AlphaFold 2

AlphaFold 2 documentation can be found here.

AlphaFold 3 is available as an Apptainer container on Apocrita.

Usage

AlphaFold 3 depends on a suite of supporting tools. For reproducibility, we provide these tools together with AlphaFold 3 in a single container.

To run the default version of AlphaFold 3, simply load the alphafold3 module:

module load alphafold3

The default module only supports Ampere or Hopper cards

The default alphafold3 module is only compatible with Ampere A100 or Hopper H100 cards. If you wish to use an older Volta V100 card, you must load the module suffixed -volta (e.g. alphafold3/3.0.1-volta) and add the argument --flash_attention_implementation=xla to the alphafold command in your job script.
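If one job script has to cope with more than one GPU generation, the module and extra flag described above can be chosen inside the script. The sketch below is a hypothetical helper: the function name select_af3_build and the variables AF3_MODULE and AF3_EXTRA_ARGS are our own, not part of the module. It assumes you pass the GPU generation as an argument, and it uses alphafold3/3.0.1-volta only because that is the example version above, so check module avail alphafold3 for the versions actually installed.

```shell
# Hypothetical helper: choose the AlphaFold 3 module and extra arguments
# for a given GPU generation (ampere, hopper or volta).
select_af3_build() {
  case "$1" in
    ampere|hopper)
      # The default module supports A100 and H100 cards with no extra flags
      AF3_MODULE="alphafold3"
      AF3_EXTRA_ARGS=""
      ;;
    volta)
      # V100 cards need the -volta build plus the XLA attention implementation.
      # The version number is the example from this page (an assumption).
      AF3_MODULE="alphafold3/3.0.1-volta"
      AF3_EXTRA_ARGS="--flash_attention_implementation=xla"
      ;;
    *)
      echo "Unsupported GPU type: $1" >&2
      return 1
      ;;
  esac
}

# Example use inside a job script (not run here):
#   select_af3_build volta || exit 1
#   module load "${AF3_MODULE}"
#   # AF3_EXTRA_ARGS is deliberately unquoted so an empty value adds no argument
#   alphafold ${AF3_EXTRA_ARGS} <the remaining flags from the example job script>
```

The scheduler request must match the build: a Volta run also needs the #$ -l gpu_type line changed so that it requests a V100 node rather than ampere|hopper.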

Example job

You must acquire your own Model Parameters

AlphaFold 3 has a strict license for Model Parameters (see here for more information). You must request access from Google DeepMind using this form. ITSR will not provide these centrally and you must agree to and adhere to the terms of use at all times. ITSR will not be held responsible for any breaches of the terms of use.

AlphaFold 3 does not require a separate run.sh script

Unlike AlphaFold 2, AlphaFold 3 does not require a separate run.sh script. All runtime arguments should be contained within your job script as per the example below.

GPU job

To run AlphaFold 3 from the container, we prepare a job script called alpha.qsub:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -pe smp 8                   # 8 cores per GPU
#$ -l h_rt=240:0:0             # 240 hours runtime
#$ -l h_vmem=11G               # 11G RAM per core
#$ -l gpu=1                    # AlphaFold 3 only uses 1 GPU
#$ -l gpu_type=ampere|hopper   # The default AlphaFold 3 module only runs on Ampere or Hopper GPUs
# Approved DERI users can include the following line; other users should remove it
#$ -l cluster=andrena

# Specify an output destination.
export OUTPUT=/data/scratch/${USER}/alphafold_out

# Set the destination of the AlphaFold dataset
export DOWNLOAD_DIR=/data/DERI-DataSets/AlphaFold

module load alphafold3

# Options for AlphaFold
# Replace /path/to/model/parameters with the directory containing your own model parameters
alphafold \
  --json_path=${HOME}/alphafold_input/fold_input.json \
  --mgnify_database_path=${DOWNLOAD_DIR}/mgnify/mgy_clusters_2022_05.fa \
  --model_dir=/path/to/model/parameters \
  --ntrna_database_path=${DOWNLOAD_DIR}/ntrna/nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta \
  --pdb_database_path=${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files \
  --rfam_database_path=${DOWNLOAD_DIR}/rfam/rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta \
  --rna_central_database_path=${DOWNLOAD_DIR}/rna/rnacentral_active_seq_id_90_cov_80_linclust.fasta \
  --seqres_database_path=${DOWNLOAD_DIR}/pdb_seqres/pdb_seqres_2022_09_28.fasta \
  --small_bfd_database_path=${DOWNLOAD_DIR}/small_bfd/bfd-first_non_consensus_sequences.fasta \
  --uniprot_cluster_annot_database_path=${DOWNLOAD_DIR}/uniprot/uniprot_all_2021_04.fa \
  --uniref90_database_path=${DOWNLOAD_DIR}/uniref90/uniref90_2022_05.fa \
  --output_dir=${OUTPUT}

The example writes its output to scratch storage; you may use shared project storage instead if you have access to it.

The AlphaFold 3 command-line interface tends to change between releases: flags are sometimes renamed or removed, and your run command will need to be amended accordingly. Check the changelogs in the GitHub repository whenever you move to a new version.

You should not need to change the ${DOWNLOAD_DIR} path: it points to the 2TB set of genetic and structural databases that AlphaFold 3 requires.
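A mistyped input, parameter or database path is only discovered once the job has waited in the queue and started, so a quick existence check at the top of the job script can make such mistakes obvious. The function below is a hypothetical sketch (check_af3_paths is not part of AlphaFold 3 or the module); it simply tests each path it is given and reports any that are missing.

```shell
# Hypothetical pre-flight check: report every path that does not exist and
# return a non-zero status if any are missing.
check_af3_paths() {
  af3_missing=0
  for af3_path in "$@"; do
    if [ ! -e "${af3_path}" ]; then
      echo "Missing: ${af3_path}" >&2
      af3_missing=$((af3_missing + 1))
    fi
  done
  # Succeed only when every path was found
  [ "${af3_missing}" -eq 0 ]
}

# Example use in the job script, after DOWNLOAD_DIR is set and before alphafold:
#   check_af3_paths \
#     "${HOME}/alphafold_input/fold_input.json" \
#     /path/to/model/parameters \
#     "${DOWNLOAD_DIR}/pdb_mmcif/mmcif_files" \
#     "${DOWNLOAD_DIR}/uniref90/uniref90_2022_05.fa" || exit 1
```

Add any other database paths from your alphafold command to the list in the same way.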

Input data

The job needs an input file. In the example above, fold_input.json is the example input from the "Installation and Running Your First Prediction" guide in the official AlphaFold 3 repository, saved in the ${HOME}/alphafold_input directory.
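To fold your own sequence instead, the input file can be written from the shell. The sketch below writes a minimal single-chain protein input using the fields of the AlphaFold 3 JSON input format (name, sequences, modelSeeds, dialect, and version 1 of the format). The job name my_protein and the short sequence are placeholders, and write_fold_input and INPUT_DIR are hypothetical names; check the input documentation in the AlphaFold 3 repository for the format versions your release accepts.

```shell
# Hypothetical helper: write a minimal AlphaFold 3 input file.
# INPUT_DIR defaults to the directory used in the example job script.
# The name and sequence are placeholders; replace them with your own.
write_fold_input() {
  input_dir="${INPUT_DIR:-${HOME}/alphafold_input}"
  mkdir -p "${input_dir}"
  # Quoted 'EOF' stops the shell from expanding anything inside the JSON
  cat > "${input_dir}/fold_input.json" <<'EOF'
{
  "name": "my_protein",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}
```

Results are written to a subdirectory of ${OUTPUT} named after the name field, in the same way that the example input produces the 2pv7 directory shown under Results.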

Running the job

Check that the job script specifies the correct output location and input files, then submit it. For the example above, run qsub alpha.qsub.

The AlphaFold workflow first runs its data pipeline (the genetic and template searches) on the CPU, followed by bursts of intensive GPU activity during inference. For the example above, the job completes in around 10-15 minutes on an Ampere A100 or Hopper H100 GPU, and in around 15-20 minutes on a Volta V100 GPU.

Results

The example job produces the following output in /data/scratch/${USER}/alphafold_out/2pv7:

$ ls -1 /data/scratch/${USER}/alphafold_out/2pv7
2pv7_confidences.json
2pv7_data.json
2pv7_model.cif
2pv7_summary_confidences.json
ranking_scores.csv
seed-1_sample-0
seed-1_sample-1
seed-1_sample-2
seed-1_sample-3
seed-1_sample-4
TERMS_OF_USE.md
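Each seed-N_sample-M directory holds one predicted sample, and ranking_scores.csv lists a score for each. AlphaFold 3 already writes the top-ranked prediction to 2pv7_model.cif, but if you want to see which sample that was, the hypothetical helper below picks the highest-scoring row. It assumes ranking_scores.csv has a header line followed by seed,sample,ranking_score rows, and it uses GNU sort's -g option; check the file from your own run before relying on it.

```shell
# Hypothetical helper: print the directory name and score of the
# highest-ranked sample in an AlphaFold 3 output directory.
# Assumes ranking_scores.csv rows look like "seed,sample,ranking_score".
best_af3_sample() {
  af3_csv="$1/ranking_scores.csv"
  # Skip the header, sort on column 3 as a number (highest first),
  # keep the top row and format it as the matching sample directory name
  tail -n +2 "${af3_csv}" \
    | sort -t, -k3,3gr \
    | head -n 1 \
    | awk -F, '{ printf "seed-%s_sample-%s %s\n", $1, $2, $3 }'
}
```

For the example job, best_af3_sample /data/scratch/${USER}/alphafold_out/2pv7 would print the name of the best sample directory followed by its ranking score.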

References

AlphaFold 3 GitHub repository