To get access to AlphaFold 3 run module load AlphaFold/3.0.0. This will add the run_alphafold.py script to your path.
This script requires two parameters:
- the input JSON file (--json_path FILE).
- the output directory (--output_dir DIR), where the results will be saved.
For a full list of options run run_alphafold.py --help. Check the AlphaFold 3 GitHub page for more information.
You can create an interactive session and run the script directly, but given the long runtimes of AlphaFold it's best to submit the task as a batch job.
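The input JSON follows the format described in the AlphaFold 3 repository. A minimal hypothetical example fold_input.json for a single protein chain (the job name and sequence are placeholders):

```json
{
  "name": "my_protein",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLG"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
```

See the input documentation in the AlphaFold 3 repository for multimers, ligands and other entity types.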
You'll need to draft a batch script. Here's an example to get you started:
#!/usr/bin/bash
#SBATCH --job-name=AlphaFold
#SBATCH --cpus-per-task=8
#SBATCH -N 1
#SBATCH --gres=gpu:1
#SBATCH -C Ampere
#SBATCH -p regular
#SBATCH --mem-per-cpu=6000
module load AlphaFold/3.0.0
input=fold_input.json
echo Predicting $input
time run_alphafold.py --output_dir . --json_path $input

To be able to run with Turing cards (e.g. RTX 2080 Ti) you'll need a few tweaks.
You need to set
XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
and pass --flash_attention_implementation=xla to run_alphafold.py.
You can find more details in this GitHub issue.
#!/usr/bin/bash
#SBATCH --job-name=AlphaFold
#SBATCH --cpus-per-task=8
#SBATCH -N 1
#SBATCH --gres=gpu:1
#SBATCH -C RTX2080Ti
#SBATCH -p regular
#SBATCH --mem-per-cpu=6000
module load AlphaFold/3.0.0
input=fold_input.json
echo Predicting $input
export XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
time run_alphafold.py --flash_attention_implementation=xla --output_dir . --json_path $input

The following environment variables are set by module load AlphaFold/3.0.0 to use unified memory:
XLA_PYTHON_CLIENT_PREALLOCATE=false,
TF_FORCE_UNIFIED_MEMORY=true and
XLA_CLIENT_MEM_FRACTION=3.2.
You can change these and tweak the performance by following the documentation.
I have not adjusted pair_transition_shard_spec in
model_config.py as suggested in the performance
documentation, but it could be an option when running into memory
problems.
If for some reason ColabFold is not sufficient for your needs, you can also run AlphaFold on DaVinci.
To get access to AlphaFold run
module load AlphaFold/2.3.2. This will add the
run_alphafold.sh script to your path.
This script requires two parameters:
- the input FASTA file (-f FILE).
- the output directory (-o DIR). Note that AlphaFold will create one directory with the name of the FASTA file within the directory that you give.
You can create an interactive session and run the script directly, but given the long runtimes of AlphaFold it's best to submit the task as a batch job.
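The input is a plain FASTA file. A minimal hypothetical test.fasta (placeholder sequence) matching the file name used in the batch script below:

```
>test
GMRESYANENQFGFKTINSDIHKIVIVGGYGKLG
```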
You'll need to draft a batch script. Here's an example to get you started:
#!/usr/bin/bash
#SBATCH --job-name=AlphaFold
#SBATCH --cpus-per-task=8
#SBATCH -N 1
#SBATCH --gres=gpu:1
#SBATCH -C RTX2080Ti
#SBATCH -p regular
#SBATCH --mem-per-cpu=6000
module load AlphaFold/2.3.2
input=test.fasta
echo Predicting $input
time run_alphafold.sh -o . -f $input

AlphaFold will create a bunch of files in the output directory. These are described in detail in the README of https://github.com/deepmind/alphafold.
The best ranked model is available with the name
ranked_0.pdb. It's equivalent to the model shown at EBI's
database.
The B factors in that PDB correspond to the per-atom pLDDTs, just like at EBI's database.
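Because the pLDDT is stored in the B-factor column, you can read the per-residue confidences straight out of ranked_0.pdb with a short script. This is a sketch that relies only on the fixed PDB column layout; the function name is ours:

```python
def per_residue_plddt(pdb_lines):
    """Return {residue number: pLDDT} from the ATOM records of a PDB file.

    AlphaFold repeats the per-residue pLDDT in the B-factor field
    (columns 61-66) of every atom of a residue, so keeping the
    first atom seen per residue is enough.
    """
    plddt = {}
    for line in pdb_lines:
        if line.startswith('ATOM'):
            resseq = int(line[22:26])                     # residue number
            plddt.setdefault(resseq, float(line[60:66]))  # B-factor field
    return plddt
```

For example, with s = per_residue_plddt(open('ranked_0.pdb')), the mean pLDDT of the model is sum(s.values()) / len(s).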
A file ranking_debug.json lists the overall pLDDTs
for all the models tried. By default 5 models are run. For example the
following file:
following file:
{
"plddts": {
"model_1_ptm": 83.76404249206948,
"model_2_ptm": 84.43304896342197,
"model_3_ptm": 86.97910795602691,
"model_4_ptm": 85.06833732904242,
"model_5_ptm": 84.05097568175553
},
"order": [
"model_3_ptm",
"model_4_ptm",
"model_2_ptm",
"model_5_ptm",
"model_1_ptm"
]
}

shows that model_3_ptm is the top-ranked model.
If you want to obtain the predicted aligned error (PAE) for the
best model, just as is shown in EBI's database, you need to extract it
from the results_model file corresponding to the highest ranked model.
In the above case that would be
result_model_3.pkl.
This is a Python pickled file. The PAE matrix is stored under the
key predicted_aligned_error. You can use a simple Python
script like this to plot it:
#!/usr/bin/env python
import pickle
import json
import sys
import matplotlib.pyplot as plt
# ranking_debug.json is plain text; the result pickle is binary
rank = json.load(open('ranking_debug.json'))
top_model = rank['order'][0]
result = pickle.load(open('result_' + top_model + '.pkl', 'rb'))
print(result['predicted_aligned_error'].shape)
plt.imshow(result['predicted_aligned_error'],cmap='Greens_r',vmin=0,vmax=34)
plt.title('Predicted Aligned Error')
plt.xlabel('Scored residue')
plt.ylabel('Aligned residue')
cbar = plt.colorbar()
cbar.set_label('Expected position error (Ångströms)')
plt.show()  # or plt.savefig('pae.png') when running without a display

You must use a pTM model (the default) to have PAE values in the result.
Make sure the Python binaries end up on the scratch filesystem; otherwise they will be placed in the home directory of the user that creates the environment:
export UV_PYTHON_INSTALL_DIR=/scratch/burst/uv
mkdir -p /scratch/burst/alphafold-3
uv venv --python 3.11 /scratch/burst/alphafold-3/
source /scratch/burst/alphafold-3/bin/activate
uv pip install pip
# Not part of the Docker build, but necessary here
uv pip install setuptools
Install HMMER:
mkdir /scratch/burst/alphafold-3/src
cd /scratch/burst/alphafold-3/src
mkdir hmmer_build hmmer
wget http://eddylab.org/software/hmmer/hmmer-3.4.tar.gz --directory-prefix /scratch/burst/alphafold-3/src/hmmer_build
(cd hmmer_build && tar zxf hmmer-3.4.tar.gz && rm -f hmmer-3.4.tar.gz)
(cd hmmer_build/hmmer-3.4 && ./configure --prefix /scratch/burst/alphafold-3/)
(cd hmmer_build/hmmer-3.4 && make -j)
(cd hmmer_build/hmmer-3.4 && make install)
(cd hmmer_build/hmmer-3.4/easel && make install)
git clone https://github.com/google-deepmind/alphafold3.git
cd alphafold3
pip3 install -r dev-requirements.txt
pip3 install --no-deps .
build_data
mkdir /scratch/burst/alphafold-3/share/alphafold3
yum install -y zstd
bash /scratch/burst/alphafold-3/src/alphafold3/fetch_databases.sh /scratch/burst/alphafold-3/share/alphafold3/public_databases
Fix permissions on database:
chmod 755 /scratch/burst/alphafold-3/share/alphafold3/public_databases/mmcif_files
chmod -R +r /scratch/burst/alphafold-3/share/alphafold3/public_databases/mmcif_files
Copy over the model:
mkdir /scratch/burst/alphafold-3/share/alphafold3/models
mv <model.zstd> /scratch/burst/alphafold-3/share/alphafold3/models
cd /scratch/burst/alphafold-3/share/alphafold3/models
unzstd <model.zstd>
This part is not in the Dockerfile. Just move the run scripts into the path and fix some paths:
cp /scratch/burst/alphafold-3/src/alphafold3/run_alphafold.py /scratch/burst/alphafold-3/bin/
Point to the place where we put the data instead of the default /root:
sed -i "s|_HOME_DIR = pathlib\.Path(os\.environ\.get('HOME'))|_HOME_DIR = pathlib.Path('/scratch/burst/alphafold-3/share/alphafold3/')|" /scratch/burst/alphafold-3/bin/run_alphafold.py
cp /scratch/burst/alphafold-3/src/alphafold3/run_alphafold_test.py /scratch/burst/alphafold-3/bin/
Make the scripts executable and add a shebang:
chmod +x /scratch/burst/alphafold-3/bin/run_alphafold.py
chmod +x /scratch/burst/alphafold-3/bin/run_alphafold_test.py
sed -i '1i #!/usr/bin/env python' /scratch/burst/alphafold-3/bin/run_alphafold.py
sed -i '1i #!/usr/bin/env python' /scratch/burst/alphafold-3/bin/run_alphafold_test.py