To get access to AlphaFold 3 run module load AlphaFold/3.0.0. This will add the run_alphafold.py script to your path.
This script requires two parameters:
- the input JSON file (--json_path FILE).
- the output directory (--output_dir DIR), where the results will be saved.
For a full list of options run run_alphafold.py --help. Check the AlphaFold 3 GitHub page for more information.
You can create an interactive session and run the script directly, but given the long runtimes of AlphaFold it's best to submit the task as a batch job.
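The input JSON follows the format described in the AlphaFold 3 repository. A minimal hypothetical example fold_input.json for a single protein chain (the job name and sequence are placeholders):

```json
{
  "name": "my_protein",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLG"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
```

See the input documentation in the AlphaFold 3 repository for multimers, ligands and other entity types.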
You'll need to draft a batch script. Here's an example to get you started:
#!/usr/bin/bash
#SBATCH --job-name=AlphaFold
#SBATCH --cpus-per-task=8
#SBATCH -N 1
#SBATCH --gres=gpu:1
#SBATCH -C Ampere
#SBATCH -p regular
#SBATCH --mem-per-cpu=6000
module load AlphaFold/3.0.0
input=fold_input.json
echo Predicting $input
time run_alphafold.py --output_dir . --json_path $input

To be able to run with Turing cards (e.g. RTX 2080 Ti) you'll need a few tweaks.
You need to set
XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
and pass --flash_attention_implementation=xla to run_alphafold.py.
You can find more details in this GitHub issue.
#!/usr/bin/bash
#SBATCH --job-name=AlphaFold
#SBATCH --cpus-per-task=8
#SBATCH -N 1
#SBATCH --gres=gpu:1
#SBATCH -C RTX2080Ti
#SBATCH -p regular
#SBATCH --mem-per-cpu=6000
module load AlphaFold/3.0.0
input=fold_input.json
echo Predicting $input
export XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter"
time run_alphafold.py --flash_attention_implementation=xla --output_dir . --json_path $input

The following environment variables are set by module load AlphaFold/3.0.0 to use unified memory:
XLA_PYTHON_CLIENT_PREALLOCATE=false,
TF_FORCE_UNIFIED_MEMORY=true and
XLA_CLIENT_MEM_FRACTION=3.2.
You can change these and tweak the performance by following the documentation.
I have not adjusted pair_transition_shard_spec in
model_config.py as suggested in the performance
documentation, but it could be an option when running into memory
problems.
If for some reason ColabFold is not sufficient for your needs, you can also run AlphaFold on DaVinci.
To get access to AlphaFold run
module load AlphaFold/2.3.2. This will add the
run_alphafold.sh script to your path.
This script requires two parameters:
- the input FASTA file (-f FILE).
- the output directory (-o DIR). Note that AlphaFold will create one directory with the name of the FASTA file within the directory that you give.
You can create an interactive session and run the script directly, but given the long runtimes of AlphaFold it's best to submit the task as a batch job.
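The input is a plain FASTA file. A minimal hypothetical test.fasta (placeholder sequence) matching the file name used in the batch script below:

```
>test
GMRESYANENQFGFKTINSDIHKIVIVGGYGKLG
```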
You'll need to draft a batch script. Here's an example to get you started:
#!/usr/bin/bash
#SBATCH --job-name=AlphaFold
#SBATCH --cpus-per-task=8
#SBATCH -N 1
#SBATCH --gres=gpu:1
#SBATCH -C RTX2080Ti
#SBATCH -p regular
#SBATCH --mem-per-cpu=6000
module load AlphaFold/2.3.2
input=test.fasta
echo Predicting $input
time run_alphafold.sh -o . -f $input

AlphaFold will create a bunch of files in the output directory. These are described in detail in the README of https://github.com/deepmind/alphafold.
The best ranked model is available with the name
ranked_0.pdb. It's equivalent to the model shown at EBI's
database.
The B factors in that PDB correspond to the per-atom pLDDTs, just like at EBI's database.
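Because the pLDDT is stored in the B-factor column, you can read the per-residue confidences straight out of ranked_0.pdb with a short script. This is a sketch that relies only on the fixed PDB column layout; the function name is ours:

```python
def per_residue_plddt(pdb_lines):
    """Return {residue number: pLDDT} from the ATOM records of a PDB file.

    AlphaFold repeats the per-residue pLDDT in the B-factor field
    (columns 61-66) of every atom of a residue, so keeping the
    first atom seen per residue is enough.
    """
    plddt = {}
    for line in pdb_lines:
        if line.startswith('ATOM'):
            resseq = int(line[22:26])                     # residue number
            plddt.setdefault(resseq, float(line[60:66]))  # B-factor field
    return plddt
```

For example, with s = per_residue_plddt(open('ranked_0.pdb')), the mean pLDDT of the model is sum(s.values()) / len(s).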
A file ranking_debug.json lists the overall pLDDTs
for all the models tried. By default 5 models are run. For example the
following file:
following file:
{
"plddts": {
"model_1_ptm": 83.76404249206948,
"model_2_ptm": 84.43304896342197,
"model_3_ptm": 86.97910795602691,
"model_4_ptm": 85.06833732904242,
"model_5_ptm": 84.05097568175553
},
"order": [
"model_3_ptm",
"model_4_ptm",
"model_2_ptm",
"model_5_ptm",
"model_1_ptm"
]
}

shows that model_3_ptm is the top-ranked model.
If you want to obtain the predicted aligned error (PAE) for the
best model, just as is shown in EBI's database, you need to extract it
from the results_model file corresponding to the highest ranked model.
In the above case that would be
result_model_3.pkl.
This is a Python pickled file. The PAE matrix is stored under the
key predicted_aligned_error. You can use a simple Python
script like this to plot it:
#!/usr/bin/env python
import pickle
import json
import sys
import matplotlib.pyplot as plt
# ranking_debug.json is plain text; the result pickle is binary
rank = json.load(open('ranking_debug.json'))
top_model = rank['order'][0]
result = pickle.load(open('result_' + top_model + '.pkl', 'rb'))
print(result['predicted_aligned_error'].shape)
plt.imshow(result['predicted_aligned_error'],cmap='Greens_r',vmin=0,vmax=34)
plt.title('Predicted Aligned Error')
plt.xlabel('Scored residue')
plt.ylabel('Aligned residue')
cbar = plt.colorbar()
cbar.set_label('Expected position error (Ångströms)')
plt.show()  # or plt.savefig('pae.png') when running without a display

You must use a pTM model (the default) to have PAE values in the result.
Make sure the Python binaries end up on the scratch filesystem; otherwise they will be placed in the home directory of the user that creates the environment:
export UV_PYTHON_INSTALL_DIR=/scratch/burst/uv
mkdir -p /scratch/burst/alphafold-3
uv venv --python 3.11 /scratch/burst/alphafold-3/
source /scratch/burst/alphafold-3/bin/activate
uv pip install pip
# Not part of the Docker build, but necessary here
uv pip install setuptools
Install HMMER:
mkdir /scratch/burst/alphafold-3/src
cd /scratch/burst/alphafold-3/src
mkdir hmmer_build hmmer
wget http://eddylab.org/software/hmmer/hmmer-3.4.tar.gz --directory-prefix /scratch/burst/alphafold-3/src/hmmer_build
(cd hmmer_build && tar zxf hmmer-3.4.tar.gz && rm -f hmmer-3.4.tar.gz)
(cd hmmer_build/hmmer-3.4 && ./configure --prefix /scratch/burst/alphafold-3/)
(cd hmmer_build/hmmer-3.4 && make -j)
(cd hmmer_build/hmmer-3.4 && make install)
(cd hmmer_build/hmmer-3.4/easel && make install)
git clone https://github.com/google-deepmind/alphafold3.git
cd alphafold3
pip3 install -r dev-requirements.txt
pip3 install --no-deps .
build_data
mkdir /scratch/burst/alphafold-3/share/alphafold3
yum install -y zstd
bash /scratch/burst/alphafold-3/src/alphafold3/fetch_databases.sh /scratch/burst/alphafold-3/share/alphafold3/public_databases
Fix permissions on database:
chmod 755 /scratch/burst/alphafold-3/share/alphafold3/public_databases/mmcif_files
chmod -R +r /scratch/burst/alphafold-3/share/alphafold3/public_databases/mmcif_files
Copy over the model:
mkdir /scratch/burst/alphafold-3/share/alphafold3/models
mv <model.zstd> /scratch/burst/alphafold-3/share/alphafold3/models
cd /scratch/burst/alphafold-3/share/alphafold3/models
unzstd <model.zstd>
This part is not in the Dockerfile. Just move the run scripts into the path and fix some paths:
cp /scratch/burst/alphafold-3/src/alphafold3/run_alphafold.py /scratch/burst/alphafold-3/bin/
Point to the place where we put the data instead of the default /root:
sed -i "s|_HOME_DIR = pathlib\.Path(os\.environ\.get('HOME'))|_HOME_DIR = pathlib.Path('/scratch/burst/alphafold-3/share/alphafold3/')|" /scratch/burst/alphafold-3/bin/run_alphafold.py
cp /scratch/burst/alphafold-3/src/alphafold3/run_alphafold_test.py /scratch/burst/alphafold-3/bin/
Make the scripts executable and add a shebang:
chmod +x /scratch/burst/alphafold-3/bin/run_alphafold.py
chmod +x /scratch/burst/alphafold-3/bin/run_alphafold_test.py
sed -i '1i #!/usr/bin/env python' /scratch/burst/alphafold-3/bin/run_alphafold.py
sed -i '1i #!/usr/bin/env python' /scratch/burst/alphafold-3/bin/run_alphafold_test.py