AlphaFold

From UFRC

Description

alphafold website  

This package provides an implementation of the inference pipeline of AlphaFold v2.3. This is a completely new model that was entered in CASP14 and published in Nature. For simplicity, we refer to this model as AlphaFold throughout the rest of this document.

Environment Modules

Run module spider alphafold to find out what environment modules are available for this application.


Additional Information

Note that AlphaFold has large memory requirements, and some of its stages use 4 or 8 CPUs in addition to a GPU. An example job script for a run with the test data included with the software is shown below.

Sample script for version 2.1.2:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --constraint=ai
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gpus=1
#SBATCH --mem=48gb
#SBATCH --time=12:00:00
date;hostname;pwd

run_alphafold.py \
    --data_dir "${HPC_ALPHAFOLD_REF}" \
    --output_dir "$(pwd)" \
    --fasta_paths query.fasta \
    --uniref90_database_path=${HPC_ALPHAFOLD_REF}/uniref90/uniref90.fasta \
    --mgnify_database_path=${HPC_ALPHAFOLD_REF}/mgnify/mgy_clusters_2018_12.fa \
    --template_mmcif_dir=${HPC_ALPHAFOLD_REF}/pdb_mmcif/mmcif_files \
    --max_template_date=2020-05-14 \
    --obsolete_pdbs_path=${HPC_ALPHAFOLD_REF}/pdb_mmcif/obsolete.dat \
    --use_gpu_relax=1 \
    --bfd_database_path=${HPC_ALPHAFOLD_REF}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniclust30_database_path=${HPC_ALPHAFOLD_REF}/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --pdb70_database_path=${HPC_ALPHAFOLD_REF}/pdb70/pdb70

date

Sample script for version 2.3.1:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --constraint=a100
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gpus=1
#SBATCH --mem=300gb
#SBATCH --time=96:00:00
date;hostname;pwd

module load alphafold

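# Use ${HOME} rather than ~: the shell does not expand ~ after "=" in an option argument.
alphafold_full_db.sh --fasta_paths="${HPC_ALPHAFOLD_REF}/test.fasta" --output_dir="${HOME}/scratch" --max_template_date=2020-05-14 --use_gpu_relax=1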
date
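
Because a job like the ones above can wait in the queue and then run for many hours, it may be worth validating the inputs before calling AlphaFold. The following is a minimal sketch only: check_alphafold_inputs is a hypothetical helper name, not part of the AlphaFold installation, and it assumes that HPC_ALPHAFOLD_REF is set once the alphafold module is loaded, as the sample scripts suggest.

```shell
# Hypothetical pre-flight helper (an illustration, not shipped with AlphaFold).
# Usage: check_alphafold_inputs FASTA_FILE OUTPUT_DIR
# Returns 0 if the inputs look usable, a non-zero code otherwise.
check_alphafold_inputs() {
    local fasta="$1"
    local outdir="$2"

    # Assumption: the alphafold module defines HPC_ALPHAFOLD_REF.
    if [ -z "${HPC_ALPHAFOLD_REF:-}" ]; then
        echo "HPC_ALPHAFOLD_REF is not set; load the alphafold module first" >&2
        return 1
    fi
    # The query file must exist and be non-empty.
    if [ ! -s "${fasta}" ]; then
        echo "FASTA file missing or empty: ${fasta}" >&2
        return 2
    fi
    # A FASTA file starts with a header line beginning with '>'.
    if ! head -n 1 "${fasta}" | grep -q '^>'; then
        echo "First line of ${fasta} is not a FASTA header" >&2
        return 3
    fi
    # The output directory must already exist and be writable.
    if [ ! -d "${outdir}" ] || [ ! -w "${outdir}" ]; then
        echo "Output directory missing or not writable: ${outdir}" >&2
        return 4
    fi
    return 0
}
```

In a job script, the function could be called right after module load alphafold, for example as check_alphafold_inputs "${HPC_ALPHAFOLD_REF}/test.fasta" "${HOME}/scratch" || exit 1, so that a bad path fails within seconds instead of partway through the run.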

Usage Example

To simplify usage, run AlphaFold through the alphafold_full_db.sh wrapper script. A simple example run:

alphafold_full_db.sh --fasta_paths="${HPC_ALPHAFOLD_REF}/test.fasta" --output_dir="${HOME}/scratch" --max_template_date=2020-05-14 --use_gpu_relax=1

Starting with version 2.3, the AlphaFold documentation recommends running AlphaFold in a Docker container. However, Docker is not compatible with the HPC environment. AlphaFold has therefore been installed as an Apptainer container, and the alphafold_full_db.sh wrapper script has been created to mimic the behavior of docker/run_docker.py as referenced in the AlphaFold documentation. alphafold_full_db.sh specifies the database location options required by AlphaFold.

To specify these options manually, use run_alphafold.sh instead.
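
To make the idea of "database location options" concrete, here is a rough sketch of how such options can be assembled from a single reference directory. It is not the contents of alphafold_full_db.sh: build_db_args is a hypothetical function name, and the paths are copied from the version 2.1.2 sample script on this page, so they may not match the databases used by other versions.

```shell
# Hypothetical illustration: print one database option per line for a
# given reference directory. Paths follow the version 2.1.2 sample script
# and are assumptions for any other version.
build_db_args() {
    local ref="$1"
    printf '%s\n' \
        "--data_dir=${ref}" \
        "--uniref90_database_path=${ref}/uniref90/uniref90.fasta" \
        "--mgnify_database_path=${ref}/mgnify/mgy_clusters_2018_12.fa" \
        "--template_mmcif_dir=${ref}/pdb_mmcif/mmcif_files" \
        "--obsolete_pdbs_path=${ref}/pdb_mmcif/obsolete.dat" \
        "--bfd_database_path=${ref}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt" \
        "--uniclust30_database_path=${ref}/uniclust30/uniclust30_2018_08/uniclust30_2018_08" \
        "--pdb70_database_path=${ref}/pdb70/pdb70"
}

# Example: show the options for the currently loaded module, if any.
if [ -n "${HPC_ALPHAFOLD_REF:-}" ]; then
    build_db_args "${HPC_ALPHAFOLD_REF}"
fi
```

Printing the options first makes it easy to confirm that each path exists before passing them to a manual run.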


Citation

If you publish research that uses AlphaFold, cite it as follows:


@Article{AlphaFold2021,
  author  = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},
  journal = {Nature},
  title   = {Highly accurate protein structure prediction with {AlphaFold}},
  year    = {2021},
  doi     = {10.1038/s41586-021-03819-2},
  note    = {(Accelerated article preview)},
  url     = {https://www.nature.com/articles/s41586-021-03819-2},
}