Difference between revisions of "AlphaFold"
Moskalenko (talk | contribs) |
Revision as of 20:47, 11 February 2022
Description
This package provides an implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP14 and published in Nature. For simplicity, we refer to this model as AlphaFold throughout the rest of this document.
Environment Modules
Run 'module spider alphafold' to find out what environment modules are available for this application.
System Variables
- HPC_ALPHAFOLD_DIR - installation directory
- HPC_ALPHAFOLD_BIN - executable directory
- HPC_ALPHAFOLD_REF - pre-downloaded reference data directory
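After loading the module, it can be useful to confirm that these variables are actually set before submitting a long job. The following is a minimal sketch and not part of the AlphaFold module: the function name check_alphafold_env is hypothetical, and HPC_ALPHAFOLD_REF is included because the examples on this page rely on it.

```shell
# Hypothetical helper: print each AlphaFold variable that is set and
# report the ones that are missing. The body runs in a subshell so the
# loop variables do not leak into the calling shell.
check_alphafold_env() (
    status=0
    for name in HPC_ALPHAFOLD_DIR HPC_ALPHAFOLD_BIN HPC_ALPHAFOLD_REF; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${${name}:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf 'missing: %s\n' "$name" >&2
            status=1
        fi
    done
    exit "$status"
)
```

Run check_alphafold_env after 'module load alphafold'; a non-zero exit status means at least one variable is unset or empty.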
Additional Information
Note that AlphaFold has large memory requirements, and some of its stages use 4 or 8 CPUs in addition to a GPU. An example job script for a run with the test data included with the software is shown below.
Version 2.1.2
run_alphafold.py \
  --data_dir "${HPC_ALPHAFOLD_REF}" \
  --output_dir $(pwd) \
  --fasta_paths query.fasta \
  --uniref90_database_path=${HPC_ALPHAFOLD_REF}/uniref90/uniref90.fasta \
  --mgnify_database_path=${HPC_ALPHAFOLD_REF}/mgnify/mgy_clusters_2018_12.fa \
  --template_mmcif_dir=${HPC_ALPHAFOLD_REF}/pdb_mmcif/mmcif_files \
  --max_template_date=2020-05-14 \
  --obsolete_pdbs_path=${HPC_ALPHAFOLD_REF}/pdb_mmcif/obsolete.dat \
  --use_gpu_relax=1 \
  --bfd_database_path=${HPC_ALPHAFOLD_REF}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
  --uniclust30_database_path=${HPC_ALPHAFOLD_REF}/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
  --pdb70_database_path=${HPC_ALPHAFOLD_REF}/pdb70/pdb70
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --constraint=a100
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gpus=1
#SBATCH --mem=300gb
#SBATCH --time=96:00:00
date;hostname;pwd

module load alphafold

# Version 2.0.0
run_alphafold.sh -d $HPC_ALPHAFOLD_REF \
  -o test/ -m model_1 \
  -f /apps/alphafold/2.0.0/alphafold/example/query.fasta \
  -t 2020-05-14
date
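When many sequences need their own job, the job script can be generated instead of edited by hand. The sketch below reuses the resource requests and the version 2.0.0 run_alphafold.sh options from the example job script; the function name make_alphafold_job is hypothetical, and the resource values are copied from this page rather than tuned for any particular input.

```shell
# Hypothetical helper: print a Slurm job script for one FASTA file.
# Usage: make_alphafold_job FASTA OUTPUT_DIR > job.sh
make_alphafold_job() {
    if [ "$#" -ne 2 ]; then
        echo "usage: make_alphafold_job FASTA OUTPUT_DIR" >&2
        return 1
    fi
    # $1 and $2 are expanded now; \$HPC_ALPHAFOLD_REF stays literal so it
    # is resolved inside the job, after the module has been loaded.
    cat <<EOF
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --constraint=a100
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gpus=1
#SBATCH --mem=300gb
#SBATCH --time=96:00:00
date;hostname;pwd

module load alphafold

run_alphafold.sh -d "\$HPC_ALPHAFOLD_REF" -o "$2" -m model_1 -f "$1" -t 2020-05-14
date
EOF
}
```

For example, 'make_alphafold_job query.fasta test/ > job.sh' writes a script that can then be submitted with 'sbatch job.sh'.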
Usage Example
To simplify usage, use the 'run_alphafold.sh' script. A simple run example:
run_alphafold.sh -o test/ -m model_1 -f query.fasta -t 2020-05-14
By default, run_alphafold.sh uses the 2.2 TB of pre-downloaded reference data in $HPC_ALPHAFOLD_REF.
To access all options, use the run_alphafold.py script.
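To predict several sequences within one job, the simple run example can be wrapped in a loop. This is a sketch under stated assumptions: the function name run_all_fasta and the ALPHAFOLD_RUNNER variable are hypothetical, inputs are assumed to be files ending in .fasta in a single directory, and the model and template date are the ones used in the example on this page.

```shell
# Hypothetical helper: run run_alphafold.sh once per .fasta file.
# Usage: run_all_fasta INPUT_DIR OUTPUT_ROOT
# ALPHAFOLD_RUNNER is an assumption added for dry runs; when unset, the
# real run_alphafold.sh is called.
run_all_fasta() (
    input_dir=$1
    output_root=$2
    runner=${ALPHAFOLD_RUNNER:-run_alphafold.sh}
    status=0
    for fasta in "$input_dir"/*.fasta; do
        # Skip the unexpanded glob when the directory has no .fasta files.
        [ -e "$fasta" ] || continue
        name=$(basename "$fasta" .fasta)
        # Each query gets its own output directory named after the file.
        "$runner" -o "$output_root/$name" -m model_1 -f "$fasta" -t 2020-05-14 || status=1
    done
    exit "$status"
)
```

Setting ALPHAFOLD_RUNNER=echo before calling the function prints the commands without running them, which is a cheap way to check the paths before requesting GPU time.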
Citation
If you publish research that uses AlphaFold, cite it as follows:
@Article{AlphaFold2021,
  author  = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},
  journal = {Nature},
  title   = {Highly accurate protein structure prediction with {AlphaFold}},
  year    = {2021},
  doi     = {10.1038/s41586-021-03819-2},
  note    = {(Accelerated article preview)}
}
https://www.nature.com/articles/s41586-021-03819-2