Clean
Description
CLEAN (Contrastive Learning enabled Enzyme ANnotation) is a machine learning algorithm that assigns Enzyme Commission (EC) numbers to enzymes. Its authors report better accuracy, reliability, and sensitivity than existing computational tools.
Environment Modules
Run module spider clean
to find out what environment modules are available for this application.
System Variables
- HPC_CLEAN_DIR - installation directory
- HPC_CLEAN_BIN - executable directory
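A quick way to confirm that the module is loaded in the current shell is to check that both variables are set. The following is a minimal sketch; the function name show_clean_paths is a hypothetical helper and not part of CLEAN or the module system.

```shell
# Hypothetical helper (not part of CLEAN): print the paths exported by the
# clean module, or report which variables are missing. A missing variable
# usually means the module has not been loaded in this shell.
show_clean_paths() {
    local rc=0 name value
    for name in HPC_CLEAN_DIR HPC_CLEAN_BIN; do
        # printenv exits non-zero when the variable is not in the environment
        if value="$(printenv "$name")"; then
            echo "$name=$value"
        else
            echo "$name is not set" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

After loading the module, calling show_clean_paths should print both directories; a non-zero return status indicates that at least one variable is missing.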
Additional Information
Loading the clean module sets up the environment with the dependencies needed to run CLEAN_infer_fasta.py. Users should still clone the CLEAN repo from https://github.com/tttianhao/CLEAN
to their work directory. Once it is cloned, run git clone https://github.com/facebookresearch/esm.git; mkdir data/esm_data
inside the CLEAN repo directory before running python CLEAN_infer_fasta.py --fasta_data price
for the first time.
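The one-time setup described above can be sketched as a small shell function. This is an illustration under stated assumptions, not an official install script: the run and setup_clean helpers and the DRY_RUN variable are hypothetical names introduced here, the work directory is a placeholder, and the commands are assumed to run from the top of the cloned repo as the text states.

```shell
# Hypothetical helper (not part of CLEAN): print the command when DRY_RUN=1,
# otherwise execute it in the current shell.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One-time setup following the steps in the text. The body runs in a
# subshell, so the caller's working directory is left unchanged and
# `set -e` stops at the first failing command.
setup_clean() (
    set -e
    workdir="$1"                      # placeholder: your own work directory
    run cd "$workdir"
    run git clone https://github.com/tttianhao/CLEAN
    run cd CLEAN
    run git clone https://github.com/facebookresearch/esm.git
    # -p is an addition in this sketch so an existing directory is not an error
    run mkdir -p data/esm_data
)

# Usage, after loading the clean module (the path is a placeholder):
#   DRY_RUN=1 setup_clean "$HOME/work"    # preview the commands only
#   setup_clean "$HOME/work"              # run them
#   cd "$HOME/work/CLEAN" && python CLEAN_infer_fasta.py --fasta_data price
```

Previewing with DRY_RUN=1 first makes it easy to check the target directory before anything is cloned. The function is not meant to be rerun on an existing checkout, since git clone fails when the destination already exists.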
To work with FASTA files, download the files provided by the CLEAN repo and move them to data/pretrained inside the CLEAN repo directory.
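Moving the downloaded files into place might look like the sketch below. The function name install_pretrained and both directory arguments are hypothetical; the sketch assumes the downloaded files sit together in one directory and makes no assumption about their names or extensions.

```shell
# Hypothetical helper (not part of CLEAN): move every regular file from a
# download directory into data/pretrained inside a CLEAN checkout.
install_pretrained() {
    local src="$1" repo="$2" f
    mkdir -p "$repo/data/pretrained" || return 1
    for f in "$src"/*; do
        # skip subdirectories and the unmatched-glob case
        [ -f "$f" ] || continue
        mv "$f" "$repo/data/pretrained/" || return 1
    done
}

# Usage (both paths are placeholders):
#   install_pretrained "$HOME/Downloads/clean_files" "$HOME/work/CLEAN"
```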
Citation
If you publish research that uses CLEAN, cite it as follows:
Tianhao Yu, Haiyang Cui, Jianan Canal Li, Yunan Luo, Guangde Jiang, and Huimin Zhao. Enzyme function prediction using contrastive learning. Science 379(6639):1358-1363, 2023. doi:10.1126/science.adf2465. https://www.science.org/doi/abs/10.1126/science.adf2465