PGAP

Description

The NCBI Prokaryotic Genome Annotation Pipeline (PGAP) annotates bacterial and archaeal genomes (chromosomes and plasmids). Genome annotation is a multi-level process that includes prediction of protein-coding genes as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, and pseudogenes. The pipeline is automatic and combines ab initio gene prediction algorithms with homology-based methods. The first version of the NCBI Prokaryotic Genome Annotation Pipeline was developed in 2001, and it has been upgraded regularly since then to improve structural and functional annotation quality (Li W, O'Neill KR, et al. 2021). Recent improvements include the use of curated protein profile hidden Markov models (HMMs) and curated complex domain architectures for functional annotation of proteins, as well as annotation of Enzyme Commission numbers and Gene Ontology terms.

Environment Modules

Run "module spider pgap" to find out which environment modules are available for this application.

System Variables

HPC_PGAP_DIR - installation directory
HPC_PGAP_BIN - executable directory
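
A minimal sketch of checking these variables in a session. It assumes an Lmod-style "module" command and that the bare name "pgap" is loadable (the "module spider pgap" output lists the exact names and versions). The helper show_pgap_env is hypothetical and only illustrates the idea; it is not part of the module.

```shell
#!/bin/bash
# Sketch only: load the module when a `module` command exists in this shell.
# The bare module name "pgap" is an assumption; confirm it with `module spider pgap`.
if command -v module >/dev/null 2>&1; then
    module load pgap || echo "could not load the pgap module" >&2
fi

# show_pgap_env is a hypothetical helper, not something the module provides.
# It prints each documented variable, or "<unset>" when the variable is empty or missing.
show_pgap_env() {
    echo "HPC_PGAP_DIR=${HPC_PGAP_DIR:-<unset>}"
    echo "HPC_PGAP_BIN=${HPC_PGAP_BIN:-<unset>}"
}

show_pgap_env
```

If both lines show "<unset>", the module has most likely not been loaded in the current shell.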

Additional Information

The PGAP module provides a wrapper function ("pgap.py") that fine-tunes usage on HiPerGator (HPG). Invoke it as "pgap.py ..." without a path when running the command. See the Job Script Examples section below for an example.
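
As a hedged illustration of calling the wrapper by name only, the sketch below first checks that the shell can resolve "pgap.py" and then asks it for its help text. The helper find_pgap is hypothetical, and the "--help" flag is assumed to be passed through to PGAP's own argument parser; neither is confirmed by this page.

```shell
#!/bin/bash
# find_pgap is a hypothetical helper for illustration only.
# It prints how the shell resolves the bare name "pgap.py", or fails with a hint.
find_pgap() {
    if command -v pgap.py >/dev/null 2>&1; then
        command -v pgap.py
    else
        echo "pgap.py not found; load the pgap module first" >&2
        return 1
    fi
}

# Call the wrapper by its bare name, never by an absolute path.
# Assumption: the wrapper forwards --help to PGAP unchanged.
if find_pgap; then
    pgap.py --help
fi
```

Using the bare name lets the module decide what actually runs, so the same command keeps working if the installation path changes between versions.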

Job Script Examples

See the PGAP_Job_Scripts page for PGAP job script examples.

Citation

Please cite NCBI in any work or product based on this material.