Virclust

From UFRC

Description

virclust website  

VirClust is a bioinformatics tool which can be used for:

  • virus clustering
  • protein annotation
  • core protein calculation


At its core is the grouping of viral proteins into clusters at three levels:

  • at the first level, proteins are grouped based on their reciprocal BLASTP similarities into protein clusters, or PCs.
  • at the second level, PCs are grouped based on their Hidden Markov Model (HMM) similarities into protein superclusters, or PSCs.
  • at the third level, PSCs are grouped based on their HMM similarities into protein super-superclusters, or PSSCs.


Environment Modules

Run module spider virclust to find out what environment modules are available for this application.
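
A typical session might look like the sketch below. It assumes the module command is only defined on the cluster, so the guard prints a note when it is run elsewhere. Loading the module without a version number is an assumption that a default version is provided.

```shell
# Guarded so the snippet also runs on machines without environment modules.
if command -v module >/dev/null 2>&1; then
    module spider virclust   # list the available virclust modules
    module load virclust     # assumption: a default version is provided
    status="module command found"
else
    status="module command not found"
    echo "The 'module' command is not available here; run this on the cluster."
fi
```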

System Variables

  • HPC_VIRCLUST_DIR - installation directory
  • HPC_VIRCLUST_BIN - executable directory
  • HPC_VIRCLUST_BLASTDB - BLAST database directory
  • HPC_VIRCLUST_IPRSCANDB - InterProScan directory and database
  • HPC_VIRCLUST_DB - VirClust database directory for the Efam, Efam-XC, PHROGS, pVOGs, and VOGDB databases
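
After loading the module, a quick check that these variables are set can save a failed run. The loop below is a minimal sketch: the helper name check_virclust_vars is made up for illustration, and only the variable names come from the list above.

```shell
# Hypothetical helper: report each VirClust variable and count the unset ones.
check_virclust_vars() {
    missing=0
    for name in HPC_VIRCLUST_DIR HPC_VIRCLUST_BIN HPC_VIRCLUST_BLASTDB \
                HPC_VIRCLUST_IPRSCANDB HPC_VIRCLUST_DB; do
        eval "value=\${$name:-}"          # POSIX-style indirect lookup
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not set" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"                     # 0 when every variable is set
}

check_virclust_vars || echo "Load the virclust module first."
```

The function returns the number of unset variables, so it can also gate a job script before any compute time is spent.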

Additional Information

To run VirClust, use the following command:

Rscript $HPC_VIRCLUST_BIN/VirClust_MASTER.R sing=conda condaenvpath=$HPC_VIRCLUST_DIR [...options]

Add the option that matches the annotation database you need:

  • interproscan=$HPC_VIRCLUST_IPRSCANDB - when annotating against the InterProScan database
  • blastdb=$HPC_VIRCLUST_BLASTDB - when annotating against the NR BLAST database
  • databases=$HPC_VIRCLUST_DB - when annotating against the other databases (Efam, Efam-XC, PHROGS, pVOGs, VOGDB)
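
To see how these pieces fit together, the sketch below assembles the command and prints it instead of running it. The fallback paths are placeholders for machines where the module is not loaded, and no analysis options are included: the options for your own run still need to be added where the command above shows [...options].

```shell
# Use the module's variables when present, otherwise placeholder paths (assumption).
bin="${HPC_VIRCLUST_BIN:-/path/to/virclust/bin}"
dir="${HPC_VIRCLUST_DIR:-/path/to/virclust}"
db="${HPC_VIRCLUST_DB:-/path/to/virclust/db}"

# Build the argument list; databases= is only needed when annotating against
# the Efam, Efam-XC, PHROGS, pVOGs, or VOGDB databases.
set -- Rscript "$bin/VirClust_MASTER.R" sing=conda "condaenvpath=$dir" "databases=$db"

# Dry run: print the assembled command rather than executing it.
echo "$@"
```

Printing the command first makes it easy to confirm that the paths resolve as expected before submitting a job.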



Citation

If you publish research that uses VirClust, cite it as follows:

Moraru, C. (2023) VirClust - A Tool for Hierarchical Clustering, Core Protein Detection and Annotation of (Prokaryotic) Viruses. Viruses 15(4), 1007. doi: https://doi.org/10.3390/v15041007