VirClust
Description
VirClust is a bioinformatics tool which can be used for:
- virus clustering
- protein annotation
- core protein calculation
At its core is the grouping of viral proteins into clusters at three different levels:
- at the first level, proteins are grouped based on their reciprocal BLASTP similarities into protein clusters, or PCs.
- at the second level, PCs are grouped based on their Hidden Markov Model (HMM) similarities into protein superclusters, or PSCs.
- at the third level, PSCs are grouped based on their HMM similarities into protein super-superclusters, or PSSCs.
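The exact similarity measures and thresholds VirClust applies at each level are not described here, so the sketch below only illustrates the three-level idea under a simplifying assumption: at every level, units are merged by single linkage (connected components) whenever a hypothetical similarity score reaches a threshold. The scores, the 0.5 threshold, and the function names are invented for illustration; this is not VirClust's algorithm or data.

```python
from typing import Dict, Iterable, List, Tuple

# (item_a, item_b, similarity) -- hypothetical pairwise scores
Pair = Tuple[str, str, float]


def connected_groups(items: List[str], pairs: Iterable[Pair],
                     threshold: float) -> List[List[str]]:
    """Single-linkage grouping: items joined by any pair scoring >= threshold
    end up in the same group (connected components via union-find)."""
    parent: Dict[str, str] = {x: x for x in items}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in pairs:
        if score >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    # Deterministic order: members sorted, groups sorted by first member.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def name_groups(prefix: str, groups: List[List[str]]) -> Dict[str, List[str]]:
    """Assign labels such as PC1, PC2, ... to each group."""
    return {f"{prefix}{i}": g for i, g in enumerate(groups, start=1)}


# Level 1: proteins -> PCs, from hypothetical reciprocal BLASTP scores.
proteins = ["p1", "p2", "p3", "p4", "p5"]
blastp_scores = [("p1", "p2", 0.9), ("p3", "p4", 0.8), ("p4", "p5", 0.1)]
pcs = name_groups("PC", connected_groups(proteins, blastp_scores, 0.5))

# Level 2: PCs -> PSCs, from hypothetical PC-vs-PC HMM scores.
pc_hmm_scores = [("PC2", "PC3", 0.7)]
pscs = name_groups("PSC", connected_groups(list(pcs), pc_hmm_scores, 0.5))

# Level 3: PSCs -> PSSCs, from hypothetical PSC-vs-PSC HMM scores.
psc_hmm_scores = [("PSC1", "PSC2", 0.6)]
psscs = name_groups("PSSC", connected_groups(list(pscs), psc_hmm_scores, 0.5))

print(pcs)    # {'PC1': ['p1', 'p2'], 'PC2': ['p3', 'p4'], 'PC3': ['p5']}
print(pscs)   # {'PSC1': ['PC1'], 'PSC2': ['PC2', 'PC3']}
print(psscs)  # {'PSSC1': ['PSC1', 'PSC2']}
```

Each level takes the groups produced by the level below as its input units, which is what makes the result hierarchical: protein p5 stays alone as PC3 at the first level but ends up in the same PSSC as p1 through p4 at the third.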
Environment Modules
Run module spider virclust to find out which environment modules are available for this application.
System Variables
- HPC_VIRCLUST_DIR - installation directory
- HPC_VIRCLUST_BIN - executable directory
- HPC_VIRCLUST_BLASTDB - BLAST database directory
- HPC_VIRCLUST_IPRSCANDB - InterProScan directory and database
- HPC_VIRCLUST_DB - VirClust database directory for the Efam, Efam-XC, PHROGs, pVOGs, and VOGDB databases
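After loading the module, it can help to confirm that these variables are actually set before starting a long job. The sketch below is a hypothetical helper (check_virclust_env is not part of VirClust or the module); it only checks that each variable listed above is defined and points to an existing path, not that the databases inside are complete.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names as listed above; assumed to be exported by the module.
VIRCLUST_VARS = (
    "HPC_VIRCLUST_DIR",
    "HPC_VIRCLUST_BIN",
    "HPC_VIRCLUST_BLASTDB",
    "HPC_VIRCLUST_IPRSCANDB",
    "HPC_VIRCLUST_DB",
)


def check_virclust_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a status per variable: 'ok', 'unset', or 'missing path'."""
    env = os.environ if env is None else env
    status = {}
    for name in VIRCLUST_VARS:
        path = env.get(name)
        if not path:
            status[name] = "unset"
        elif not os.path.exists(path):
            status[name] = "missing path"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    # Check the current shell environment, e.g. after loading the module.
    for name, state in check_virclust_env().items():
        print(f"{name}: {state}")
```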
Additional Information
To run VirClust, use the following command:
Rscript $HPC_VIRCLUST_BIN/VirClust_MASTER.R sing=conda condaenvpath=$HPC_VIRCLUST_DIR [...options]
Depending on the databases you annotate against, add one or more of the following options:
- interproscan=$HPC_VIRCLUST_IPRSCANDB - when annotating against the InterProScan database
- blastdb=$HPC_VIRCLUST_BLASTDB - when annotating against the NR BLAST database
- databases=$HPC_VIRCLUST_DB - when annotating against the other databases (Efam, Efam-XC, PHROGs, pVOGs, VOGDB)
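When VirClust is launched from a workflow script rather than typed at the prompt, the same command can be assembled programmatically. The sketch below is a hypothetical Python helper (build_virclust_command is not part of VirClust); it reproduces only the arguments shown above, any other VirClust options must be supplied through extra_options, and the demo paths are made up.

```python
import os
import shlex
from typing import List, Mapping, Optional


def build_virclust_command(extra_options: Optional[List[str]] = None,
                           interproscan: bool = False,
                           blastdb: bool = False,
                           databases: bool = False,
                           env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Assemble the Rscript call as an argument list (no shell needed)."""
    env = os.environ if env is None else env
    cmd = [
        "Rscript",
        os.path.join(env["HPC_VIRCLUST_BIN"], "VirClust_MASTER.R"),
        "sing=conda",
        f"condaenvpath={env['HPC_VIRCLUST_DIR']}",
    ]
    cmd += extra_options or []  # other VirClust options, passed through verbatim
    # Optional annotation databases, each read from its module variable.
    if interproscan:
        cmd.append(f"interproscan={env['HPC_VIRCLUST_IPRSCANDB']}")
    if blastdb:
        cmd.append(f"blastdb={env['HPC_VIRCLUST_BLASTDB']}")
    if databases:
        cmd.append(f"databases={env['HPC_VIRCLUST_DB']}")
    return cmd


# Hypothetical paths for illustration; on the cluster, omit env= to use os.environ.
demo_env = {
    "HPC_VIRCLUST_BIN": "/opt/virclust/bin",
    "HPC_VIRCLUST_DIR": "/opt/virclust",
    "HPC_VIRCLUST_DB": "/opt/virclust/db",
}
cmd = build_virclust_command(databases=True, env=demo_env)
print(shlex.join(cmd))
# Rscript /opt/virclust/bin/VirClust_MASTER.R sing=conda condaenvpath=/opt/virclust databases=/opt/virclust/db
# To launch it: subprocess.run(cmd, check=True)
```

Passing an argument list instead of a single string avoids shell quoting problems if any path contains spaces.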
Citation
If you publish research that uses VirClust, you must cite it as follows:
Moraru, C. (2023) VirClust - A Tool for Hierarchical Clustering, Core Protein Detection and Annotation of (Prokaryotic) Viruses. Viruses 15(4), 1007. doi: https://doi.org/10.3390/v15041007