Dbcanlight
Description
dbcanlight is a lightweight rewrite of run_dbcan designed for better multithreading performance. With the original run_dbcan, an input containing millions of sequences has to be split up before processing; dbcanlight can process large inputs in a batch, eliminating the need for splitting.
The main program, dbcanlight, comprises three modules: build, search, and conclude. The build module helps download the required databases from the dbcan website; the search module searches against the protein HMM, substrate HMM, or diamond databases and reports the hits separately; and the conclude module gathers the results produced by each search and provides a brief overview.
In addition to the main program, two helper programs are included to parse hmmsearch outputs for users who have run their own searches with the command-line HMMER3 suite. dbcanlight-hmmparse is a rewrite of hmmscan_parser.py from run_dbcan; it filters overlapping hits and converts a domtblout-format file produced by the HMMER3 suite into the dbcan 10-column format. dbcanlight-subparser takes the dbcan-formatted substrate output and maps it against the substrate conversion table.
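The overlap filtering described above can be pictured with a small Python sketch. This is not the code used by dbcanlight-hmmparse: the Hit record, the filter_overlaps function, the greedy best-first strategy, and the 0.5 overlap threshold are all assumptions chosen for illustration, and the profile names in the usage note are made up.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical, simplified hit record (not the real 10-column layout)."""
    profile: str
    gene: str
    evalue: float
    gene_start: int
    gene_end: int


def filter_overlaps(hits, max_overlap=0.5):
    """Drop the weaker of two hits on the same gene when they overlap heavily.

    max_overlap is an assumed threshold: the overlap is measured as a
    fraction of the shorter of the two hits.
    """
    kept = []
    # Visit hits best-first so a stronger hit is accepted before a weaker rival.
    for hit in sorted(hits, key=lambda h: h.evalue):
        clash = False
        for other in kept:
            if other.gene != hit.gene:
                continue
            overlap = (min(hit.gene_end, other.gene_end)
                       - max(hit.gene_start, other.gene_start) + 1)
            shorter = min(hit.gene_end - hit.gene_start,
                          other.gene_end - other.gene_start) + 1
            if overlap > 0 and overlap / shorter > max_overlap:
                clash = True
                break
        if not clash:
            kept.append(hit)
    # Return survivors ordered by gene, then by position on the gene.
    return sorted(kept, key=lambda h: (h.gene, h.gene_start))
```

For example, given two hits covering nearly the same region of one gene, only the hit with the lower E-value would survive, while a third hit elsewhere on the same gene would be kept.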
The output of dbcanlight resembles that of the original run_dbcan, with slight cleanup. The original run_dbcan reports the same substrate several times for a gene that hits multiple profiles sharing that substrate; dbcanlight reports it only once.
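The deduplication described above amounts to collecting the distinct substrates for each gene. The following is a minimal Python sketch of that idea, not dbcanlight's actual implementation; the function name substrates_per_gene and the (gene, profile, substrate) tuple layout are assumptions made for illustration.

```python
def substrates_per_gene(hits):
    """Map each gene to its distinct substrates, in first-seen order.

    hits is an iterable of hypothetical (gene, profile, substrate) tuples.
    """
    result = {}
    for gene, _profile, substrate in hits:
        seen = result.setdefault(gene, [])
        # Record a substrate only the first time it appears for this gene.
        if substrate not in seen:
            seen.append(substrate)
    return result
```

A gene that hits two profiles annotated with the same substrate therefore ends up with that substrate listed once rather than twice.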
dbcanlight re-implements only the core features of run_dbcan, namely searching for CAZyme and substrate matches with hmmer, diamond, and dbcansub. Submodules such as signalP and CGCFinder are not implemented. If you intend to use these features, please use the original run_dbcan.
Environment Modules
Run module spider dbcanlight
to find out what environment modules are available for this application.
System Variables
- HPC_DBCANLIGHT_DIR - installation directory
- HPC_DBCANLIGHT_BIN - executable directory
- DBCANLIGHT_DB - reference database directory
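Once the environment module is loaded, the variables listed above should be visible to any script started from that shell. As a minimal sketch, a Python script could check for them before launching an analysis; the helper names dbcanlight_paths and missing_variables are hypothetical and not part of dbcanlight.

```python
import os

# Variable names taken from the list above.
VARIABLES = ("HPC_DBCANLIGHT_DIR", "HPC_DBCANLIGHT_BIN", "DBCANLIGHT_DB")


def dbcanlight_paths(env=None):
    """Return a dict of variable name -> value (None if the variable is unset)."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in VARIABLES}


def missing_variables(env=None):
    """List the variables that are unset or empty, e.g. when no module is loaded."""
    return [name for name, value in dbcanlight_paths(env).items() if not value]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Not set (is the dbcanlight module loaded?):", ", ".join(missing))
    else:
        for name, value in dbcanlight_paths().items():
            print(f"{name}={value}")
```

Passing a plain dictionary as env makes the helpers easy to exercise without changing the real environment.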