BUSCO
Latest revision as of 19:38, 27 March 2023
Description
BUSCO (Benchmarking Universal Single-Copy Orthologs) assesses genome assembly and annotation completeness.
Environment Modules
Run module spider busco to find out what environment modules are available for this application.
System Variables
- HPC_BUSCO_DIR - installation directory
Additional Information
BUSCO uses a config file. To customize it, copy it to your working directory and point BUSCO at the copy:
$ cp $HPC_BUSCO_CONF/config.ini .
$ export BUSCO_CONFIG_FILE=$(pwd)/config.ini
$ busco -f -i ... <other arguments>
If you don't need to modify the config file you can use the installed copy:
$ busco -f --config ${HPC_BUSCO_CONF}/config.ini -i ... <other arguments>
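The copy-and-export steps above can be wrapped in a small helper for a job script. This is only a sketch under stated assumptions: the function name prepare_busco_config is hypothetical (the busco module does not provide it), and the directory passed as the first argument (for example $HPC_BUSCO_CONF) is assumed to contain config.ini.

```shell
#!/bin/bash
# Hypothetical helper, not part of the busco module: copy config.ini into a
# working directory unless a copy is already there (so local edits are kept),
# then print the absolute path of that copy.
prepare_busco_config() {
    local conf_dir="$1"        # directory holding the installed config.ini
    local work_dir="${2:-.}"   # where the editable copy should live
    if [ ! -f "$conf_dir/config.ini" ]; then
        echo "config.ini not found in $conf_dir" >&2
        return 1
    fi
    mkdir -p "$work_dir" || return 1
    if [ ! -f "$work_dir/config.ini" ]; then
        cp "$conf_dir/config.ini" "$work_dir/config.ini" || return 1
    fi
    # Print an absolute path so BUSCO_CONFIG_FILE stays valid after a cd
    (cd "$work_dir" && printf '%s/config.ini\n' "$(pwd)")
}
```

In a job script, after loading the module, it could be used as: export BUSCO_CONFIG_FILE=$(prepare_busco_config "$HPC_BUSCO_CONF" .)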
Mandatory arguments (see https://busco.ezlab.org/busco_userguide.html#mandatory-arguments):
- -i or --in defines the input file to analyse, which is either a nucleotide FASTA file or a protein FASTA file, depending on the BUSCO mode. As of v5.1.0 the input argument can also be a directory containing FASTA files to run in batch mode.
- -o or --out defines the folder that will contain all results, logs, and intermediate data
- -m or --mode sets the assessment MODE: genome, proteins, transcriptome
- -l or --lineage_dataset specifies the BUSCO lineage dataset to use
Datasets are located in /data/reference/busco/VERSION. The config.ini file is already configured to use the correct path.
Available datasets:
- arthropoda
- bacteria
- eukaryota
- fungi
- metazoa
- vertebrata
and many more. Run the following command to see all available datasets:
$ ls /data/reference/busco/VERSION
#for example:
$ ls /data/reference/busco/v5
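Before submitting a long job it can help to confirm that the chosen lineage directory actually exists. The helpers below are a minimal sketch: the names list_lineages and lineage_available are hypothetical, and the dataset root (for example /data/reference/busco/v5) is passed in as an argument rather than assumed.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the busco module.

# Print the name of every dataset directory directly under the given root.
list_lineages() {
    local root="$1" d
    for d in "$root"/*/; do
        [ -d "$d" ] || continue   # skips the unexpanded pattern when nothing matches
        basename "$d"
    done
}

# Succeed only if the named lineage exists as a directory under the root.
lineage_available() {
    local root="$1" lineage="$2"
    [ -n "$lineage" ] && [ -d "$root/$lineage" ]
}
```

For example, lineage_available /data/reference/busco/v5 metazoa || echo "dataset not found" would warn early instead of letting the job fail later.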
Example of busco run with metazoa dataset:
busco -f -i target.fa -o SAMPLE -l ${HPC_BUSCO_DAT}/metazoa -m genome
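A typo in the mode or a missing argument is easy to catch before the job starts. The wrapper below is a sketch, not an official interface: busco_command is a hypothetical name, it only uses the flags shown on this page (-f, -i, -o, -l, -m), and it prints the command instead of running it so the line can be reviewed first.

```shell
#!/bin/bash
# Hypothetical wrapper, not part of the busco module: check that the mode is
# one of the three documented values, then print the busco command line.
# Arguments are printed unquoted, so paths containing spaces are not supported.
busco_command() {
    local input="$1" out="$2" lineage="$3" mode="$4"
    case "$mode" in
        genome|proteins|transcriptome) ;;
        *)
            echo "unknown mode: $mode (expected genome, proteins or transcriptome)" >&2
            return 1
            ;;
    esac
    printf 'busco -f -i %s -o %s -l %s -m %s\n' "$input" "$out" "$lineage" "$mode"
}
```

Calling busco_command target.fa SAMPLE "${HPC_BUSCO_DAT}/metazoa" genome prints a line equivalent to the example above, which can then be pasted into the job script.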
To allow busco to retrain an existing Augustus dataset, create a local copy of the data and set the $AUGUSTUS_CONFIG_PATH variable to that path, as explained on the Augustus page.
Let's copy aspergillus_nidulans
mkdir -p augustus/species
- Load the busco module and copy the Augustus species data (the species entry is a directory, so cp needs -r):
cp -r $AUGUSTUS_CONFIG_PATH/species/aspergillus_nidulans augustus/species/
- Copy the models:
cp -r $AUGUSTUS_CONFIG_PATH/model augustus/
- Add this to the busco job script and submit it:
export AUGUSTUS_CONFIG_PATH=$(pwd)/augustus
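The copy steps above can be collected into one function so that a missing species is reported before anything is copied. This is a sketch only: copy_augustus_species is a hypothetical name, and it assumes, as the steps above do, that the source directory contains species/ and model/ subdirectories.

```shell
#!/bin/bash
# Hypothetical helper, not part of the busco or augustus modules: make a
# writable local copy of one Augustus species plus the model directory and
# print the absolute path to use as AUGUSTUS_CONFIG_PATH.
copy_augustus_species() {
    local src="$1"                # original $AUGUSTUS_CONFIG_PATH
    local species="$2"            # e.g. aspergillus_nidulans
    local dest="${3:-augustus}"   # local copy, created if needed
    if [ ! -d "$src/species/$species" ]; then
        echo "species '$species' not found under $src/species" >&2
        return 1
    fi
    mkdir -p "$dest/species" || return 1
    cp -r "$src/species/$species" "$dest/species/" || return 1
    cp -r "$src/model" "$dest/" || return 1
    (cd "$dest" && pwd)
}
```

In a job script this could be used as: export AUGUSTUS_CONFIG_PATH=$(copy_augustus_species "$AUGUSTUS_CONFIG_PATH" aspergillus_nidulans)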
Citation
If you publish research that uses busco you have to cite it as follows:
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, and Evgeny M. Zdobnov Bioinformatics, published online June 9, 2015