BUSCO: Difference between revisions

From UFRC
Maxprok (talk | contribs)
No edit summary
No edit summary
 
(14 intermediate revisions by 3 users not shown)
Line 18: Line 18:
{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}
{{App_Description|app={{#var:app}}|url={{#var:url}}|name={{#var:app}}}}|}}


Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs
BUSCO stands for Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs


<!--Modules-->
<!--Modules-->
==Required Modules==
==Environment Modules==
Run <code>module spider {{#var:app}}</code> to find out what environment modules are available for this application.


===Serial===
* {{#var:app}}
<!--
===Parallel (OpenMP)===
* intel
* {{#var:app}}
===Parallel (MPI)===
* intel
* openmpi
* {{#var:app}}
-->
==System Variables==
==System Variables==
* HPC_{{#uppercase:{{#var:app}}}}_DIR - installation directory
* HPC_{{uc:{{#var:app}}}}_DIR - installation directory
<!--Configuration-->
<!--Configuration-->
{{#if: {{#var: conf}}|==Configuration==
{{#if: {{#var: conf}}|==Configuration==
Line 45: Line 35:
Busco uses a config file which needs to be copied and modified to your needs.  
Busco uses a config file which needs to be copied and modified to your needs.  


  $ cp $HPC_BUSCO_CONF/config.ini /home/username/busco
  $ cp $HPC_BUSCO_CONF/config.ini .
  $ export BUSCO_CONFIG_FILE=/home/username/busco/config.ini
  $ export BUSCO_CONFIG_FILE=$(pwd)/config.ini
  $ run_BUSCO.py
  $ busco -f -i ... <other arguments>


If you don't need to modify the config file you can use the installed copy:
$ busco -f --config ${HPC_BUSCO_CONF}/config.ini -i ... <other arguments>


Datasets are located in /ufrc/data/reference/busco/
[https://busco.ezlab.org/busco_userguide.html#mandatory-arguments Mandatory arguments]


*-i or --in defines the input file to analyse which is either a nucleotide fasta file or a protein fasta file, depending on the BUSCO mode. As of v5.1.0 the input argument can now also be a directory containing fasta files to run in batch mode.
*-o or --out defines the folder that will contain all results, logs, and intermediate data
*-m or --mode sets the assessment MODE: genome, proteins, transcriptome
*-l or --lineage_dataset


Available datasets:
Datasets are located in /data/reference/busco/VERSION. The config.ini file is already configured to use the correct path.
 
'''Available datasets:'''
<div style="column-count:3">
*arthropoda
*arthropoda
*bacteria
*bacteria
Line 60: Line 59:
*metazoa
*metazoa
*vertebrata  
*vertebrata  
</div>
and many more. Run the following command to see all available species:
and many more. Run the following command to see all available species:
  $ ls /ufrc/data/reference/busco/
  $ ls /data/reference/busco/VERSION
#for example: $ ls /data/reference/busco/v5


Example of busco run with metazoa dataset:
Example of busco run with metazoa dataset:
Line 68: Line 69:
To allow busco to retrain an existing Augustus dataset create a local copy of the data and set $AUGUSTUS_CONFIG_PATH variable to that path as explained on the [[Augustus]] page.
To allow busco to retrain an existing Augustus dataset create a local copy of the data and set $AUGUSTUS_CONFIG_PATH variable to that path as explained on the [[Augustus]] page.


;Example:
<div class="mw-collapsible mw-collapsed" style="width:70%; padding: 5px; border: 1px solid gray;">
 
''Expand this section to view an example, copying aspergillus nidulans.''
<div class="mw-collapsible-content" style="padding: 5px;">
Let's copy aspergillus_nidulans
Let's copy aspergillus_nidulans


mkdir -p augustus/species
mkdir -p augustus/species


* Load the busco module and copy augustus data
# Load the busco module and copy augustus data
cp $AUGUSTUS_CONFIG_PATH/species/aspergillus_nidulans/ augustus/species/
#*<pre>cp $AUGUSTUS_CONFIG_PATH/species/aspergillus_nidulans/ augustus/species/</pre>
* Copy the models
# Copy the models
cp $AUGUSTUS_CONFIG_PATH/model augustus/
#*<pre> cp $AUGUSTUS_CONFIG_PATH/model augustus/</pre>
 
#Add this to the busco job script and submit it.
Add
#*<pre>export AUGUSTUS_CONFIG_PATH=$(pwd)/augustus</pre></div></div>
export AUGUSTUS_CONFIG_PATH=$(pwd)/augustus
to the busco job script and submit it.
|}}
|}}
<!--PBS scripts-->
<!--PBS scripts-->

Latest revision as of 19:38, 27 March 2023

Description

busco website  

BUSCO (Benchmarking Universal Single-Copy Orthologs) assesses the completeness of genome assemblies and annotations.

Environment Modules

Run module spider busco to find out what environment modules are available for this application.

System Variables

  • HPC_BUSCO_DIR - installation directory

Additional Information

BUSCO uses a config file. To customize it, copy the installed file into your working directory, edit the copy, and point BUSCO_CONFIG_FILE at it:

$ cp $HPC_BUSCO_CONF/config.ini .
$ export BUSCO_CONFIG_FILE=$(pwd)/config.ini
$ busco -f -i ... <other arguments>

If you don't need to modify the config file, you can use the installed copy directly:

$ busco -f --config ${HPC_BUSCO_CONF}/config.ini -i ... <other arguments>
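
The choice between a locally edited config and the installed copy can be scripted. The following is a minimal sketch, not part of BUSCO or the busco module: pick_busco_config is a hypothetical helper name, and the sketch assumes HPC_BUSCO_CONF has already been set by loading the busco module.

```shell
#!/usr/bin/env bash
# pick_busco_config is a hypothetical helper, not shipped with BUSCO.
# It prints the path of config.ini in the given directory (default: the
# current directory) if one exists, otherwise the installed copy under
# $HPC_BUSCO_CONF (assumed to be set by the busco module).
pick_busco_config() {
    local workdir="${1:-$PWD}"
    if [ -f "$workdir/config.ini" ]; then
        printf '%s\n' "$workdir/config.ini"
    else
        printf '%s\n' "${HPC_BUSCO_CONF}/config.ini"
    fi
}

# Export the chosen file so later busco calls in the same script pick it up.
BUSCO_CONFIG_FILE="$(pick_busco_config)"
export BUSCO_CONFIG_FILE
echo "Using config: $BUSCO_CONFIG_FILE"
```

With this in a job script, dropping an edited config.ini into the working directory is enough to override the installed defaults.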

Mandatory arguments

  • -i or --in defines the input file to analyse which is either a nucleotide fasta file or a protein fasta file, depending on the BUSCO mode. As of v5.1.0 the input argument can now also be a directory containing fasta files to run in batch mode.
  • -o or --out defines the folder that will contain all results, logs, and intermediate data.
  • -m or --mode sets the assessment mode: genome, proteins, or transcriptome.
  • -l or --lineage_dataset specifies the BUSCO lineage dataset to use.
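
To show how the four arguments fit together, here is a minimal sketch that assembles a command line and rejects an unknown mode before anything is submitted. busco_cmd is a hypothetical helper name, not part of BUSCO; the sketch only prints the command and assumes the paths contain no spaces.

```shell
#!/usr/bin/env bash
# busco_cmd is a hypothetical helper, not part of BUSCO.
# Arguments: input file, output name, mode, lineage dataset.
# It validates the mode and prints the resulting command; it does not run busco.
busco_cmd() {
    local input="$1" out="$2" mode="$3" lineage="$4"
    case "$mode" in
        genome|proteins|transcriptome) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # Paths are assumed to contain no spaces (no quoting is added).
    printf 'busco -f -i %s -o %s -m %s -l %s\n' "$input" "$out" "$mode" "$lineage"
}

# Print the command for review before adding it to a job script.
# HPC_BUSCO_DAT is assumed to be set by the busco module.
busco_cmd target.fa SAMPLE genome "${HPC_BUSCO_DAT}/metazoa"
```

Printing rather than executing keeps the sketch safe to try on a login node; paste the printed line into the job script once it looks right.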

Datasets are located in /data/reference/busco/VERSION. The config.ini file is already configured to use the correct path.

Available datasets:

  • arthropoda
  • bacteria
  • eukaryota
  • fungi
  • metazoa
  • vertebrata

Many more are available. Run the following command to list all available datasets:

$ ls /data/reference/busco/VERSION
# For example:
$ ls /data/reference/busco/v5
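
When the dataset directory is large, filtering the listing can help. The following is a minimal sketch under stated assumptions: list_busco_datasets is a hypothetical helper name, and the v5 path is taken from the example above and may differ for other versions.

```shell
#!/usr/bin/env bash
# list_busco_datasets is a hypothetical helper, not part of BUSCO.
# Arguments: dataset directory, optional case-insensitive name pattern.
# Prints matching entries one per line; fails if the directory is missing.
list_busco_datasets() {
    local root="$1" pattern="${2:-}"
    if [ ! -d "$root" ]; then
        echo "dataset directory not found: $root" >&2
        return 1
    fi
    # An empty pattern matches every entry.
    ls -1 "$root" | grep -i -- "$pattern"
}

# Example: show datasets whose name contains "meta".
# "|| true" keeps a job script going if the path or pattern finds nothing.
list_busco_datasets /data/reference/busco/v5 meta || true
```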

Example of a BUSCO run with the metazoa dataset:

$ busco -f -i target.fa -o SAMPLE -l ${HPC_BUSCO_DAT}/metazoa -m genome

To allow BUSCO to retrain an existing Augustus dataset, create a local copy of the data and set the $AUGUSTUS_CONFIG_PATH variable to that path, as explained on the Augustus page.

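Example: copying aspergillus_nidulans.

$ mkdir -p augustus/species

  1. Load the busco module and copy the Augustus species data:
     $ cp -r $AUGUSTUS_CONFIG_PATH/species/aspergillus_nidulans/ augustus/species/
  2. Copy the models:
     $ cp -r $AUGUSTUS_CONFIG_PATH/model augustus/
  3. Add this line to the busco job script and submit it:
     export AUGUSTUS_CONFIG_PATH=$(pwd)/augustus
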
Citation

If you publish research that uses BUSCO, cite it as follows:

Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, and Evgeny M. Zdobnov. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, published online June 9, 2015.