BUSCO

From UFRC

Latest revision as of 19:38, 27 March 2023

Description

BUSCO website: http://busco.ezlab.org/

BUSCO (Benchmarking Universal Single-Copy Orthologs) assesses the completeness of genome assemblies and annotations.

Environment Modules

Run module spider busco to find out what environment modules are available for this application.

System Variables

  • HPC_BUSCO_DIR - installation directory
  • HPC_BUSCO_CONF - directory containing the installed config.ini
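Because every example on this page depends on variables that the busco module sets, a job script can check for them before doing any work. The following is a minimal sketch; `require_busco_env` is a hypothetical helper name, and it checks HPC_BUSCO_DIR plus HPC_BUSCO_CONF, the variable the config examples below rely on.

```shell
# Hypothetical helper: fail early if the busco module has not been loaded.
# Checks each required variable by name using bash indirect expansion.
require_busco_env() {
    local var
    for var in HPC_BUSCO_DIR HPC_BUSCO_CONF; do
        if [[ -z "${!var:-}" ]]; then
            echo "error: $var is not set; run 'module load busco' first" >&2
            return 1
        fi
    done
}
```

In a job script, call `require_busco_env || exit 1` right after `module load busco`.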

Additional Information

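BUSCO reads its settings from a config file. To customize it, copy the installed config.ini to your working directory, edit the copy, and point BUSCO at it with the BUSCO_CONFIG_FILE environment variable: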

$ cp $HPC_BUSCO_CONF/config.ini .
$ export BUSCO_CONFIG_FILE=$(pwd)/config.ini
$ busco -f -i ... <other arguments>
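The copy-and-export steps above can be wrapped so that re-running a job script does not overwrite a config.ini you have already edited. This is a sketch only; `setup_busco_config` is a hypothetical name, and it assumes the busco module has set HPC_BUSCO_CONF.

```shell
# Hypothetical helper: copy the site config.ini into DEST (default: the current
# directory) only if no copy exists there yet, then export BUSCO_CONFIG_FILE
# as an absolute path to that copy.
setup_busco_config() {
    local dest="${1:-$PWD}"
    if [[ -z "${HPC_BUSCO_CONF:-}" ]]; then
        echo "error: HPC_BUSCO_CONF is not set; run 'module load busco' first" >&2
        return 1
    fi
    mkdir -p "$dest"
    if [[ ! -f "$dest/config.ini" ]]; then
        cp "$HPC_BUSCO_CONF/config.ini" "$dest/config.ini"
    fi
    export BUSCO_CONFIG_FILE="$(cd "$dest" && pwd)/config.ini"
}
```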

If you don't need to modify the config file, you can pass the installed copy directly with --config:

$ busco -f --config ${HPC_BUSCO_CONF}/config.ini -i ... <other arguments>
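If some runs use an edited local copy and others the installed one, a small helper can pick the right file for --config. This is a sketch under the assumption that a local copy, when present, lives at ./config.ini; `busco_config_path` is a hypothetical name.

```shell
# Hypothetical helper: print ./config.ini if a local copy exists,
# otherwise the installed copy under $HPC_BUSCO_CONF.
busco_config_path() {
    if [[ -f "$PWD/config.ini" ]]; then
        echo "$PWD/config.ini"
    elif [[ -n "${HPC_BUSCO_CONF:-}" ]]; then
        echo "$HPC_BUSCO_CONF/config.ini"
    else
        echo "error: no ./config.ini and HPC_BUSCO_CONF is not set" >&2
        return 1
    fi
}
```

Usage: `busco -f --config "$(busco_config_path)" -i ... <other arguments>`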

Mandatory arguments (see https://busco.ezlab.org/busco_userguide.html#mandatory-arguments)

  • -i or --in: the input file to analyse, either a nucleotide FASTA file or a protein FASTA file depending on the BUSCO mode. Since v5.1.0 the input can also be a directory of FASTA files, which BUSCO processes in batch mode.
  • -o or --out: the folder that will contain all results, logs, and intermediate data.
  • -m or --mode: the assessment mode, one of genome, proteins, or transcriptome.
  • -l or --lineage_dataset: the lineage dataset to assess against, for example metazoa (see the available datasets below).
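To catch mistakes before a job waits in the queue, the four mandatory arguments can be checked in a wrapper. This is a minimal sketch; `run_busco_checked` is a hypothetical name, and it deliberately accepts only the three mode names listed above. Any extra options (such as -f) are passed through to busco unchanged.

```shell
# Hypothetical wrapper: validate the mandatory arguments, then call busco.
# Usage: run_busco_checked INPUT OUTPUT MODE LINEAGE [extra busco options...]
run_busco_checked() {
    if (( $# < 4 )); then
        echo "usage: run_busco_checked INPUT OUTPUT MODE LINEAGE [options...]" >&2
        return 2
    fi
    local input="$1" output="$2" mode="$3" lineage="$4"
    shift 4
    # The input may be a FASTA file or (v5.1.0+) a directory of FASTA files.
    if [[ ! -e "$input" ]]; then
        echo "error: input '$input' does not exist" >&2
        return 1
    fi
    case "$mode" in
        genome|proteins|transcriptome) ;;
        *)
            echo "error: mode must be genome, proteins or transcriptome" >&2
            return 1
            ;;
    esac
    busco -i "$input" -o "$output" -m "$mode" -l "$lineage" "$@"
}
```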

Datasets are stored under /data/reference/busco/VERSION, where VERSION is the BUSCO major version (for example, v5). The installed config.ini is already configured to use this path.

Available datasets:

  • arthropoda
  • bacteria
  • eukaryota
  • fungi
  • metazoa
  • vertebrata

These are only a few of the available datasets. To list all of them, run:

$ ls /data/reference/busco/VERSION
$ ls /data/reference/busco/v5    # for example, the datasets for BUSCO v5
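Before submitting a job you can check that the dataset you plan to pass to -l is installed. This is a sketch that assumes each dataset is a subdirectory of /data/reference/busco/VERSION, as the ls command above suggests; `busco_lineage_exists` is a hypothetical name, and the default version v5 is only an example.

```shell
# Hypothetical helper: succeed if ROOT/VERSION/NAME is a directory.
# Usage: busco_lineage_exists NAME [VERSION] [ROOT]
busco_lineage_exists() {
    local name="$1" version="${2:-v5}" root="${3:-/data/reference/busco}"
    [[ -d "$root/$version/$name" ]]
}
```

Usage: `busco_lineage_exists metazoa && echo "metazoa is installed"`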

Example BUSCO run on a genome assembly with the metazoa dataset:

$ busco -f -i target.fa -o SAMPLE -l ${HPC_BUSCO_DAT}/metazoa -m genome
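The page refers to a busco job script without showing one. The following is a minimal sketch, assuming the cluster uses Slurm; the job name, memory, core count, and time limit are illustrative placeholders, not site recommendations. It runs the metazoa example with BUSCO's --cpu option set from the cores Slurm allocated.

```shell
#!/bin/bash
#SBATCH --job-name=busco_SAMPLE
#SBATCH --cpus-per-task=8
#SBATCH --mem=16gb
#SBATCH --time=24:00:00
# The resource requests above are illustrative placeholders; size them for your data.
set -eo pipefail

# Number of cores Slurm allocated to this task (1 when run outside Slurm).
busco_cpus() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Run the analysis only inside a Slurm job, never directly on a login node.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    module load busco
    busco -f -i target.fa -o SAMPLE -l "${HPC_BUSCO_DAT}/metazoa" -m genome --cpu "$(busco_cpus)"
fi
```

Submit it with `sbatch`; passing the allocated core count to --cpu keeps BUSCO from using more cores than the job requested.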

To let BUSCO retrain an existing Augustus dataset, create a local copy of the Augustus data and set the $AUGUSTUS_CONFIG_PATH variable to that path, as explained on the Augustus page.

Example: creating a local copy of the aspergillus_nidulans Augustus data.

  1. Load the busco module, create the local directory, and copy the species data:
    • mkdir -p augustus/species
    • cp -r $AUGUSTUS_CONFIG_PATH/species/aspergillus_nidulans augustus/species/
  2. Copy the models:
    • cp -r $AUGUSTUS_CONFIG_PATH/model augustus/
  3. Add this line to the busco job script, after the line that loads the busco module, and submit the job:
    • export AUGUSTUS_CONFIG_PATH=$(pwd)/augustus
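The steps above can be collected into one shell function. This is a sketch only; `setup_local_augustus` is a hypothetical name, and it assumes the busco module has already set AUGUSTUS_CONFIG_PATH to the installed Augustus config, as step 1 implies. Like the steps, it copies only the chosen species directory and the model directory.

```shell
# Hypothetical helper: copy one Augustus species plus the model files into
# DEST (default: ./augustus) and point AUGUSTUS_CONFIG_PATH at the copy.
# Usage: setup_local_augustus SPECIES [DEST]
setup_local_augustus() {
    local species="$1" dest="${2:-$PWD/augustus}"
    if [[ -z "${AUGUSTUS_CONFIG_PATH:-}" ]]; then
        echo "error: AUGUSTUS_CONFIG_PATH is not set; run 'module load busco' first" >&2
        return 1
    fi
    local src="$AUGUSTUS_CONFIG_PATH"
    if [[ ! -d "$src/species/$species" ]]; then
        echo "error: no Augustus species '$species' in $src/species" >&2
        return 1
    fi
    mkdir -p "$dest/species"
    dest="$(cd "$dest" && pwd)"   # make the exported path absolute
    cp -r "$src/species/$species" "$dest/species/"
    cp -r "$src/model" "$dest/"
    export AUGUSTUS_CONFIG_PATH="$dest"
}
```

Usage, in the job script after loading the module: `setup_local_augustus aspergillus_nidulans`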



Citation

If you publish research that uses BUSCO, you must cite it as follows:

BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, and Evgeny M. Zdobnov. Bioinformatics, published online June 9, 2015.