Difference between revisions of "BUSCO"

Revision as of 17:34, 30 June 2021

Description

busco website  

BUSCO (Benchmarking Universal Single-Copy Orthologs) assesses genome assembly and annotation completeness.

Environment Modules

Run module spider busco to find out what environment modules are available for this application.

System Variables

  • HPC_BUSCO_DIR - installation directory

Additional Information

BUSCO uses a config file. To customize it, copy the installed file and edit the copy:

$ cp $HPC_BUSCO_CONF/config.ini .
$ export BUSCO_CONFIG_FILE=$(pwd)/config.ini
$ busco -f -i ... <other arguments>

If you don't need to modify the config file you can use the installed copy:

$ busco -f --config ${HPC_BUSCO_CONF}/config.ini -i ... <other arguments>
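The copy-and-export steps above can be wrapped in a small shell function. This is a minimal sketch, not part of BUSCO or the cluster installation: the function name use_busco_config is hypothetical, and it assumes the directory passed in (normally $HPC_BUSCO_CONF once the module is loaded) contains config.ini.

```shell
# Hypothetical helper: copy config.ini from a source directory into the
# current directory and point BUSCO_CONFIG_FILE at the copy.
# Usage: use_busco_config "$HPC_BUSCO_CONF"
use_busco_config() {
    conf_dir=$1
    # Refuse to continue if the source directory has no config.ini.
    if [ ! -f "$conf_dir/config.ini" ]; then
        echo "config.ini not found in '$conf_dir'" >&2
        return 1
    fi
    cp "$conf_dir/config.ini" . || return 1
    # Export the absolute path of the local copy for later busco runs.
    BUSCO_CONFIG_FILE=$(pwd)/config.ini
    export BUSCO_CONFIG_FILE
}
```

Run it from the directory where the job will execute, edit the local config.ini, and then call busco as shown above.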

Mandatory arguments (https://busco.ezlab.org/busco_userguide.html#mandatory-arguments)

-i or --in defines the input file to analyse: a nucleotide FASTA file or a protein FASTA file, depending on the BUSCO mode. As of v5.1.0, the input can also be a directory of FASTA files, which runs BUSCO in batch mode.

-o or --out defines the folder that will contain all results, logs, and intermediate data.

-m or --mode sets the assessment mode: genome, proteins, or transcriptome.

-l or --lineage_dataset selects the lineage dataset to assess against, for example metazoa.

Datasets are located in /data/reference/busco/VERSION, where VERSION is the dataset version directory (e.g. v5). The config.ini file is already configured to use the correct path.


Available datasets:

  • arthropoda
  • bacteria
  • eukaryota
  • fungi
  • metazoa
  • vertebrata

and many more. Run the following command to list all available datasets:

$ ls /data/reference/busco/VERSION

e.g.

$ ls /data/reference/busco/v5

Example of busco run with metazoa dataset:

busco -f -i target.fa -o SAMPLE -l ${HPC_BUSCO_DAT}/metazoa -m genome
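Before submitting a job, it can help to check the mode and inspect the full command line. The following is a minimal sketch: busco_cmd is a hypothetical helper, not a BUSCO feature, and it only prints the command using the flags shown on this page rather than running it.

```shell
# Hypothetical helper: validate the mode, then print the busco command line
# instead of running it, so it can be inspected before submitting a job.
# Usage: busco_cmd INPUT OUTPUT LINEAGE MODE
busco_cmd() {
    case $4 in
        genome|proteins|transcriptome) ;;
        *)
            echo "unknown mode '$4' (expected genome, proteins, or transcriptome)" >&2
            return 1
            ;;
    esac
    printf 'busco -f -i %s -o %s -l %s -m %s\n' "$1" "$2" "$3" "$4"
}
```

For the metazoa example, the call would be busco_cmd target.fa SAMPLE "${HPC_BUSCO_DAT}/metazoa" genome, with the module loaded so that $HPC_BUSCO_DAT is set.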

To allow busco to retrain an existing Augustus dataset, create a local copy of the data and set the $AUGUSTUS_CONFIG_PATH variable to that path, as explained on the Augustus page.

Example

Let's copy aspergillus_nidulans:

  • Load the busco module and create a local directory for the Augustus data
mkdir -p augustus/species
  • Copy the species data
cp -r $AUGUSTUS_CONFIG_PATH/species/aspergillus_nidulans augustus/species/
  • Copy the models
cp -r $AUGUSTUS_CONFIG_PATH/model augustus/

Add

export AUGUSTUS_CONFIG_PATH=$(pwd)/augustus

to the busco job script and submit it.
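The copy steps above can also be collected into one function. This is a sketch under stated assumptions: local_augustus_copy is a hypothetical name, and the source tree is assumed to contain species/ and model/ subdirectories, as in the commands above.

```shell
# Hypothetical helper: copy one Augustus species plus the shared models
# from a source config tree into ./augustus, then export the new path.
# Usage: local_augustus_copy "$AUGUSTUS_CONFIG_PATH" aspergillus_nidulans
local_augustus_copy() {
    src=$1
    species=$2
    dest=$(pwd)/augustus
    if [ ! -d "$src/species/$species" ]; then
        echo "species '$species' not found under '$src/species'" >&2
        return 1
    fi
    mkdir -p "$dest/species" || return 1
    # -r copies directories recursively.
    cp -r "$src/species/$species" "$dest/species/" || return 1
    cp -r "$src/model" "$dest/" || return 1
    # Point Augustus (and therefore busco) at the writable local copy.
    AUGUSTUS_CONFIG_PATH=$dest
    export AUGUSTUS_CONFIG_PATH
}
```

Call it after loading the busco module, while $AUGUSTUS_CONFIG_PATH still points at the installed data. The source path is read before the variable is reassigned.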



Citation

If you publish research that uses busco you have to cite it as follows:

Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, and Evgeny M. Zdobnov. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, published online June 9, 2015.