BUSCO

From UFRC

Latest revision as of 19:38, 27 March 2023

Description

BUSCO website

BUSCO stands for Benchmarking Universal Single-Copy Orthologs. It assesses genome assembly and annotation completeness using sets of these orthologs.

Environment Modules

Run module spider busco to find out what environment modules are available for this application.

System Variables

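  • HPC_BUSCO_DIR - installation directory
  • HPC_BUSCO_CONF - directory containing the installed config.ini
  • HPC_BUSCO_DAT - directory containing the lineage datasets (used with -l)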

Additional Information

BUSCO uses a config file, config.ini. To customize it, copy the installed file, point BUSCO_CONFIG_FILE at your copy, and edit it as needed:

$ cp $HPC_BUSCO_CONF/config.ini .
$ export BUSCO_CONFIG_FILE=$(pwd)/config.ini
$ busco -f -i ... <other arguments>

If you don't need to modify the config file, you can use the installed copy directly:

$ busco -f --config ${HPC_BUSCO_CONF}/config.ini -i ... <other arguments>
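The copy-and-export steps above can be wrapped in a small shell function for use in job scripts. This is a minimal sketch under stated assumptions. setup_busco_config is a hypothetical helper name and is not part of BUSCO or the module. It also assumes the busco module has set HPC_BUSCO_CONF. An existing config.ini in the target directory is left in place, so your edits survive a rerun.

```shell
# Sketch: copy the installed BUSCO config.ini into a working directory and point BUSCO at it.
# setup_busco_config is a hypothetical helper, not part of BUSCO; it assumes the busco
# module has set HPC_BUSCO_CONF.
setup_busco_config() {
  local dest_dir abs_dir
  dest_dir="${1:-$PWD}"   # directory that will hold the editable copy

  if [ -z "${HPC_BUSCO_CONF:-}" ] || [ ! -f "$HPC_BUSCO_CONF/config.ini" ]; then
    echo "config.ini not found; is the busco module loaded (HPC_BUSCO_CONF)?" >&2
    return 1
  fi

  mkdir -p "$dest_dir" || return 1
  abs_dir="$(cd "$dest_dir" && pwd)" || return 1

  # Only copy if there is no copy yet, so local edits are not overwritten on a rerun.
  if [ ! -f "$abs_dir/config.ini" ]; then
    cp "$HPC_BUSCO_CONF/config.ini" "$abs_dir/config.ini" || return 1
  fi

  # Equivalent to the export step above when dest_dir is the current directory.
  export BUSCO_CONFIG_FILE="$abs_dir/config.ini"
}
```

In a job script, you would load the module, call setup_busco_config, edit the copied config.ini if needed, and then run busco -f -i ... <other arguments> as shown above.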

Mandatory arguments

  • -i or --in defines the input file to analyse: a nucleotide FASTA file or a protein FASTA file, depending on the BUSCO mode. As of v5.1.0, the input can also be a directory of FASTA files, which runs BUSCO in batch mode.
  • -o or --out defines the folder that will contain all results, logs, and intermediate data.
  • -m or --mode sets the assessment mode: genome, proteins, or transcriptome.
  • -l or --lineage_dataset specifies the lineage dataset to assess against (see the available datasets below).
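The sketch below shows a wrapper that checks the mandatory arguments before a long run starts. run_busco is a hypothetical helper. It assumes the lineage is given as a local directory path, as in the metazoa example on this page, and it only accepts the three full mode names listed above. It also passes -c, BUSCO's option for the number of CPU threads. The value comes from SLURM_CPUS_PER_TASK when running inside a SLURM job.

```shell
# Sketch: validate BUSCO's mandatory arguments, then start the run.
# run_busco is a hypothetical helper, not part of BUSCO or the module.
run_busco() {
  local input out mode lineage
  if [ "$#" -ne 4 ]; then
    echo "usage: run_busco INPUT OUT_NAME MODE LINEAGE_DIR" >&2
    return 2
  fi
  input="$1"; out="$2"; mode="$3"; lineage="$4"

  # -m accepts genome, proteins or transcriptome.
  case "$mode" in
    genome|proteins|transcriptome) ;;
    *) echo "invalid mode '$mode': use genome, proteins or transcriptome" >&2
       return 2 ;;
  esac

  # -i: one FASTA file, or (BUSCO v5.1.0+) a directory of FASTA files for batch mode.
  if [ ! -f "$input" ] && [ ! -d "$input" ]; then
    echo "input '$input' not found" >&2
    return 2
  fi

  # -l: this sketch expects a local dataset directory, as in the metazoa example.
  if [ ! -d "$lineage" ]; then
    echo "lineage dataset '$lineage' not found" >&2
    return 2
  fi

  # -c sets the number of CPU threads: the SLURM allocation, or 1 outside a job.
  busco -f -i "$input" -o "$out" -m "$mode" -l "$lineage" -c "${SLURM_CPUS_PER_TASK:-1}"
}
```

For example, inside a job script: run_busco target.fa SAMPLE genome "${HPC_BUSCO_DAT}/metazoa"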

Datasets are located in /data/reference/busco/VERSION. The config.ini file is already configured to use the correct path.

Available datasets:

  • arthropoda
  • bacteria
  • eukaryota
  • fungi
  • metazoa
  • vertebrata

and many more. Run the following command to see all available datasets:

$ ls /data/reference/busco/VERSION
# for example:
$ ls /data/reference/busco/v5
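To catch a misspelled lineage name before starting a run, you can resolve the dataset path first. The following is a minimal sketch. busco_lineage_path is a hypothetical helper. The dataset root you pass (for example /data/reference/busco/v5) should be the directory for your BUSCO version. If the lineage is not found, the function prints the available datasets.

```shell
# Sketch: turn a lineage name into a dataset path, or list what is available.
# busco_lineage_path is a hypothetical helper, not part of BUSCO.
busco_lineage_path() {
  local root name
  if [ "$#" -ne 2 ] || [ -z "$2" ]; then
    echo "usage: busco_lineage_path DATASET_ROOT LINEAGE_NAME" >&2
    return 2
  fi
  root="$1"   # dataset root, e.g. /data/reference/busco/v5
  name="$2"   # lineage name, e.g. metazoa

  if [ -d "$root/$name" ]; then
    printf '%s\n' "$root/$name"
    return 0
  fi
  echo "lineage '$name' not found in $root; available datasets:" >&2
  ls "$root" >&2
  return 1
}
```

For example: lineage=$(busco_lineage_path /data/reference/busco/v5 metazoa) || exit 1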

Example of a BUSCO run with the metazoa dataset:

$ busco -f -i target.fa -o SAMPLE -l ${HPC_BUSCO_DAT}/metazoa -m genome

To allow BUSCO to retrain an existing Augustus dataset, create a local copy of the data and set the $AUGUSTUS_CONFIG_PATH variable to that path, as explained on the Augustus page.

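Example, copying aspergillus_nidulans:

  • …
  • Copy the model directory:
    $ cp -r $AUGUSTUS_CONFIG_PATH/model augustus/
  • Add this line to the busco job script (run from the directory that contains augustus/) and submit it:
    export AUGUSTUS_CONFIG_PATH=$(pwd)/augustus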
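The local-copy approach described above can also be sketched as a function. This version copies the entire Augustus config directory instead of a single species. It rests on several assumptions. local_augustus_config is a hypothetical helper, and it expects AUGUSTUS_CONFIG_PATH to point at the installed, read-only config. Copying the whole tree is simpler but uses more disk space than copying only the species you need.

```shell
# Sketch: give BUSCO a writable Augustus config by copying the installed one.
# local_augustus_config is a hypothetical helper; it copies the WHOLE config tree.
local_augustus_config() {
  local dest src abs
  dest="${1:-$PWD/augustus}"   # writable target directory

  if [ -z "${AUGUSTUS_CONFIG_PATH:-}" ] || [ ! -d "$AUGUSTUS_CONFIG_PATH" ]; then
    echo "AUGUSTUS_CONFIG_PATH must point at the installed Augustus config" >&2
    return 1
  fi

  mkdir -p "$dest" || return 1
  src="$(cd "$AUGUSTUS_CONFIG_PATH" && pwd)" || return 1
  abs="$(cd "$dest" && pwd)" || return 1

  # Skip the copy if the variable already points at this local copy (safe to rerun).
  if [ "$src" != "$abs" ]; then
    # "src/." copies the contents of src into abs, not a nested src directory.
    cp -r "$src"/. "$abs"/ || return 1
  fi

  # Absolute path, so the setting stays valid if the job later changes directory.
  export AUGUSTUS_CONFIG_PATH="$abs"
}
```

For example, in the BUSCO job script, call local_augustus_config "$(pwd)/augustus" before the busco command.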



Citation

If you publish research that uses BUSCO, you must cite it as follows:

BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, and Evgeny M. Zdobnov. Bioinformatics, published online June 9, 2015.