Difference between revisions of "BUSCO"
Moskalenko (talk | contribs) m (Text replacement - "/ufrc/data/reference" to "/data/reference") |
Moskalenko (talk | contribs) |
Revision as of 17:30, 30 June 2021
Description
Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs
Required Modules
Serial
- busco
System Variables
- HPC_BUSCO_DIR - installation directory
Additional Information
BUSCO reads its settings from a config file. Copy the site-provided file to your own directory, adjust it to your needs, and point BUSCO_CONFIG_FILE at your copy.
$ cp $HPC_BUSCO_CONF/config.ini /home/username/config.ini
$ export BUSCO_CONFIG_FILE=/home/username/config.ini
$ busco -i ... <other arguments>
Mandatory arguments (https://busco.ezlab.org/busco_userguide.html#mandatory-arguments):
-i or --in defines the input file to analyse, which is either a nucleotide FASTA file or a protein FASTA file, depending on the BUSCO mode. As of v5.1.0, the input can also be a directory of FASTA files, in which case BUSCO runs in batch mode.
-o or --out defines the folder that will contain all results, logs, and intermediate data.
-m or --mode sets the assessment mode: genome, proteins, or transcriptome.
-l or --lineage_dataset specifies the BUSCO lineage dataset to use.
Datasets are located in /data/reference/busco/VERSION. The config.ini file is already configured to use the correct path.
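The config setup and the four arguments above can be combined in a small helper. The following is a minimal sketch, not part of the BUSCO distribution: the function names setup_busco_config and busco_command are hypothetical, and the accepted modes are only the three listed on this page. The second function prints the command instead of running it, so it can be reviewed before it goes into a job script.

```shell
# Sketch only: setup_busco_config and busco_command are illustrative names,
# not BUSCO commands.

# Copy config.ini from a source directory into a destination directory and
# point BUSCO_CONFIG_FILE at the copy. An existing copy is left untouched.
setup_busco_config() {
    local conf_dir="$1" dest_dir="$2"
    if [ ! -f "$conf_dir/config.ini" ]; then
        echo "config.ini not found in $conf_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    if [ ! -f "$dest_dir/config.ini" ]; then
        cp "$conf_dir/config.ini" "$dest_dir/config.ini" || return 1
    fi
    export BUSCO_CONFIG_FILE="$dest_dir/config.ini"
}

# Print a busco command line for the given input, output name, mode and
# lineage. Only the modes listed on this page are accepted.
busco_command() {
    local input="$1" out="$2" mode="$3" lineage="$4"
    case "$mode" in
        genome|proteins|transcriptome) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    printf 'busco -i %s -o %s -m %s -l %s\n' "$input" "$out" "$mode" "$lineage"
}
```

On the cluster this might be called as setup_busco_config "$HPC_BUSCO_CONF" "$HOME", followed by busco_command target.fa SAMPLE genome metazoa. Paths containing spaces would need extra quoting before the printed command is pasted into a script.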
Available datasets:
- arthropoda
- bacteria
- eukaryota
- fungi
- metazoa
- vertebrata
and many more. Run the following command to see all available datasets for a given version:
$ ls /data/reference/busco/VERSION
e.g.
$ ls /data/reference/busco/v5
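If you are unsure which VERSION directories exist, a small wrapper can check the path before listing it. This is a sketch under the assumption that datasets appear as entries directly under the version directory; list_busco_datasets is a hypothetical name, not a BUSCO or site-provided command.

```shell
# Sketch only: list_busco_datasets is an illustrative helper.
# Lists the entries under <root>/<version>, or reports a missing directory.
list_busco_datasets() {
    local root="$1" version="$2"
    local dir="$root/$version"
    if [ ! -d "$dir" ]; then
        echo "no such dataset directory: $dir" >&2
        return 1
    fi
    ls "$dir"
}
```

For example, list_busco_datasets /data/reference/busco v5 | grep -i metazoa narrows the listing to metazoa entries, and ls /data/reference/busco shows which version directories are present.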
Example of busco run with metazoa dataset:
busco -f -i target.fa -o SAMPLE -l ${HPC_BUSCO_DAT}/metazoa -m genome
To allow BUSCO to retrain an existing Augustus dataset, create a local, writable copy of the Augustus data and set the $AUGUSTUS_CONFIG_PATH variable to that path, as explained on the Augustus page.
- Example
Let's copy aspergillus_nidulans. First create the local directory tree:
mkdir -p augustus/species
- Load the busco module and copy the Augustus species data (-r is needed because the source is a directory)
cp -r $AUGUSTUS_CONFIG_PATH/species/aspergillus_nidulans/ augustus/species/
- Copy the models
cp -r $AUGUSTUS_CONFIG_PATH/model augustus/
Add
export AUGUSTUS_CONFIG_PATH=$(pwd)/augustus
to the busco job script, run from the directory that contains the augustus folder, and submit it.
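The copy steps above can be wrapped in one function so that they are easy to repeat for another species. This is a minimal sketch, assuming the source tree contains species/<name>/ and model/ subdirectories as in the example; make_local_augustus and its argument order are hypothetical, not part of BUSCO or Augustus.

```shell
# Sketch only: make_local_augustus is an illustrative helper.
# Arguments: <source Augustus config dir> <species name> <local destination>
make_local_augustus() {
    local src="$1" species="$2" dest="$3"
    if [ ! -d "$src/species/$species" ]; then
        echo "species not found: $src/species/$species" >&2
        return 1
    fi
    mkdir -p "$dest/species" || return 1
    # Both sources are directories, so a recursive copy is required.
    cp -r "$src/species/$species" "$dest/species/" || return 1
    cp -r "$src/model" "$dest/" || return 1
    # Export an absolute path so the setting survives a change of directory.
    export AUGUSTUS_CONFIG_PATH="$(cd "$dest" && pwd)"
}
```

With the busco module loaded, a call such as make_local_augustus "$AUGUSTUS_CONFIG_PATH" aspergillus_nidulans augustus reads the original location first and then repoints the variable at the local copy.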
Citation
If you publish research that uses BUSCO, cite it as follows:
Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, and Evgeny M. Zdobnov. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, published online June 9, 2015.