MetaBCC-LR

From UFRC
Latest revision as of 14:28, 15 December 2022

Description

MetaBCC-LR website  

Reference-free Binning of Metagenomics Long Reads using Coverage and Composition
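MetaBCC-LR groups reads by coverage and by composition. As a rough illustration of what a composition profile is, the following Bash/awk sketch counts overlapping k-mers in a single DNA sequence. It is not MetaBCC-LR's implementation, and the function name `kmer_counts` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of MetaBCC-LR): count overlapping k-mers
# in one DNA sequence to illustrate an oligonucleotide composition profile.
# Usage: kmer_counts K SEQUENCE
# Prints one "KMER COUNT" line per distinct k-mer, sorted by k-mer.
kmer_counts() {
    local k="$1" seq="$2"
    printf '%s\n' "$seq" | awk -v k="$k" '
        {
            s = toupper($0)
            # Slide a window of width k along the sequence
            for (i = 1; i <= length(s) - k + 1; i++) {
                counts[substr(s, i, k)]++
            }
        }
        END {
            for (m in counts) print m, counts[m]
        }' | LC_ALL=C sort
}
```

For example, `kmer_counts 2 ACGTAC` prints `AC 2`, `CG 1`, `GT 1` and `TA 1` on separate lines. MetaBCC-LR computes its own composition and coverage features; this helper only shows what such a profile looks like.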

Environment Modules

Run module spider MetaBCC-LR to find out what environment modules are available for this application.

System Variables

  • HPC_METABCC-LR_DIR - installation directory
  • HPC_METABCC-LR_BIN - executable directory


Job Script Examples

Sample installation test script:

#!/bin/bash
#SBATCH --job-name=metabcc_lr_test
#SBATCH --mail-type=NONE
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=4gb
#SBATCH --time=24:00:00
#SBATCH --output=metabcc_lr_test.log

echo "Setting up test environment..."
TEST_PWD=/data/apps/tests/metabcc_lr/2.0.0
TEST_DATADIR=${TEST_PWD}/example_data
TEST_WORKDIR=${TEST_PWD}/test_output

cd "${TEST_PWD}" || exit 1
module load metabcc_lr/2.0.0

# Remove any previous test results and re-create an empty working directory
rm -rf "${TEST_WORKDIR}"
mkdir -p "${TEST_WORKDIR}"

echo "Starting test run at $(date) on $(hostname)..."

# Based on https://github.com/anuradhawick/MetaBCC-LR#test-run-data
###################################
# Use the CPUs allocated via --cpus-per-task (falls back to 4 outside Slurm)
mbcclr \
    -r "${TEST_DATADIR}/reads.fasta" \
    -g "${TEST_DATADIR}/ids.txt" \
    -o "${TEST_WORKDIR}" \
    -e umap \
    -c 25000 \
    -bs 10 \
    -bc 10 \
    -k 4 \
    -t "${SLURM_CPUS_PER_TASK:-4}"

reads2bins.py \
    --reads "${TEST_DATADIR}/reads.fasta" \
    --bins "${TEST_WORKDIR}/final.txt" \
    --output "${TEST_WORKDIR}/final_bins"

###################################

# There should be some binned FASTA files in the work directory
echo "There should be some results listed below:"
find "${TEST_WORKDIR}/final_bins" -name '*.fasta' -type f ! -empty -ls

echo "Test complete at $(date)."
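The `find` command at the end of the script lists the non-empty bin files. To also see how many reads landed in each bin, a small helper can count FASTA header lines per file. This is a minimal sketch, assuming the binned reads are written as `*.fasta` files under `final_bins` as the script expects; `count_bin_reads` is a hypothetical name, not part of MetaBCC-LR.

```shell
#!/bin/bash
# Hypothetical helper (not part of MetaBCC-LR): report how many reads
# each non-empty *.fasta bin file in a directory contains.
# Usage: count_bin_reads DIR
# Prints "FILE<TAB>READS" per bin file, sorted by path.
count_bin_reads() {
    local dir="$1" f
    if [ ! -d "$dir" ]; then
        echo "count_bin_reads: no such directory: $dir" >&2
        return 1
    fi
    # List non-empty FASTA files in a stable (byte-wise) order
    find "$dir" -type f -name '*.fasta' ! -empty -print | LC_ALL=C sort |
    while IFS= read -r f; do
        # Every FASTA record starts with a '>' header line
        printf '%s\t%s\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
    done
}
```

For example, `count_bin_reads "${TEST_WORKDIR}/final_bins"` could be added to the test script after the `find` command to print a per-bin read count.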


Citation

If you publish research that uses MetaBCC-LR, you must cite it as follows:


@article{10.1093/bioinformatics/btaa441,
    author = {Wickramarachchi, Anuradha and Mallawaarachchi, Vijini and Rajan, Vaibhav and Lin, Yu},
    title = "{MetaBCC-LR: metagenomics binning by coverage and composition for long reads}",
    journal = {Bioinformatics},
    volume = {36},
    number = {Supplement_1},
    pages = {i3-i11},
    year = {2020},
    month = {07},
    abstract = "{Metagenomics studies have provided key insights into the composition and structure of microbial communities found in different environments. Among the techniques used to analyse metagenomic data, binning is considered a crucial step to characterize the different species of micro-organisms present. The use of short-read data in most binning tools poses several limitations, such as insufficient species-specific signal, and the emergence of long-read sequencing technologies offers us opportunities to surmount them. However, most current metagenomic binning tools have been developed for short reads. The few tools that can process long reads either do not scale with increasing input size or require a database with reference genomes that are often unknown. In this article, we present MetaBCC-LR, a scalable reference-free binning method which clusters long reads directly based on their k-mer coverage histograms and oligonucleotide composition. We evaluate MetaBCC-LR on multiple simulated and real metagenomic long-read datasets with varying coverages and error rates. Our experiments demonstrate that MetaBCC-LR substantially outperforms state-of-the-art reference-free binning tools, achieving ∼13\% improvement in F1-score and ∼30\% improvement in ARI compared to the best previous tools. Moreover, we show that using MetaBCC-LR before long-read assembly helps to enhance the assembly quality while significantly reducing the assembly cost in terms of time and memory usage. The efficiency and accuracy of MetaBCC-LR pave the way for more effective long-read-based metagenomics analyses to support a wide range of applications. The source code is freely available at: https://github.com/anuradhawick/MetaBCC-LR. Supplementary data are available at Bioinformatics online.}",
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btaa441},
    url = {https://doi.org/10.1093/bioinformatics/btaa441},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/36/Supplement\_1/i3/33488763/btaa441.pdf},
}