VirSieve

From UFRC
Revision as of 18:58, 14 December 2022 by Israel.herrera (talk | contribs)

Description

VirSieve website  

Environmental viral detection pipeline. It contains the VirSieveAlign, VirSieveGATK, VirSieveIVar, and VirSieveVEP applications.

Environment Modules

Run module spider VirSieve to find out what environment modules are available for this application.

System Variables

  • HPC_VIRSIEVE_DIR - installation directory
  • HPC_VIRSIEVE_BIN - executable directory
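
As a quick sanity check, the two variables can be printed once the module is loaded. The following is a minimal sketch; the function name show_virsieve_vars is hypothetical and not part of the VirSieve module, and the variables are assumed to be set only after the module has been loaded in the current shell.

```shell
# Hypothetical helper: print the VirSieve module variables, or "<unset>"
# when the module has not been loaded in this shell.
show_virsieve_vars() {
    printf 'HPC_VIRSIEVE_DIR=%s\n' "${HPC_VIRSIEVE_DIR:-<unset>}"
    printf 'HPC_VIRSIEVE_BIN=%s\n' "${HPC_VIRSIEVE_BIN:-<unset>}"
}
```

Calling show_virsieve_vars after module load virsieve should print both paths; seeing "<unset>" suggests the module is not loaded.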

Additional Information

Each tool expects the path to a working folder that contains your configurations, FASTA files, custom references, etc. To help with this, UFRC provides a "wrapper script" for each application in the pipeline, which takes the path to your working directory as its argument. For example, to run VirSieveIVar on the working directory /blue/groupname/username/virsieve_test/:

$ module load virsieve/20210406
$ VirSieveIVar /blue/groupname/username/virsieve_test/
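
Before calling the first wrapper, it can help to confirm that the working directory actually contains input reads. The sketch below assumes the rawFASTQ/ layout used in the job script example on this page (gzipped reads placed in rawFASTQ/); the function name has_raw_reads is hypothetical, and the authoritative directory structure is the one described on each tool's own page.

```shell
# Hypothetical helper: succeed only if WORKDIR has a rawFASTQ/ subdirectory
# containing at least one .gz file.
has_raw_reads() {
    local workdir="$1"
    local f
    [ -d "${workdir}/rawFASTQ" ] || return 1
    for f in "${workdir}"/rawFASTQ/*.gz; do
        # If the glob matched nothing, $f is the literal pattern and -f fails
        [ -f "${f}" ] && return 0
    done
    return 1
}
```

A wrapper call could then be guarded with, for example, has_raw_reads /blue/groupname/username/virsieve_test && VirSieveAlign /blue/groupname/username/virsieve_test.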

Each tool can also be given non-default options through environment variables. Because these tools run inside Singularity containers, you must prefix those environment variables with "SINGULARITYENV_" for them to take effect as overrides. For example, VirSieveIVar accepts a custom reference BED file through the PRIMERBED environment variable. To use "mycustom.bed" as a custom BED file on HiPerGator, you would run:

$ module load virsieve/20210406
$ export SINGULARITYENV_PRIMERBED=/path/to/mycustom.bed
$ VirSieveIVar /blue/groupname/username/virsieve_test/
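
When several overrides are needed, the prefixing can be wrapped in a small function so the prefix is not mistyped. This is a minimal sketch, not something the VirSieve module provides; the function name set_container_var is hypothetical, and PRIMERBED is the only variable name taken from this page.

```shell
# Hypothetical helper: export NAME=VALUE under the SINGULARITYENV_ prefix,
# so that Singularity passes it into the container as NAME.
set_container_var() {
    local name="$1"
    local value="$2"
    export "SINGULARITYENV_${name}=${value}"
}

# Example: equivalent to "export SINGULARITYENV_PRIMERBED=/path/to/mycustom.bed"
set_container_var PRIMERBED /path/to/mycustom.bed
```

To return to the tool's default, remove the override with unset SINGULARITYENV_PRIMERBED before calling the wrapper again.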

The expected working directory structure and lists of the available environment variables for each tool can be found on that tool's page:

https://www.github.com/Zymo-Research/VirSieveAlign
https://www.github.com/Zymo-Research/VirSieveGATK
https://www.github.com/Zymo-Research/VirSieveIVar
https://www.github.com/Zymo-Research/VirSieveVEP

Job Script Examples


#!/bin/bash
#SBATCH --job-name=virsieve_test
#SBATCH --mail-type=NONE
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=4gb
#SBATCH --time=24:00:00
#SBATCH --output=virsieve_test.log

TEST_PWD=/data/apps/tests/virsieve
TEST_SAMPLEDIR=${TEST_PWD}/sample_data
TEST_WORKDIR=${TEST_PWD}/virsieve_test

cd ${TEST_PWD}
module load virsieve

# Remove any previous test results, create a working directory, and copy
# initial test reads into the expected position in working directory
if [ -d "${TEST_WORKDIR}" ]; then rm -rf "${TEST_WORKDIR}"; fi
mkdir -p "${TEST_WORKDIR}/rawFASTQ"
cp "${TEST_SAMPLEDIR}"/*.gz "${TEST_WORKDIR}/rawFASTQ"

echo "Starting test run at $(date) on $(hostname)..."

# Run VirSieveAlign (rawFASTQ/ --> processedFASTQ/, mergedBAM/, rawBAM/)
VirSieveAlign ${TEST_WORKDIR}

# Run VirSieveIVar (mergedBAM/ --> primerTrimBAM/)
# export SINGULARITYENV_PRIMERBED=/path/to/mycustom.bed
VirSieveIVar ${TEST_WORKDIR}

# Run VirSieveGATK (primerTrimBAM/ --> filteredVCF/, alignmentArtifactFilteredVCF/, etc.)
VirSieveGATK ${TEST_WORKDIR}

# Run VirSieveVEP (filteredVCF/, alignmentArtifactFilteredVCF/ --> vepOutputs/, results/)
VirSieveVEP ${TEST_WORKDIR}

# There should be some files in the results/ directory
echo "There should be some results listed below:"
find ${TEST_WORKDIR}/results/ -type f ! -empty -ls

echo "Test complete at $(date)."
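
The find command at the end of the script only lists files; it does not stop the job or change its outcome when results/ turns out to be empty or missing. A stricter check could look like the sketch below. The function name has_nonempty_files is hypothetical, and the check assumes, as the script does, that a successful run leaves at least one non-empty file in results/.

```shell
# Hypothetical helper: succeed only if DIR exists and contains at least one
# non-empty regular file (searched recursively).
has_nonempty_files() {
    local dir="$1"
    [ -d "${dir}" ] || return 1
    # find prints matching files; head keeps only the first line, if any
    [ -n "$(find "${dir}" -type f ! -empty -print | head -n 1)" ]
}
```

In the job script this could be used as has_nonempty_files "${TEST_WORKDIR}/results" || exit 1, so that Slurm records a failed job when no results were produced.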