Difference between revisions of "Parabricks"

From UFRC

Revision as of 18:03, 27 January 2022

Description

parabricks website  

Parabricks is a software suite for genomic analysis. It substantially reduces run time for common analytical tasks in genomics, including germline and somatic analysis. The core of the Parabricks software is its data pipeline, which takes raw data and transforms it according to the user's requirements.

Parabricks makes both GPU-accelerated pipelines and some standalone tools available.

Environment Modules

Run module spider parabricks to find out what environment modules are available for this application.

System Variables

  • HPC_PARABRICKS_DIR - installation directory
  • HPC_PARABRICKS_BIN - executable directory
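
A minimal sketch for checking these variables from a shell. It assumes `module load parabricks` has already been run in the same session; the function name `show_parabricks_env` is hypothetical and not part of the module.

```shell
#!/bin/bash
# Hypothetical helper: print the variables that the parabricks module is
# expected to set. Without `module load parabricks`, both print "<not set>".
show_parabricks_env() {
    echo "HPC_PARABRICKS_DIR=${HPC_PARABRICKS_DIR:-<not set>}"
    echo "HPC_PARABRICKS_BIN=${HPC_PARABRICKS_BIN:-<not set>}"
}

show_parabricks_env
```

Once the module is loaded, `ls "${HPC_PARABRICKS_BIN}"` lists the contents of the executable directory.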


Additional Information

An example interactive job resource request based on the NVIDIA recommendation:

srun -p hpg-ai -N 1 --cpus-per-task=16 --gpus=a100:2 --mem=32gb --time=200:00 --pty bash -i

Parabricks requires 2 to 8 A100 GPUs to run. The pbrun argument '--num-gpus X' must match the number 'X' of GPUs requested for the job. If it is not specified, Parabricks will try to run on all GPUs on the compute node and exit with an error.
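
One way to keep '--num-gpus' in step with the allocation is to derive the count inside the job rather than hard-coding it. The sketch below assumes that Slurm exports CUDA_VISIBLE_DEVICES as a comma-separated list of the GPUs allocated to the job (for example "0,1"); the helper name `count_visible_gpus` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the GPUs listed in CUDA_VISIBLE_DEVICES.
# Assumption: Slurm sets CUDA_VISIBLE_DEVICES to the allocated GPUs, e.g. "0,1".
count_visible_gpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        # Variable unset or empty: report zero GPUs
        echo 0
        return
    fi
    # Turn each comma into a newline, then count the resulting lines
    printf '%s\n' "${CUDA_VISIBLE_DEVICES}" | tr ',' '\n' | wc -l | tr -d ' '
}

NUM_GPUS="$(count_visible_gpus)"
echo "Pass to pbrun: --num-gpus ${NUM_GPUS}"
```

Inside a job script the value can then be passed as `--num-gpus "${NUM_GPUS}"` on the pbrun command line.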

Note that a Parabricks run may produce an error if the paths used as arguments for the run resolve to symlinks. Containerized tools need real paths: use the full /blue/mygroup/myuser/project/inputdir path instead of a shorter ~/blue/project/inputdir path that goes through a symlink, even if the ~/blue symlink points to the /blue/mygroup/myuser/ directory.
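
A minimal sketch of resolving a path to its real location before handing it to pbrun. It relies on `readlink -f` from GNU coreutils; the helper name `resolve_path` and the temporary demo directories are hypothetical and only stand in for the ~/blue symlink described above.

```shell
#!/bin/bash
# Hypothetical helper: print the physical path of $1 with all symlinks resolved.
resolve_path() {
    readlink -f "$1"
}

# Demo with a throwaway directory tree: "link" is a symlink to "real",
# mimicking a ~/blue symlink that points at /blue/mygroup/myuser/.
demo_dir="$(mktemp -d)"
mkdir -p "${demo_dir}/real/inputdir"
ln -s "${demo_dir}/real" "${demo_dir}/link"

# Prints a path that goes through real/inputdir, not link/inputdir
resolve_path "${demo_dir}/link/inputdir"

rm -rf "${demo_dir}"
```

In a job script, something like `DATA_DIR="$(resolve_path ~/blue/project/inputdir)"` would give pbrun the real path.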

Job Script Examples

#!/bin/bash
#SBATCH --partition=hpg-ai      # partition
#SBATCH --time=4:00:00          # wall time
#SBATCH --mem=64gb              # memory for the job
#SBATCH --mail-type=FAIL        # only send email on failure
#SBATCH --mail-user=your@email.com
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gpus=a100:2           # number of A100 GPUs, 2-8
#SBATCH --output=pb_%j.log      # job log file (%j is the job ID)
date;hostname;pwd

# Load the Parabricks environment module
module load parabricks

# Set data directories
DATA_DIR="/blue/nvidia-parabricks/g.burnett/parabricks_sample"
SAMPLE_1="${DATA_DIR}/Data/sample_1.fq.gz"
SAMPLE_2="${DATA_DIR}/Data/sample_2.fq.gz"
REF="${DATA_DIR}/Ref/Homo_sapiens_assembly38.fasta"
OUTPUT_DIR="${DATA_DIR}"

# Make the output directory
mkdir -p "${OUTPUT_DIR}"

# Run germline; --num-gpus must match the GPU count requested with --gpus above
pbrun germline \
        --ref "${REF}" \
        --in-fq "${SAMPLE_1}" "${SAMPLE_2}" \
        --num-gpus 2 \
        --out-bam "${OUTPUT_DIR}/germline.bam" \
        --out-variants "${OUTPUT_DIR}/germline.vcf" |& tee "${OUTPUT_DIR}/germline.log"