Parabricks
Latest revision as of 17:01, 17 September 2024
Description
Parabricks is a software suite for genomic analysis. It delivers major improvements in throughput for common analytical tasks in genomics, including germline and somatic analysis. The core of the Parabricks software is its data pipeline, which takes raw data and transforms it according to the user's requirements.
Parabricks provides both GPU-accelerated pipelines (https://www.nvidia.com/en-us/docs/parabricks/quickstart-guide/software-overview/pipelines-overview/) and some standalone tools (https://www.nvidia.com/en-us/docs/parabricks/quickstart-guide/software-overview/standalone-tools-overview/).
Environment Modules
Run module spider parabricks to find out what environment modules are available for this application.
System Variables
- HPC_PARABRICKS_DIR - installation directory
- HPC_PARABRICKS_BIN - executable directory
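As a quick check that the module is loaded, the following minimal sketch prints the two variables listed above. The function name show_parabricks_env is hypothetical, and the sketch assumes that module load parabricks has already been run in the current shell; if it has not, both variables are reported as not set.

```shell
#!/bin/bash
# Print the two Parabricks variables described on this page.
# A variable that is unset or empty is shown as "(not set)".
show_parabricks_env() {
    printf 'HPC_PARABRICKS_DIR=%s\n' "${HPC_PARABRICKS_DIR:-(not set)}"
    printf 'HPC_PARABRICKS_BIN=%s\n' "${HPC_PARABRICKS_BIN:-(not set)}"
}

show_parabricks_env
```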
Additional Information
An example job resource request based on the NVIDIA recommendation:
srun -p hpg-ai -N 1 --cpus-per-task=16 --gpus=a100:2 --mem=32gb --time=200:00 --pty bash -i
Parabricks requires 2 to 8 A100 GPUs to run. The pbrun argument '--num-gpus X' must match the number 'X' of GPUs requested from the scheduler. If it is not specified, Parabricks tries to run on all GPUs on the compute node and exits with an error.
Note that a Parabricks run may produce an error if the paths used as arguments for the run resolve to symlinks. Containerized tools need real paths: use the /blue/mygroup/myuser/project/inputdir path instead of the shorter ~/blue/project/inputdir path, which goes through a symlink, even if the ~/blue symlink points to the /blue/mygroup/myuser/ directory.
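One way to keep the '--num-gpus' value in step with the GPU request is to derive it inside the job instead of typing it twice. The sketch below is illustrative only: count_gpus and gpu_count_ok are hypothetical helper names, and it assumes the scheduler exports CUDA_VISIBLE_DEVICES as a comma-separated list of the GPUs allocated to the job, which depends on the site configuration.

```shell
#!/bin/bash
# Count the entries in a comma-separated device list such as the
# CUDA_VISIBLE_DEVICES value "0,1". An empty list counts as zero.
count_gpus() {
    devices="$1"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # Turn commas into newlines, count the lines, strip any padding from wc.
    printf '%s\n' "$devices" | tr ',' '\n' | wc -l | tr -d ' '
}

# Succeed only when the count is within the 2 to 8 range stated on this page.
gpu_count_ok() {
    [ "$1" -ge 2 ] && [ "$1" -le 8 ]
}

# Assumption: CUDA_VISIBLE_DEVICES reflects the GPUs allocated to this job.
NUM_GPUS="$(count_gpus "${CUDA_VISIBLE_DEVICES:-}")"
if gpu_count_ok "$NUM_GPUS"; then
    echo "pass --num-gpus ${NUM_GPUS} to pbrun"
else
    echo "found ${NUM_GPUS} GPU(s); this page asks for 2 to 8" >&2
fi
```

Inside a job script, the resulting NUM_GPUS value could then be passed to pbrun as --num-gpus "${NUM_GPUS}".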
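A symlinked directory can be converted to its real path before it is handed to pbrun. The following is a minimal sketch, not part of Parabricks: resolve_dir is a hypothetical helper name, it handles directories only, and it relies on the standard cd and pwd -P shell builtins to print the physical path with every symlink resolved.

```shell
#!/bin/bash
# Print the physical path of an existing directory with all symlinks resolved.
# Runs in a subshell so the caller's working directory is not changed.
# Returns non-zero and prints nothing if the directory does not exist.
resolve_dir() {
    (cd "$1" 2>/dev/null && pwd -P)
}

# Demonstration on the home directory; substitute an input directory in practice.
if REAL_DIR="$(resolve_dir "${HOME:-.}")"; then
    echo "resolved: ${REAL_DIR}"
else
    echo "not a directory: ${HOME:-.}" >&2
fi
```

For the example above, resolve_dir ~/blue/project/inputdir would be expected to print the /blue/mygroup/myuser/project/inputdir form, provided the ~/blue symlink points to /blue/mygroup/myuser/.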
Job Script Examples
#!/bin/bash
#SBATCH --partition=hpg-ai          # partition
#SBATCH --time=4:00:00              # wall time
#SBATCH --mem=64gb                  # memory per node
#SBATCH --mail-type=FAIL            # only send email on failure
#SBATCH --mail-user=your@email.com
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gpus=a100:2               # number of A100 GPUs, 2-8
#SBATCH --output=pb_%j.log
date;hostname;pwd

# Load the Parabricks environment module
module load parabricks

# Set data directories (use real paths, not symlinks)
DATA_DIR="/blue/nvidia-parabricks/g.burnett/parabricks_sample"
SAMPLE_1="${DATA_DIR}/Data/sample_1.fq.gz"
SAMPLE_2="${DATA_DIR}/Data/sample_2.fq.gz"
REF="${DATA_DIR}/Ref/Homo_sapiens_assembly38.fasta"
OUTPUT_DIR="${DATA_DIR}"

# Make the output directory
mkdir -p "${OUTPUT_DIR}"

# Run germline; --num-gpus must match the --gpus request above
pbrun germline \
    --num-gpus 2 \
    --ref "${REF}" \
    --in-fq "${SAMPLE_1}" "${SAMPLE_2}" \
    --out-bam "${OUTPUT_DIR}/germline.bam" \
    --out-variants "${OUTPUT_DIR}/germline.vcf" |& tee "${OUTPUT_DIR}/germline.log"
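Because a germline run can sit in the queue before it starts, it may help to confirm that the reference and FASTQ files are readable before calling pbrun. The sketch below is one possible pre-flight check under that assumption; require_files is a hypothetical helper name and is not part of Parabricks or Slurm.

```shell
#!/bin/bash
# Return 0 only if every path given as an argument is a readable file.
# Each missing or unreadable path is reported on stderr.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use in a job script, placed before the pbrun call:
# require_files "${REF}" "${SAMPLE_1}" "${SAMPLE_2}" || exit 1
```

Stopping early in this way surfaces a mistyped path in the job log immediately, before any GPU work begins.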
More Information
Parabricks on HiPerGator Tutorial
- GitHub repository with code examples: https://github.com/hw-ju/genomics_uf_tutorials