Difference between revisions of "Guppy"

From UFRC
 
(5 intermediate revisions by 3 users not shown)
Line 1: Line 1:
[[Category:Software]][[Category:Biology]]
+
[[Category:Software]][[Category:Phylogenetics]]
 
{|<!--CONFIGURATION: REQUIRED-->
 
{|<!--CONFIGURATION: REQUIRED-->
 
|{{#vardefine:app|guppy}}
 
|{{#vardefine:app|guppy}}
Line 33: Line 33:
 
<!--Run-->
 
<!--Run-->
 
{{#if: {{#var: exe}}|==Additional Information==
 
{{#if: {{#var: exe}}|==Additional Information==
'''Warning:''' Guppy can be ''orders of magnitude'' faster when using GPUs for basecalling compared to pure CPU runs. See [[GPU Access]] for more details on how to request GPUs on HiPerGator. Here's a sample script you might want to use as a starting point:
+
'''Warning:''' Guppy can be ''orders of magnitude'' faster when using GPUs for basecalling compared to pure CPU runs. See [[GPU Access]] for more details on how to request GPUs on HiPerGator. Here's a sample Guppy GPU script you might want to use as a starting point:
  
 +
===GPU Job Example===
 +
<div class="mw-collapsible mw-collapsed" style="width:70%; padding: 5px; border: 1px solid gray;">
 +
''Expand to view GPU script sample.''
 +
<div class="mw-collapsible-content" style="padding: 5px;">
 
<pre>
 
<pre>
 
#!/bin/bash
 
#!/bin/bash
Line 45: Line 49:
 
#SBATCH --mail-user=MYEMAIL@ufl.edu
 
#SBATCH --mail-user=MYEMAIL@ufl.edu
 
#SBATCH --partition=gpu
 
#SBATCH --partition=gpu
#SBATCH --gpus=1
+
#SBATCH --gpus=geforce:1
 
date;hostname;pwd
 
date;hostname;pwd
  
Line 60: Line 64:
 
date
 
date
 
</pre>
 
</pre>
 +
</div>
 +
</div>
 +
Use jobnvtop  and jobhtop tools from the [[UFRC environment module]] for a real-time look at GPU and CPU processes on the job node.
 +
 +
===CPU Job Example===
 +
If you don't have access to GPUs then set
 +
#SBATCH --ntasks=1
 +
#SBATCH --cpus-per-task=X
 +
 +
in the resources request section and use the corresponding guppy_basecaller arguments to set the number of basecallers and threads to use CPUs:
 +
 +
--cpu_threads_per_caller X --num_callers 1
 +
 +
where X is the same numbers in both sections. You can parallelize callers if appropriate.
  
 
|}}
 
|}}

Latest revision as of 21:07, 14 December 2022

Description

guppy website  

Guppy is a data processing toolkit that contains Oxford Nanopore Technologies' basecalling algorithms and several bioinformatic post-processing features.

Early downstream analysis components such as barcoding/demultiplexing, adapter trimming and alignment are contained within Guppy. Furthermore, Guppy now performs modified basecalling (5mC, 6mA and CpG) from the raw signal data, producing an additional FAST5 file of modified base probabilities.

Environment Modules

Run module spider guppy to find out what environment modules are available for this application.

System Variables

  • HPC_GUPPY_DIR - installation directory

Additional Information

Warning: Guppy can be orders of magnitude faster when using GPUs for basecalling compared to pure CPU runs. See GPU Access for more details on how to request GPUs on HiPerGator. Here's a sample Guppy GPU script you might want to use as a starting point:

GPU Job Example

#!/bin/bash
#SBATCH --job-name=guppy
#SBATCH --output=guppy_%j.out
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --mem=10GB
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=MYEMAIL@ufl.edu
#SBATCH --partition=gpu
#SBATCH --gpus=geforce:1
date;hostname;pwd

module purge
module load cuda guppy

guppy_basecaller \
    --recursive \
    --input_path cool_project/minion_input/fast5 \
    --save_path cool_project/minion_output/basecalls \
    --flowcell FLO-MIN107 --kit SQK-LSK109 \
    --device "auto"

date

Use jobnvtop and jobhtop tools from the UFRC environment module for a real-time look at GPU and CPU processes on the job node.

CPU Job Example

If you don't have access to GPUs, set

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=X

in the resource request section of the job script, and pass the corresponding guppy_basecaller arguments to set the number of callers and the CPU threads per caller:

--cpu_threads_per_caller X --num_callers 1

where X is the same number in both places. Where appropriate, you can run several callers in parallel by raising --num_callers, as long as the job requests enough CPUs to cover --num_callers × --cpu_threads_per_caller threads.
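
One way to keep X identical in both places is to read it from the Slurm allocation instead of typing it twice. The sketch below is illustrative rather than an official UFRC script. It assumes that Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is given, it uses 8 CPUs only as an example value, and guppy_cpu_args is a hypothetical helper name. The paths, flowcell, and kit are copied from the GPU example.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Hypothetical helper: derive X from the Slurm allocation so the thread
# count always matches --cpus-per-task. Falls back to 1 thread when
# SLURM_CPUS_PER_TASK is not set (for example, outside a Slurm job).
guppy_cpu_args() {
    local threads="${SLURM_CPUS_PER_TASK:-1}"
    printf -- '--cpu_threads_per_caller %s --num_callers 1\n' "$threads"
}

# Print the full command for inspection; remove "echo" to actually run it.
echo guppy_basecaller \
    --recursive \
    --input_path cool_project/minion_input/fast5 \
    --save_path cool_project/minion_output/basecalls \
    --flowcell FLO-MIN107 --kit SQK-LSK109 \
    $(guppy_cpu_args)
```

Leaving the echo in place for a first submission makes it easy to confirm in the job output that the thread count matches the requested CPUs before committing to a long CPU-only run.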