Blast Job Scripts

From UFRC
Revision as of 20:27, 3 June 2022 by Israel.herrera


See the Annotated SLURM Job Script page for an explanation of the #SBATCH directives.

Note
Replace all <VARIABLE> sections with your information.

Simple BLASTP Job

Run a blastp job with 4 threads against the nr database.

Download raw source of the blastp.sh file.

#!/bin/bash
#SBATCH --job-name=<JOBNAME>
#SBATCH --mail-user=<EMAIL>
#SBATCH --mail-type=FAIL,END
#SBATCH --output=blastp_%j.log
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8gb
#SBATCH --time=4:00:00
date;hostname;pwd

module load ncbi_blast

# Match -num_threads to the --cpus-per-task value requested above
blastp -query query.fa -db nr -out output.txt -outfmt 6 -evalue 0.001 -num_threads 4

date
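
The thread count given to BLAST should not exceed the CPUs requested with --cpus-per-task. As a minimal sketch, and not part of the original script, the value can be read from the SLURM_CPUS_PER_TASK environment variable, which SLURM sets when --cpus-per-task is specified. The helper name blast_threads and the fallback of 1 thread outside a job are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: report how many threads BLAST should use.
# Inside a job this is the --cpus-per-task value; outside a job
# the variable is unset, so fall back to a single thread.
blast_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Print the command that would be run, so it can be checked in the log.
echo "blastp -query query.fa -db nr -out output.txt -outfmt 6 -evalue 0.001 -num_threads $(blast_threads)"
```

With this approach, changing --cpus-per-task in the header is enough to change the number of BLAST threads.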

BLASTN Job Array

Generate input query files from a single fasta file
  • Create and change into input directory
mkdir input
cd input
  • Split the query file
faSplit sequence ../large.fasta 120 blast_query_

Note that the requested number of chunks (120) differs from the 100-task array used below. faSplit, from the UCSC Genome Browser utilities, does not necessarily split the input query file into exactly the number of chunks specified, so count the files it actually produced and adjust the --array range in the job script so that every file is covered. Some experimentation may be required to find the number of query files that gives the highest throughput; the best value depends on the number and size of the entries in the original fasta query file and on the SLURM allocation of the account used.
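
Before settling on a chunk count, it can help to know how many sequences the original file holds and how many query files the split produced. The following is a rough sketch rather than part of the original workflow; the function names and the FASTA and INPUT_DIR defaults are assumptions chosen to match the file and directory names used above.

```shell
#!/bin/bash
# Count FASTA entries: every record starts with a ">" header line.
count_fasta_entries() {
    grep -c '^>' "$1"
}

# Count the query files, using a plain directory listing.
count_query_files() {
    ls "$1" | wc -l | tr -d ' '
}

# Assumed defaults; run from the directory that contains both.
FASTA="${FASTA:-large.fasta}"
INPUT_DIR="${INPUT_DIR:-input}"

if [ -f "${FASTA}" ] && [ -d "${INPUT_DIR}" ]; then
    echo "Sequences in ${FASTA}: $(count_fasta_entries "${FASTA}")"
    echo "Query files in ${INPUT_DIR}: $(count_query_files "${INPUT_DIR}")"
fi
```

The second number is the one the --array range has to cover.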

Download raw source of the blastp_array.sh file.

#!/bin/bash
#SBATCH --job-name=<JOBNAME>
#SBATCH --mail-user=<EMAIL>
#SBATCH --mail-type=FAIL,END
#SBATCH --output=blastn_%A_%a.log
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8gb
#SBATCH --time=4:00:00
#SBATCH --array=1-100
date;hostname;pwd

module load ncbi_blast
 
export INPUT_DIR="input"
export OUTPUT_DIR="output"
export LOG_DIR="logs"
mkdir -p ${OUTPUT_DIR} ${LOG_DIR}
 
# Array task IDs start at 1 and sed line numbers are 1-based, so use the ID directly
RUN_ID=${SLURM_ARRAY_TASK_ID}
 
QUERY_FILE=$( ls ${INPUT_DIR} | sed -n ${RUN_ID}p )
QUERY_NAME="${QUERY_FILE%.*}"
 
QUERY="${INPUT_DIR}/${QUERY_FILE}"
OUTPUT="${OUTPUT_DIR}/${QUERY_NAME}.out"
 
# Match -num_threads to the --cpus-per-task value requested above
echo -e "Command:\nblastn -query ${QUERY} -db nt -out ${OUTPUT} -evalue 0.001 -outfmt 6 -num_threads 4"
 
blastn -query ${QUERY} -db nt -out ${OUTPUT} -evalue 0.001 -outfmt 6 -num_threads 4
 
date
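
The script selects its query file by taking one line of the directory listing with sed, which counts lines from 1. The selection can be checked outside SLURM before submitting the array. The sketch below is an illustration only: the helper name nth_query is hypothetical, and SLURM_ARRAY_TASK_ID is set by hand (defaulting to 1) to imitate a single array task.

```shell
#!/bin/bash
# Hypothetical helper: return the Nth entry (1-based) of a directory
# listing, using the same ls | sed pipeline as the job script.
nth_query() {
    ls "$1" | sed -n "${2}p"
}

INPUT_DIR="${INPUT_DIR:-input}"
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"

if [ -d "${INPUT_DIR}" ]; then
    QUERY_FILE=$(nth_query "${INPUT_DIR}" "${TASK_ID}")
    if [ -z "${QUERY_FILE}" ]; then
        # An empty result means the index is past the end of the listing.
        echo "No query file for index ${TASK_ID}" >&2
    else
        echo "Index ${TASK_ID}: ${INPUT_DIR}/${QUERY_FILE} -> output/${QUERY_FILE%.*}.out"
    fi
fi
```

Running it with, for example, SLURM_ARRAY_TASK_ID=100 shows whether the last index of the array still maps to an existing query file.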