Blast Job Scripts

From UFRC

Latest revision as of 20:27, 3 June 2022

Back to the BLAST page

See the Annotated SLURM Job Script page for an explanation of the #SBATCH directives.

Note
Replace all <VARIABLE> sections with your information.

Simple BLASTP Job

Run a blastp job with 4 threads against the nr database.

Download raw source of the blastp.sh file.

#!/bin/bash
#SBATCH --job-name=<JOBNAME>
#SBATCH --mail-user=<EMAIL>
#SBATCH --mail-type=FAIL,END
#SBATCH --output=blastp_%j.log
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8gb
#SBATCH --time=4:00:00
date;hostname;pwd

module load ncbi_blast

blastp -query query.fa -db nr -out output.txt -outfmt 6 -evalue 0.001 -num_threads 4

date

BLASTN Job Array

Generate input query files from a single fasta file
  • Create and change into input directory
mkdir input
cd input
  • Split the query file
faSplit sequence ../large.fasta 120 blast_query_

Note that the number of chunks requested (120) is larger than the 100-task array used below. This is because faSplit, from the UCSC Genome Browser utilities, does not necessarily split the input query file into exactly the number of chunks specified. Some experimentation may be required to find a number of small query files that gives the highest throughput for the BLAST alignment project. The best value depends on the number and size of the entries in the original FASTA query file and on the SLURM allocation of the account used.
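Because the number of files written by faSplit can differ from the number requested, it may help to count the files actually created before choosing the --array range. The following is a minimal sketch, assuming the input directory contains only the query files; the function name count_query_files is hypothetical and not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper: count the regular files directly inside a directory.
# Assumes the directory holds only the query files written by faSplit.
count_query_files() {
    local dir="$1"
    find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Example: print a matching array directive for the "input" directory, if present.
if [ -d input ]; then
    n=$(count_query_files input)
    echo "#SBATCH --array=1-${n}"
fi
```

The printed line can be compared with the --array directive in the job script, so that the array size and the number of query files agree.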

Download raw source of the blastp_array.sh file.

#!/bin/bash
#SBATCH --job-name=<JOBNAME>
#SBATCH --mail-user=<EMAIL>
#SBATCH --mail-type=FAIL,END
#SBATCH --output=blastp_%j.log
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8gb
#SBATCH --time=4:00:00
#SBATCH --array=1-100
date;hostname;pwd

module load ncbi_blast
 
export INPUT_DIR="input"
export OUTPUT_DIR="output"
export LOG_DIR="logs"
mkdir -p ${OUTPUT_DIR} ${LOG_DIR}
 
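# sed line numbers are 1-based and the array runs from 1 to 100,
# so the task ID is used directly; adding 1 would skip the first query file.
RUN_ID=${SLURM_ARRAY_TASK_ID}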
 
QUERY_FILE=$( ls ${INPUT_DIR} | sed -n ${RUN_ID}p )
QUERY_NAME="${QUERY_FILE%.*}"
 
QUERY="${INPUT_DIR}/${QUERY_FILE}"
OUTPUT="${OUTPUT_DIR}/${QUERY_NAME}.out"
 
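# Match the BLAST thread count to the CPUs requested with --cpus-per-task
echo -e "Command:\nblastn -query ${QUERY} -db nt -out ${OUTPUT} -evalue 0.001 -outfmt 6 -num_threads ${SLURM_CPUS_PER_TASK}"
 
blastn -query ${QUERY} -db nt -out ${OUTPUT} -evalue 0.001 -outfmt 6 -num_threads ${SLURM_CPUS_PER_TASK}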
 
date
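The script above picks its query file by position in the sorted ls listing of the input directory. The following is a minimal sketch of that lookup as a standalone function, which can be run outside SLURM to check which file a given position maps to; the name pick_query_file is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper mirroring the "ls | sed -n Np" lookup in the job script.
# Prints the name of the file at the given 1-based position in ls sort order,
# or nothing if the position is past the end of the listing.
pick_query_file() {
    local dir="$1" index="$2"
    ls "$dir" | sed -n "${index}p"
}
```

For example, pick_query_file input 1 prints the first file name in the listing. An empty result means the position is past the last query file, which is a quick way to spot an array range larger than the number of inputs.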