Blast Job Scripts
Revision by Moskalenko
Latest revision as of 20:27, 3 June 2022
See the Annotated SLURM Job Script page for an explanation of the #SBATCH
directives.
Note: Replace every <VARIABLE> placeholder with your own information.
Simple BLASTP Job
Run a blastp job with 4 threads against the nr database.
Download the raw source of the blastp.sh file.
#!/bin/bash
#SBATCH --job-name=<JOBNAME>
#SBATCH --mail-user=<EMAIL>
#SBATCH --mail-type=FAIL,END
#SBATCH --output=blastp_%j.log
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8gb
#SBATCH --time=4:00:00
date;hostname;pwd
module load ncbi_blast
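# Match the BLAST thread count to the CPUs requested with --cpus-per-task
blastp -query query.fa -db nr -out output.txt -outfmt 6 -evalue 0.001 -num_threads ${SLURM_CPUS_PER_TASK}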
date
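The job above writes tabular results (-outfmt 6) to output.txt. As a minimal sketch of a follow-up step, the function below keeps only the highest-scoring hit for each query. It assumes the default 12-column tabular layout, in which column 1 is the query ID and column 12 is the bit score; the name best_hits is hypothetical and not part of BLAST.

```shell
#!/bin/bash
# Sketch: keep the highest-bitscore hit per query from a BLAST tabular file.
# Assumes the default 12-column -outfmt 6 layout:
#   column 1 = query ID, column 12 = bit score.
best_hits() {
    local blast_tab="$1"
    local tab
    tab="$(printf '\t')"
    # Sort by query ID, then by bit score in descending order,
    # and keep only the first row seen for each query ID.
    sort -t "$tab" -k1,1 -k12,12gr "$blast_tab" | awk -F '\t' '!seen[$1]++'
}
```

For example, `best_hits output.txt > best_hits.txt` would write one line per query that had at least one hit.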
BLASTN Job Array
- Generate input query files from a single fasta file
- Create and change into input directory
mkdir input
cd input
- Split the query file
faSplit sequence ../large.fasta 120 blast_query_
Note that 120 is larger than the 100-task array in the script below. faSplit, from the UCSC Genome Browser utilities, does not necessarily split the input query file into exactly the number of chunks requested, so check how many files were actually produced and adjust the array range if needed. Some experimentation may be required to find a number of small query files that gives the highest throughput for the BLAST alignment project; the best value depends on the number and size of the entries in the original FASTA query file and on the SLURM allocation of the account used. Return to the parent directory before submitting the job, because the script reads the query files from input/ relative to the submission directory.
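Because the number of chunks is not known in advance, the array range can be derived from the files actually present. The following is a minimal sketch, assuming the query chunks are the only regular files in the input directory; the function name array_range is hypothetical.

```shell
#!/bin/bash
# Sketch: print a SLURM --array range (1-N) covering every regular file
# in the given directory (default: "input").
array_range() {
    local dir="${1:-input}"
    local count
    # Count regular files only; subdirectories are ignored.
    count=$(find "$dir" -maxdepth 1 -type f | wc -l)
    count=$((count))   # normalize away any whitespace padding from wc
    if [ "$count" -eq 0 ]; then
        echo "no query files found in $dir" >&2
        return 1
    fi
    echo "1-${count}"
}
```

From the directory that contains input/, the range could then be passed at submission time, for example `sbatch --array="$(array_range input)" blastp_array.sh`; an --array option given on the sbatch command line takes precedence over the #SBATCH --array line in the script.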
Download the raw source of the blastp_array.sh file.
#!/bin/bash
#SBATCH --job-name=<JOBNAME>
#SBATCH --mail-user=<EMAIL>
#SBATCH --mail-type=FAIL,END
#SBATCH --output=blastp_%j.log
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8gb
#SBATCH --time=4:00:00
#SBATCH --array=1-100
date;hostname;pwd
module load ncbi_blast
export INPUT_DIR="input"
export OUTPUT_DIR="output"
export LOG_DIR="logs"
mkdir -p ${OUTPUT_DIR} ${LOG_DIR}
# sed line numbers are 1-based, so array task N selects the Nth query file
RUN_ID=${SLURM_ARRAY_TASK_ID}
QUERY_FILE=$( ls ${INPUT_DIR} | sed -n ${RUN_ID}p )
QUERY_NAME="${QUERY_FILE%.*}"
QUERY="${INPUT_DIR}/${QUERY_FILE}"
OUTPUT="${OUTPUT_DIR}/${QUERY_NAME}.out"
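# Use the CPU count requested with --cpus-per-task as the BLAST thread count
echo -e "Command:\nblastn -query ${QUERY} -db nt -out ${OUTPUT} -evalue 0.001 -outfmt 6 -num_threads ${SLURM_CPUS_PER_TASK}"
blastn -query ${QUERY} -db nt -out ${OUTPUT} -evalue 0.001 -outfmt 6 -num_threads ${SLURM_CPUS_PER_TASK}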
date
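Once the array has finished, it can be worth confirming that every query chunk produced an output file. The following is a minimal sketch, assuming the input/ and output/ layout and the .out suffix used by the script above; the function name missing_outputs is hypothetical. It tests for existence rather than size, because a tabular (-outfmt 6) result with no hits is an empty file.

```shell
#!/bin/bash
# Sketch: list query files that have no matching <name>.out in the output directory.
missing_outputs() {
    local input_dir="${1:-input}"
    local output_dir="${2:-output}"
    local query_path query_file query_name
    for query_path in "$input_dir"/*; do
        # Skip anything that is not a regular file (also covers an empty directory).
        [ -f "$query_path" ] || continue
        query_file=$(basename "$query_path")
        query_name="${query_file%.*}"   # same extension stripping as the job script
        [ -e "${output_dir}/${query_name}.out" ] || echo "$query_file"
    done
}
```

Running `missing_outputs input output` from the submission directory prints one query file name per missing result and prints nothing when all outputs exist.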