Latest revision as of 18:52, 10 April 2024

Both the command line BLAST and the Galaxy Framework at UFRC use the same BLAST databases. The BLAST databases are updated every three months. To ensure reproducibility of BLAST results within the time frame of an average bioinformatics project, we keep three releases of the BLASTDB (BLAST databases) at a time: the current release and the two previous ones. The current release is made available to the ncbi_blast tools via the BLASTDB environment variable. An older release can be used by setting the "$BLASTDB" variable in the job script or by selecting the appropriate database in the BLAST tool interface in Galaxy. Currently provided databases are listed below. If you need a custom database or an out-of-cycle NCBI database update and would like to avoid using up your personal filespace quota, please file a Support Request Ticket or contact UFRC Biological Computing Support.

NCBI BLASTDB releases

All databases are full mirrors of NCBI data.

202404 - default
202304
202112

BLASTDB location

All databases are located in sub-directories of /data/reference/blast. The default database directory is a symlink to the latest release; the older releases remain available in their own sub-directories. The ncbi_blast module sets the "$BLASTDB" variable to the default location automatically. To use an older release instead, override the variable by running 'export BLASTDB=/some/path' on the command line or in the job script, e.g. 'export BLASTDB=/data/reference/blast/202304'. Afterwards, continue to refer to databases by name without the full path, for example 'blastx -db nr ...'.
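The override can be wrapped in two small shell helpers for job scripts: one switches BLASTDB to a named release and refuses releases that do not exist, and the other prints the directory BLASTDB actually resolves to, so the release used can be logged alongside the results. This is a sketch under stated assumptions, not a provided tool: the function names and the BLASTDB_ROOT variable are hypothetical, and it assumes each release sits in a sub-directory of /data/reference/blast named after the release (e.g. 202304).

```bash
#!/usr/bin/env bash
# Sketch: choose and record a BLASTDB release in a job script.
# ASSUMPTION: each release is a sub-directory of BLASTDB_ROOT named after
# the release (e.g. 202404, 202304, 202112). BLASTDB_ROOT and the function
# names below are hypothetical, not part of the ncbi_blast module.

BLASTDB_ROOT="${BLASTDB_ROOT:-/data/reference/blast}"

# Export BLASTDB pointing at the requested release; fail if it is missing.
use_blastdb_release() {
    local release="${1:-}"
    local dir="${BLASTDB_ROOT}/${release}"
    if [[ -z "$release" || ! -d "$dir" ]]; then
        echo "use_blastdb_release: no release '${release}' under ${BLASTDB_ROOT}" >&2
        return 1
    fi
    export BLASTDB="$dir"
}

# Print the real directory behind BLASTDB, following symlinks such as the
# default-release link, so the release actually used can be logged.
show_blastdb_path() {
    if [[ -z "${BLASTDB:-}" ]]; then
        echo "show_blastdb_path: BLASTDB is not set" >&2
        return 1
    fi
    readlink -f "$BLASTDB"
}

# Hypothetical usage, after 'module load ncbi_blast' has set the default:
#   use_blastdb_release 202304 || exit 1
#   show_blastdb_path >> blast_run.log
#   blastx -db nr ...
```

Logging the resolved path is useful because the default directory is a symlink: a job that relies on the default will pick up a newer release after the next update, and the log records which one was actually searched.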

Provided BLASTDB databases

Default BLASTDB


16S_ribosomal_RNA
18S_fungal_sequences
28S_fungal_sequences
Betacoronavirus
ITS_RefSeq_Fungi
ITS_eukaryote_sequences
LSU_eukaryote_rRNA
LSU_prokaryote_rRNA
SSU_eukaryote_rRNA
cdd_delta
env_nr
env_nt
human_genome
landmark
mito
mouse_genome
nr
nt
pataa
patnt
pdbaa
pdbnt
ref_euk_rep_genomes
ref_prok_rep_genomes
ref_viroids_rep_genomes
ref_viruses_rep_genomes
refseq_protein
refseq_rna
refseq_select_prot
refseq_select_rna
swissprot
taxdb
tsa_nr
tsa_nt
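Before submitting a long job, it can help to confirm that a database name from the list above resolves in the current $BLASTDB. The sketch below is an assumption-laden helper, not a provided tool: the function name blastdb_exists is hypothetical, it treats BLASTDB as a single directory, and it relies on the standard BLAST+ file naming, where a single-volume protein or nucleotide database has a .pin or .nin index file and a multi-volume database such as nr or nt has a .pal or .nal alias file. taxdb holds taxonomy names rather than sequences and will not pass this check.

```bash
#!/usr/bin/env bash
# Sketch: check that a database name resolves inside $BLASTDB.
# ASSUMPTION: BLASTDB is a single directory and databases follow standard
# BLAST+ naming (.pal/.nal alias files, .pin/.nin index files).
# Returns 0 if found, 1 if not found, 2 on usage errors.
blastdb_exists() {
    local name="${1:-}" ext
    if [[ -z "${BLASTDB:-}" || -z "$name" ]]; then
        echo "blastdb_exists: BLASTDB or database name is not set" >&2
        return 2
    fi
    # Alias files cover multi-volume databases; index files cover single-volume ones.
    for ext in pal nal pin nin; do
        if [[ -e "${BLASTDB}/${name}.${ext}" ]]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical usage in a job script:
#   blastdb_exists swissprot || { echo "swissprot not found in $BLASTDB" >&2; exit 1; }
```

For a fuller report on a single database, the BLAST+ blastdbcmd utility can print database information with 'blastdbcmd -db <name> -info'.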

Custom


Alligator.miss.v0.2 - Alligator mississippiensis v. 0.2 build
Aliivibrio fischeri (ASM1180v1)
Arabidopsis (TAIR10)
a_baumannii-AB0057 - A. baumannii str. AB0057
aplCal3 - A. californica 3.0 WGS assembly, 4331 contigs
Camelus dromedarius - JDVD01000001.1
chlaCavGPIC - Chlamydia psittaci (GPIC)
chlaPneumAR39 - Chlamydia pneumoniae
chlaTracA - Chlamydia trachomatis serovar A
chlaTracD - Chlamydia trachomatis serovar D
chlaTracL2 - Chlamydia trachomatis serovar L2
chlaTracMurNigg - Chlamydia muridarum
Danio rerio (Zv9)
Danio rerio (GRCz10)
DROME_prot - Deep Metazoan Project protein database
Eucalyptus grandis - Eucalyptus grandis v2.0
GRCh38.p11 - Human Genome assembly GRCh38.p11
Klebsiella pneumoniae (HS11286)
Klebsiella pneumoniae (MGH78578)
Klebsiella pneumoniae (CAV1596)
lsu108 - LSURef - large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya), release 108.
lsu111 - LSURef - large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya), release 111 (July 2012).
LSUParc_115 - SILVA - the comprehensive 23S/28S database with all aligned, quality-checked rRNA sequences longer than 300 bases.
LSURef_115 - SILVA - the LSU reference database containing only high-quality, aligned 23S/28S ribosomal RNA sequences with a minimum length of 1900 bases. A fully classified guide tree is included for fast navigation.
md5nr - A comprehensive non-redundant protein database
m_tuberculosis-CDC1551 - M. tuberculosis str. CDC1551
Mycobacterium tuberculosis W-148
Oryx leucoryx - oryxL1s1
PhumU1_USDA_sc - Pediculus humanus USDA supercontigs
Populus tremula x alba 717-1B4 v1.1
p_schaeffi_v0_1_bboyd - Bret Boyd's build of the Pediculus schaeffi genome
rfam_10_1 - release 10.1 of the Rfam collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs).
rfam_11 - release 11 (August 2012, 2208 families) of the Rfam collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs).
rn5 - Rattus norvegicus rn5 release
Salmonella enterica subsp. enterica serovar Javiana str. CFSAN001992
s_enterica-P125109 - S. enterica str. P125109
Silba_123_SSURef - SILVA release 123 SSURef
Sorbi1.21 - S. bicolor ver21 - Sorghum bicolor 1 build number 21.
ssu108nr - SSURef NR - small (16S/18S, SSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya), release 108.
ssu111nr - SSURef NR - small (16S/18S, SSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya), release 111 (July 2012).
SSUParc_115 - SILVA - the SILVA ribosomal RNA database containing all aligned sequences with an alignment identity of at least 50, an alignment quality of at least 40, and a base-pair score or sequence quality of at least 30.
SSURef_NR99_115 - SILVA - the recommended SILVA reference ribosomal RNA database. It is based on the Ref 115 dataset, with redundant sequences removed at a 99% identity criterion using the UCLUST tool. Sequences from cultivated species were preserved regardless of this filtering. The final dataset contains 479,726 sequences and can be used as a representative dataset for phylogenetic analysis and classification.
vibrChol1 - Vibrio cholerae O1 biovar El Tor str. N16961
vibrChol_O395_1 - Vibrio cholerae O395
vibrVuln_CMCP6_1 - Vibrio vulnificus CMCP6

