BLASTDB

From UFRC
Revision as of 23:28, 4 December 2014 by Moskalenko (talk | contribs)

The command-line BLAST tools and the Galaxy framework at UFRC use the same BLAST databases (BLASTDB). The current BLASTDB release is made available to the ncbi_blast tools via the BLASTDB environment variable. The databases currently provided are listed below.

The BLAST databases are updated every three months. To keep BLAST results reproducible over the time frame of an average bioinformatics project, two older releases are retained at a time. To use one of them, set the "$BLASTDB" variable in your job script or select the appropriate database in the BLAST interface in Galaxy.

If you need a custom database or an out-of-cycle NCBI database update and would like to avoid using up your personal filespace quota, please file a Support Request Ticket or contact UFRC Biological Computing Support.
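For scripted workflows, pointing BLAST at an older release amounts to overriding BLASTDB before the tool starts. The following is a minimal Python sketch of that idea, not an official UFRC recipe. The helper name `blast_env` and the release directory name `2014-04` are assumptions made for illustration; check the actual sub-directory names under /scratch/lfs/bio/reference/blast before using it.

```python
import os

# Base path as stated on this page. How release sub-directories are named
# is an assumption; list the directory to see the real names.
BLAST_ROOT = "/scratch/lfs/bio/reference/blast"


def blast_env(release, root=BLAST_ROOT, base_env=None):
    """Return a copy of the environment with BLASTDB set to one release directory.

    `release` is the name of a sub-directory of `root` (hypothetical here).
    `base_env` defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["BLASTDB"] = os.path.join(root, release)
    return env


if __name__ == "__main__":
    env = blast_env("2014-04")  # hypothetical release directory name
    print(env["BLASTDB"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a BLAST command, so the override applies only to that one process and leaves the module default untouched.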

NCBI BLASTDB releases

  • 2014-11 (Full mirror) - default database
  • 2014-04 (Full mirror)
  • 2013-08 (Full mirror)
  • 2013-05 (nr)
  • 2013-04 (Full mirror)
  • 2012-08 (Full mirror)

BLASTDB location

All databases are located in sub-directories of /scratch/lfs/bio/reference/blast. The default database path, /scratch/lfs/bio/reference/blast/db, is a symlink to the latest release directory. The ncbi_blast module automatically sets the "$BLASTDB" variable to this location.
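Because the default path is a symlink, recording only "$BLASTDB" in a log does not show which release a job actually searched. The sketch below, offered under the assumption that the symlink behaves as described above, resolves the link to the real release directory so it can be noted alongside the results. The function name `resolve_blastdb` is hypothetical.

```python
import os

# Default symlink path as stated on this page.
DEFAULT_LINK = "/scratch/lfs/bio/reference/blast/db"


def resolve_blastdb(environ=None, default_link=DEFAULT_LINK):
    """Return the real directory behind $BLASTDB, following any symlinks.

    Falls back to `default_link` when BLASTDB is not set in `environ`.
    `environ` defaults to the current process environment.
    """
    env = os.environ if environ is None else environ
    path = env.get("BLASTDB", default_link)
    return os.path.realpath(path)


if __name__ == "__main__":
    # Prints the release directory the current environment points at.
    print(resolve_blastdb())
```

Writing the resolved path into the job output is one simple way to keep track of which release produced a given set of hits once the default symlink moves to a newer release.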

Provided BLASTDB databases

Custom

  • Alligator.miss.v0.2 - Alligator mississippiensis v. 0.2 build
  • a_baumannii-AB0057 - A. baumannii str. AB0057
  • aplCal3 - A. californica 3.0 WGS assembly, 4331 contigs
  • chlaCavGPIC - Chlamydia psittaci (GPIC)
  • chlaPneumAR39 - Chlamydia pneumoniae
  • chlaTracA - Chlamydia trachomatis serovar A
  • chlaTracD - Chlamydia trachomatis serovar D
  • chlaTracL2 - Chlamydia trachomatis serovar L2
  • chlaTracMurNigg - Chlamydia muridarum
  • DROME_prot - Deep Metazoan Project protein database
  • lsu108 - LSURef - large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya), release 108.
  • lsu111 - LSURef - large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya), release 111 (July 2012).
  • LSUParc_115 [1] - the comprehensive 23S/28S database with all aligned, quality-checked rRNA sequences longer than 300 bases.
  • LSURef_115 [2] - the LSU reference database containing only high-quality, aligned 23S/28S ribosomal RNA sequences with a minimum length of 1900 bases. A fully classified guide tree is included for fast navigation.
  • md5nr - A comprehensive non-redundant protein database
  • m_tuberculosis-CDC1551 - M. tuberculosis str. CDC1551
  • PhumU1_USDA_sc - Pediculus humanus USDA supercontigs
  • p_schaeffi_v0_1_bboyd - Bret Boyd's build of the Pediculus schaeffi genome
  • rfam_10_1 - release 10.1 of the Rfam collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs).
  • rfam_11 - release 11 (August 2012, 2208 families) of the Rfam collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs).
  • rn5 - Rattus norvegicus rn5 release
  • s_enterica-P125109 - S. enterica str. P125109
  • Sorbi1.21 - S. bicolor ver21 - Sorghum bicolor 1 build number 21.
  • ssu108nr - SSURef NR - small (16S/18S, SSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya), release 108.
  • ssu111nr - SSURef NR - small (16S/18S, SSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya), release 111 (July 2012).
  • SSUParc_115 [3] - the SILVA ribosomal RNA database that contains all aligned sequences with an alignment identity value of 50 or above, an alignment quality value of 40 or above, and a basepair score or sequence quality of 30 or above.
  • SSURef_NR99_115 [4] - the recommended reference SILVA ribosomal RNA database. It is based on the Ref 115 dataset with a 99% criterion applied to remove redundant sequences using the UCLUST tool. Sequences from cultivated species have been preserved independent of prior filtering. The final dataset contains 479,726 sequences and can be used as a representative dataset for phylogenetic analysis and classification.
  • vibrChol1 - Vibrio cholerae O1 biovar eltor str. N16961
  • vibrChol_O395_1 - Vibrio cholerae O395
  • vibrVuln_CMCP6_1 - Vibrio vulnificus CMCP6

NCBI

Protein:

  • env_nr
  • nr
  • refseq_protein
  • swissprot
  • pataa
  • pdbaa

Nucleotide:

  • 16SMicrobial
  • env_nt
  • est
  • est_human
  • est_mouse
  • est_others
  • gss
  • htgs
  • human_genomic
  • human_genomic_transcript
  • mouse_genomic_transcript
  • nt
  • other_genomic
  • patnt
  • pdbnt
  • refseq_genomic
  • refseq_rna
  • refseqgene
  • sts
  • tsa_nt
  • vector
  • wgs