Databases
The databases available for BLAST searching (at NCBI) are:
Peptide (protein) Sequence Databases
- nr = All non-redundant GenBank CDS translations+PDB+SwissProt+PIR
- month = All new or revised GenBank CDS translations+PDB+SwissProt+PIR released in the last 30 days.
- swissprot = The SWISS-PROT protein sequence database
- yeast = Yeast (Saccharomyces cerevisiae) protein sequences.
- pdb = Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank
- kabat = Kabat's database of sequences of immunological interest
- alu = Translations of select Alu repeats
Nucleotide Sequence Databases
- nr = All non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no ESTs or STSs)
- month = All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days.
- dbest = Expressed Sequence Tags
- dbsts = Sequence Tagged Sites
- yeast = Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences
- pdb = Nucleotide sequences derived from 3-dimensional protein structures in the Brookhaven Protein Data Bank
- kabat = Kabat's database of sequences of immunological interest
- vector = Vector subset of GenBank
- mito = Database of mitochondrial sequences, Rel. 1.0, July 1995
- alu = Select Alu repeats
- epd = Eukaryotic Promoter Database
- gss = Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
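Several names (nr, month, yeast, pdb, kabat, alu) appear in both the peptide and the nucleotide list, so a database name only makes sense together with the kind of search. The following is a minimal sketch of that lookup, using only the names listed above; the helper names `databases_for` and `is_listed` are hypothetical and not part of BLAST or any NCBI tool.

```python
# Database names copied from the two BLAST lists above.
PROTEIN_DBS = frozenset(
    ["nr", "month", "swissprot", "yeast", "pdb", "kabat", "alu"]
)
NUCLEOTIDE_DBS = frozenset(
    ["nr", "month", "dbest", "dbsts", "yeast", "pdb", "kabat",
     "vector", "mito", "alu", "epd", "gss"]
)


def databases_for(query_kind):
    """Return the database names listed for 'protein' or 'nucleotide'."""
    if query_kind == "protein":
        return PROTEIN_DBS
    if query_kind == "nucleotide":
        return NUCLEOTIDE_DBS
    raise ValueError(f"unknown kind: {query_kind!r}")


def is_listed(db_name, query_kind):
    """True if db_name appears in the list for that kind (hypothetical helper)."""
    return db_name.lower() in databases_for(query_kind)


if __name__ == "__main__":
    # Names shared by both lists refer to different data sets in each.
    print(sorted(PROTEIN_DBS & NUCLEOTIDE_DBS))
    print(is_listed("vector", "protein"))
```

A check like this catches a request for, say, the vector database in a protein search before it is submitted.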
The databases available for FastA searching (at the RCR) are:
Protein
- sp:* = SwissProt - Amos Bairoch's protein sequence database (extremely well organized and annotated)
- gp:* = GenPept - Translations of all GenBank DNA seqs (according to exons in features tables)
- pir:* - Protein Information Resource
- pir1:* - Annotated PIR entries
- pir2:* - New PIR entries
- pir3:* - Unverified PIR entries (your guess is as good as mine what they mean by "unverified")
- pir4:* - Unencoded or untranslated
- nrl_3d:* - Sequences from the 3-dimensional structures in the Brookhaven Protein Data Bank
- Prosite - consensus seqs of conserved protein domains
- TFD - Transcription Factor database
DNA
- VECTOR - vector sequences
- MALARIA - malaria genomic sequences
- gb:* - all of GenBank (includes EMBL, DDBJ, PDB) updated daily
GenBank Subdivisions
- gb_ba:* - Bacterial
- gb_in:* - Invertebrate
- gb_om:* - Other Mammalian (non-rodent, non-primate)
- gb_ov:* - Other Vertebrate (non-mammalian vertebrates)
- gb_or:* - Organelle
- gb_pat:* - Patents
- gb_ph:* - Phage
- gb_pl:* - Plant
- gb_pr:* - Primate
- gb_ro:* - Rodent
- gb_st:* - Structural RNA
- gb_sy:* - Synthetic sequences (recombinant constructs, etc.)
- gb_un:* - Unannotated
- gb_vi:* - Viral
- gb_est*:* - Expressed Sequence Tags (short cDNAs) - now has sections est1 to est9, with more added each quarter.
- gb_sts:* - Sequence Tagged Sites
- gb_gss:* - Genome Survey Sequences (single-pass genomic data, exon-trapped sequences, and Alu PCR sequences)
- gb_htg:* - High Throughput Genomic sequences (single pass sequences churned out by the genome projects, unannotated and filled with errors)
- gb_tag:* - ESTs + STS + GSS + HTG
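The subdivision names above all follow the pattern `gb_<code>:*`. As a rough sketch, the specifications can be built from the division codes in the list. The function names `fasta_spec` and `est_specs` are hypothetical, and the assumption that the EST sections are named `gb_est1` through `gb_est9` is inferred from the "est1 to est9" note rather than confirmed.

```python
# Division codes and descriptions copied from the GenBank subdivision list.
GENBANK_DIVISIONS = {
    "ba": "Bacterial",
    "in": "Invertebrate",
    "om": "Other Mammalian",
    "ov": "Other Vertebrate",
    "or": "Organelle",
    "pat": "Patents",
    "ph": "Phage",
    "pl": "Plant",
    "pr": "Primate",
    "ro": "Rodent",
    "st": "Structural RNA",
    "sy": "Synthetic sequences",
    "un": "Unannotated",
    "vi": "Viral",
    "sts": "Sequence Tagged Sites",
    "gss": "Genome Survey Sequences",
    "htg": "High Throughput Genomic sequences",
    "tag": "ESTs + STS + GSS + HTG",
}


def fasta_spec(code):
    """Build a 'gb_<code>:*' specification for a listed division code."""
    if code not in GENBANK_DIVISIONS:
        raise ValueError(f"unknown GenBank division code: {code!r}")
    return f"gb_{code}:*"


def est_specs(first=1, last=9):
    """List EST section specs, assuming sections are named gb_est1..gb_est9."""
    return [f"gb_est{n}:*" for n in range(first, last + 1)]


if __name__ == "__main__":
    print(fasta_spec("pr"))  # primate sequences only
    print(est_specs())       # one spec per EST section
```

Searching a single subdivision such as `gb_pr:*` instead of `gb:*` keeps the search limited to the sequences of interest.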
Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu