ENTREZ
In many situations, the best way to find sequences is to use the Web with a tool called ENTREZ.
ENTREZ is a service of the NCBI (National Center for Biotechnology Information), which is part of the US National Library of Medicine.
The ENTREZ database contains all of the nucleotide and protein sequences in GenBank (updated daily!) along with a sequence-associated subset of MEDLINE. But ENTREZ is much more than
a database: it is both a powerful search engine and a pre-computed list of
relationships between all data elements.
In practice, this means that you can search for a text term in sequence annotations or in MEDLINE abstracts, and find all articles, DNA, and protein sequences that mention that term.
Then from any article or sequence, you can move to "related articles" or "related sequences".
Relationships between sequences are computed with BLAST.
Relationships between articles are computed with MeSH terms (Medical Subject Headings, i.e. shared keywords).
Relationships between DNA and protein sequences rely on accession numbers.
Relationships between sequences and MEDLINE articles rely on both shared
keywords and the mention of accession numbers in the articles.
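The relationship types listed above can be pictured as typed links between records. The following is only a minimal sketch of that idea, not a description of how ENTREZ actually stores its data: the record identifiers, the link names, and the helper functions (`build_index`, `neighbors`, `within_hops`) are all hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical records and links, invented for illustration only.
# Each link is (source record, relationship type, target record).
LINKS = [
    ("dna:X0001", "related_sequence", "dna:X0002"),         # e.g. sequence similarity
    ("dna:X0001", "encodes", "protein:P0001"),              # e.g. tied by accession number
    ("dna:X0001", "cited_in", "article:A0001"),             # e.g. accession named in article
    ("article:A0001", "related_article", "article:A0002"),  # e.g. shared keywords
]


def build_index(links):
    """Pre-compute a lookup table: record -> list of (relationship, target)."""
    index = defaultdict(list)
    for source, kind, target in links:
        index[source].append((kind, target))
    return index


def neighbors(index, record, kind=None):
    """Return the records linked from `record`, optionally of one relationship type."""
    return [t for k, t in index.get(record, []) if kind is None or k == kind]


def within_hops(index, start, max_hops):
    """Return every record reachable from `start` in at most `max_hops` links."""
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for record in frontier:
            for target in neighbors(index, record):
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    seen.discard(start)
    return seen


if __name__ == "__main__":
    index = build_index(LINKS)
    print(neighbors(index, "dna:X0001", kind="cited_in"))
    print(sorted(within_hops(index, "dna:X0001", 2)))
```

Because the links are computed ahead of time, following one is a simple table lookup rather than a fresh search. In this sketch the links are one-directional; a real system would presumably store them in both directions.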
These pre-computed relationships might include genes in the same
multi-gene family, articles written about genes that have the same function,
or other proteins that function in the same biochemical pathway.
This potential for "horizontal movement" through the database makes ENTREZ
really exciting. It allows you to start with only a vague set of keywords
or a sequence identified in the laboratory and rapidly access a set of relevant
literature and a list of related database sequences.
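For readers who would rather script this kind of keyword search than click through a browser, the sketch below builds query URLs for NCBI's E-utilities, a present-day programmatic interface to ENTREZ that is not covered in the text above. It only constructs the URLs and does not contact the server. The helper names (`esearch_url`, `elink_url`), the search term, and the record ID `12345` are hypothetical placeholders chosen for illustration.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities (an assumption of this sketch,
# not something described in the original text).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """Build a URL that searches one ENTREZ database for a text term."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}esearch.fcgi?{query}"


def elink_url(dbfrom, db, uid):
    """Build a URL that asks for records in `db` linked to one record in `dbfrom`."""
    query = urlencode({"dbfrom": dbfrom, "db": db, "id": uid})
    return f"{EUTILS_BASE}elink.fcgi?{query}"


if __name__ == "__main__":
    # Step 1: start from a vague set of keywords.
    print(esearch_url("nucleotide", "heat shock protein"))
    # Step 2: move "horizontally" from a sequence record to linked literature.
    # The ID 12345 is a placeholder, not a real record of interest.
    print(elink_url("nucleotide", "pubmed", "12345"))
```

The two functions mirror the two steps described above: a text search to find a starting record, then a link query to reach related articles or sequences.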
There is also a stand-alone client application called NENTREZ that can be used
without a WWW browser (but it still requires internet access). This NENTREZ
program can also be used in conjunction with Netscape (as a plug-in) to create
a cool 3-D sequence structure browser.
Sequences identified with an ENTREZ search must be copied into a text file on your desktop computer, and then transferred to your RCR account for further work with GCG.
The entire ENTREZ database was distributed on CD-ROM from 1992 until August
1996, but this was discontinued due to the huge size of the database (which
filled six CDs by 1996) and the impossible task of keeping the CDs current with
the rapidly growing database.
Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu