
Finding Sequences in Databases

*** In the first lecture of the course, we briefly discussed the huge size of the current public DNA and protein sequence databases.

*** Some current (March 2000) statistics:
Total number of bases in GenBank = over 6 billion
Total number of sequences = 5.7 million
Total disk space to store GenBank = over 23 gigabytes
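
To get a rough sense of scale, the figures above can be turned into an average entry length and a storage cost per base. The short Python sketch below does that arithmetic. It treats the "over" figures as exact and assumes 1 gigabyte = 10^9 bytes; both are simplifying assumptions, so the results are approximate.

```python
# Figures quoted above (March 2000), treated as exact for illustration only.
total_bases = 6_000_000_000      # "over 6 billion" bases
total_sequences = 5_700_000      # 5.7 million sequences
total_bytes = 23 * 10**9         # "over 23 gigabytes", assuming 1 GB = 10**9 bytes

avg_length = total_bases / total_sequences   # average bases per sequence
bytes_per_base = total_bytes / total_bases   # storage per base, annotation included

print(f"average sequence length: {avg_length:.0f} bases")
print(f"storage per base: {bytes_per_base:.1f} bytes")
```

Under these assumptions the average entry is on the order of a thousand bases, and each base costs a few bytes of disk space. That is more than the sequence letters alone would need, presumably because each entry also carries annotation text.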


*** For these databases to be useful, the data must be readily accessible to researchers. This lecture covers several ways of finding and retrieving information from sequence databases.

*** FETCH is a GCG tool that is useful for grabbing sequences from the RCR's local databases (GenBank, PIR, etc.) when you know an exact name or accession number.
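
The idea of retrieval by exact identifier can be illustrated with a small Python sketch. This is not how FETCH itself is implemented; it is a toy model in which the database, the entry names, the accession numbers, and the function `fetch_exact` are all hypothetical. It shows why an exact name or accession number is needed: the lookup is a direct key match, with no partial or keyword matching.

```python
from typing import Optional

# Hypothetical miniature "local database": entry name -> (accession, sequence).
# All names, accession numbers, and sequences are invented placeholders.
LOCAL_DB = {
    "DEMOGENE1": ("DEMO0001", "ATGGCCAAGT"),
    "DEMOGENE2": ("DEMO0002", "ATGTTTCCGA"),
}

# Secondary index so an entry can also be found by accession number.
BY_ACCESSION = {acc: name for name, (acc, _seq) in LOCAL_DB.items()}


def fetch_exact(identifier: str) -> Optional[str]:
    """Return the sequence for an exact entry name or accession number, else None."""
    key = identifier.upper()  # case-insensitive matching is an assumption of this sketch
    name = key if key in LOCAL_DB else BY_ACCESSION.get(key)
    if name is None:
        return None  # no partial matching: the identifier must be exact
    return LOCAL_DB[name][1]


print(fetch_exact("DEMO0002"))   # found by accession number
print(fetch_exact("demogene1"))  # found by entry name
print(fetch_exact("DEMOGENE"))   # partial name, so nothing is found
```

When only part of a name or a descriptive term is known, an exact lookup like this returns nothing, which is where keyword-based searching becomes useful.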

*** NETFETCH is a GCG tool that retrieves sequences directly from GenBank using the NCBI's NetENTREZ web server. It can retrieve single or multiple sequences by name or accession number. It can also retrieve the entire set of sequences returned by a NetBLAST search.


*** LOOKUP is a GCG tool that allows fast keyword-based searching of sequence annotations.
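
How LOOKUP is implemented internally is not described here. One common way to make keyword searches fast is to build an index from words to entries ahead of time, so that a query does not have to scan every annotation. The Python sketch below illustrates that general technique on a few invented annotations; the accession numbers, the descriptions, and the functions `build_index` and `search` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical annotations: accession -> free-text description (all invented).
ANNOTATIONS = {
    "DEMO0001": "Human insulin gene, complete cds",
    "DEMO0002": "Mouse insulin receptor mRNA, partial cds",
    "DEMO0003": "Yeast actin gene, complete cds",
}


def build_index(annotations):
    """Map each lowercase word to the set of accessions whose annotation contains it."""
    index = defaultdict(set)
    for accession, text in annotations.items():
        for word in text.lower().replace(",", " ").split():
            index[word].add(accession)
    return index


def search(index, *keywords):
    """Return the accessions whose annotations contain ALL of the given keywords."""
    hits = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*hits)) if hits else []


INDEX = build_index(ANNOTATIONS)
print(search(INDEX, "insulin"))              # entries mentioning insulin
print(search(INDEX, "insulin", "complete"))  # entries matching both keywords
```

Building the index takes one pass over the annotations; after that, each query only needs a few set lookups and an intersection, regardless of how many entries the database holds.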

*** ENTREZ is a web-based search engine for all GenBank DNA sequences, the associated protein sequences, and MEDLINE references. [Guest lecture by James W. Beattie and Dorice L. Vieira from the Ehrman Library]

*** Other tools for the search and retrieval of DNA/protein sequences



Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu