Overview of the challenges of Molecular Biology Computing

A. The huge dataset problem
*** Biologists have been very successful in determining the sequences of DNA and protein molecules, through:

  • automated DNA sequencers
  • the Human Genome Project
  • bulk sequencing of cDNAs (ESTs)

*** Information scientists have had a tough time keeping up with the data

*** Yet the information is being collected, organized, and made available:

*** GenBank is the central sequence information database in the United States

*** Data is shared among GenBank, the European Molecular Biology Laboratory (EMBL), and the DNA Data Bank of Japan (DDBJ)

  • All sequence data submitted to any of these databases is automatically integrated into the others.
  • Sequence data is also incorporated from the Genome Sequence Data Base (GSDB) and from patent applications.

*** As of Oct. 1999, GenBank contains over 3.8 billion bases of DNA and protein sequence, which requires about 18 gigabytes of computer disk storage space.

*** The contents of GenBank double every year.

*** How can computers keep up with the exponential growth of data? Disk storage? Processor speed? Search algorithms?

*** The new science of Bioinformatics has been forced to grow up rapidly in order to handle this flood of information.
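
To get a feel for what yearly doubling means, the figures quoted above (3.8 billion bases and about 18 gigabytes in Oct. 1999) can be projected forward. This is only a rough sketch: it assumes the doubling rate stays constant and that disk space per base stays at the 1999 ratio, neither of which is guaranteed. The function names are illustrative and not part of any GenBank tool.

```python
# Figures taken from the notes above (Oct. 1999).
BASES_1999 = 3.8e9          # bases in GenBank
DISK_GB_1999 = 18.0         # gigabytes of disk storage used
DOUBLING_TIME_YEARS = 1.0   # "doubles every year"


def projected_bases(years_after_1999, start=BASES_1999,
                    doubling_time=DOUBLING_TIME_YEARS):
    """Bases expected after some years, assuming steady doubling."""
    return start * 2 ** (years_after_1999 / doubling_time)


def projected_disk_gb(years_after_1999):
    """Disk space in GB, assuming the 1999 GB-per-base ratio stays constant."""
    gb_per_base = DISK_GB_1999 / BASES_1999
    return projected_bases(years_after_1999) * gb_per_base


if __name__ == "__main__":
    for years in (0, 1, 5, 10):
        print(f"{1999 + years}: {projected_bases(years):.3g} bases, "
              f"~{projected_disk_gb(years):,.0f} GB")
```

Ten doublings multiply the database by 1024, so under these assumptions the 18 gigabytes of 1999 would become roughly 18 terabytes a decade later. That is why storage, processor speed, and search algorithms all have to improve together.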


Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu