Overview of the challenges of Molecular Biology Computing
A. The huge dataset problem
Biologists have been very successful in finding the sequences of DNA and protein molecules:
- automated DNA sequencers
- the Human Genome Project
- bulk sequencing of cDNAs (ESTs)
Information scientists have had a tough time keeping up with the data
Yet the information is being collected, organized, and made available:
GenBank is the central sequence information database in the United States
Data is shared among GenBank, the European Molecular Biology Laboratory (EMBL), and the DNA Data Bank of Japan (DDBJ)
- All sequence data submitted to any of these databases is automatically integrated into the others.
- Sequence data is also incorporated from the Genome Sequence Data Base (GSDB) and from patent applications.
As of Oct. 1999, GenBank contains over 3.8 billion bases of DNA and protein sequence, which requires about 18 gigabytes of computer disk storage space.
The contents of GenBank double every year.
How can computers keep up with the exponential growth of data? Disk storage? Processor speed? Search algorithms?
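To get a feel for what yearly doubling means, the short Python sketch below projects the Oct. 1999 figures forward. It is only an illustration: it assumes the doubling time stays fixed at one year and that disk usage scales linearly with the number of bases, neither of which is guaranteed. The function name `project_genbank` and its parameters are hypothetical, chosen for this example.

```python
def project_genbank(years, bases_1999=3.8e9, gigabytes_1999=18.0,
                    doubling_time_years=1.0):
    """Project GenBank size `years` after Oct. 1999.

    Assumptions (not established facts): a constant doubling time and
    storage that grows in direct proportion to the number of bases.
    """
    growth_factor = 2 ** (years / doubling_time_years)
    return bases_1999 * growth_factor, gigabytes_1999 * growth_factor


if __name__ == "__main__":
    # Print a projection every two years for one decade.
    for years in range(0, 11, 2):
        bases, gigabytes = project_genbank(years)
        print(f"Oct. {1999 + years}: {bases / 1e9:10.1f} billion bases, "
              f"{gigabytes:10.1f} GB")
```

Under these assumptions, ten doublings multiply the database by 2^10 = 1024, so the 18 gigabytes of 1999 would become roughly 18 terabytes a decade later. That is why disk storage, processor speed, and search algorithms all have to improve together.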
The new science of Bioinformatics has been forced to grow up rapidly in order to handle this flood of information.
Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center Comments to: browns02@mcrcr.med.nyu.edu