
A. DNA Structure

*** The 3-dimensional structure of DNA can be described in terms of primary, secondary, tertiary, and quaternary structure.

*** The primary structure of DNA is the sequence itself - the order of nucleotides in the deoxyribonucleic acid polymer.

*** The sequence alphabet is restricted to only 4 letters (GATC), but these letters must contain:

  • the code specifying the order of amino acids in proteins

  • the punctuation that controls the beginning and end of protein coding sequences and the splicing of introns

  • the regulatory information that specifies when and how much of each protein to make in each cell at various developmental stages

  • instructions for the transcription of RNA molecules that do not encode protein (tRNA, ribosomal RNA)

  • information that controls the replication of the DNA molecule

  • the structural information for the 3-dimensional shape of the DNA molecule itself.

*** The secondary structure of DNA is relatively straightforward - it is a double helix.

*** The tertiary and quaternary structure is less well understood.

*** The double helix is itself supercoiled (by enzymes such as DNA gyrase), and it is wrapped around histones.

*** In addition, there are a wide variety of proteins that form complexes with DNA in order to replicate it, transcribe it into RNA, and regulate the transcriptional process.

*** Many, if not all, of these proteins bind to the DNA molecule at specific sequences, so primary sequence determines function.




Scanning for Regulatory Sequences

*** A large number of these regulatory sequences (promoters, enhancers, transcription factor binding sites, and other regulatory elements) have been identified and collected into databases.

*** The best of these databases is TransFac, the Transcription Factor Database maintained by the German Gesellschaft für Biotechnologische Forschung mbH (GBF). As of January 2000, TransFac holds 8415 DNA sites with known transcriptional regulatory functions and 2785 protein factors known to bind to these sites.

*** Another database of DNA signal sequences is the Eukaryotic Promoter Database (EPD). EPD provides a nearly comprehensive compilation of eukaryotic transcription signals (promoters). All information is abstracted directly from the scientific literature. EPD has 1363 entries.

*** Several tools make use of these databases to search DNA sequences for occurrences of known signal sequences. The GCG program FINDPATTERNS can be used with the data file TFSITES.DAT (a local copy of the TransFac database).

First, get the TFSITES.DAT data file with the FETCH command:
$ FETCH TFSITES.DAT
Then use the FIND command:
$ FIND/DATA=TFSITES.DAT yourdna.seq
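
The kind of search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GCG implementation: it does not read TFSITES.DAT, the two patterns are commonly quoted consensus sequences (TATA box and GC box) written with IUPAC ambiguity codes, and all function and variable names are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Illustrative patterns only (assumption); a real search would load
# thousands of sites from a database such as TransFac.
SITES = {
    "TATA_box": "TATAWAW",
    "GC_box": "GGGCGG",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def to_regex(pattern):
    """Compile an IUPAC pattern; the lookahead lets overlapping hits be found."""
    body = "".join(IUPAC[base] for base in pattern.upper())
    return re.compile("(?=(" + body + "))")


def scan(seq, sites=SITES):
    """Return sorted (name, strand, 1-based start, matched text) tuples."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = []
    for name, pattern in sites.items():
        regex = to_regex(pattern)
        for m in regex.finditer(seq):
            hits.append((name, "+", m.start() + 1, m.group(1)))
        for m in regex.finditer(rc):
            # Convert the reverse-strand position back to forward coordinates.
            start = len(seq) - (m.start() + len(m.group(1))) + 1
            hits.append((name, "-", start, m.group(1)))
    return sorted(hits, key=lambda h: (h[2], h[0], h[1]))


if __name__ == "__main__":
    for hit in scan("CCGGGCGGTTTATAAATCC"):
        print(hit)
```

Both strands are searched because a binding site can lie on either strand of the double helix. For minus-strand hits, the reported start is the leftmost forward-strand coordinate covered by the match.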
*** The Signal Scan program (for both Mac and PC) can search a copy of the TransFac (or EPD) database that you keep on your own machine - this has the advantage that you can very easily add new sequences to be searched.
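
The advantage of a locally kept site list can be illustrated with a small Python sketch. The file format here (one "name pattern" pair per line, with "#" comment lines) is made up for the example and is not the Signal Scan, TransFac, or EPD format. The site names, the user-added pattern, and the function names are likewise hypothetical, and only exact matches on the given strand are reported.

```python
import io


def load_sites(handle):
    """Read 'name pattern' lines, skipping blank lines and '#' comments."""
    sites = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, pattern = line.split(None, 1)
        sites[name] = pattern.strip().upper()
    return sites


def find_exact(seq, sites):
    """Return {name: [1-based start positions]} for exact matches."""
    seq = seq.upper()
    found = {}
    for name, pattern in sites.items():
        positions = []
        start = seq.find(pattern)
        while start != -1:
            positions.append(start + 1)
            # Resume one base later so overlapping matches are also found.
            start = seq.find(pattern, start + 1)
        if positions:
            found[name] = positions
    return found


# Stand-in for a site file kept on your own machine (made-up format).
LOCAL_FILE = io.StringIO(
    "# my local site list\n"
    "GC_box\tGGGCGG\n"
    "CAAT_box\tCCAAT\n"
)

sites = load_sites(LOCAL_FILE)
sites["my_new_site"] = "GATTACA"  # hypothetical user-added pattern

if __name__ == "__main__":
    print(find_exact("ttGATTACAggccaatGGGCGG", sites))
```

Adding a new sequence to the search is a one-line change to the local table, which is the convenience a personal copy of the database offers.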

*** Similar programs can be used on the Web at several servers.

*** Signal Scan at the NIH BioInformatics & Molecular Analysis Section (BIMAS)

*** Promoter Scan at NIH BIMAS

*** Promoter Scan II at the Univ. of Minnesota & Axys Pharmaceuticals

*** The Computational Biology and Informatics Laboratory at the University of Pennsylvania offers a nice service called TESS (Transcription Element Search Software).

*** MatInspector is another tool that uses the TransFac database. It can be run interactively over the Web or on Mac, DOS/Windows, and UNIX computers.

*** TargetFinder at the Telethon Institute of Genetics and Medicine, Milan, Italy



Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu