Genomes and Pathways
The human genome sequence has almost reached a usable form for studies of metabolic pathways and gene regulation. It would be extremely valuable to be able to start with a gene (or group of genes) identified by some experimental method, such as gene expression microarrays or yeast two-hybrid screens, find the metabolic pathways in which it is involved, identify other genes in the same pathways, and find genomic sequences for all of them. Those genomic sequences could then be studied for transcription factor binding sites or other sequence patterns that are involved in gene regulation.
As a brief exercise, we will find the genes in a very simple human metabolic pathway and compare genomic sequences.
- Start at the KEGG (Kyoto Encyclopedia of Genes and Genomes) website:
http://www.genome.jp/kegg/
Follow the link to "KEGG Gene Universe", then the link for "Pathways".
This will open up a page with a list of many different metabolic functions. Drill down under Amino Acid Metabolism to Lysine Biosynthesis.
Use the pulldown menu to select the 3 human genes (they light up in green). Copy the SwissProt ID and the name for each gene.
- Now go to the Human Genome Browser at UCSC:
genome.ucsc.edu
and look up each of the 3 human genes from KEGG.
- Download the genomic DNA plus 1000 base pairs of 5' upstream sequence ("promoter") for each gene. While the Genome Browser makes this very easy, there is no guarantee that these sequences are correct. In particular, look at the different complete mRNAs and various ESTs that have been sequenced for each gene. Look at the ratio of intron to exon sequence. Look at how close together (or even overlapping) some genes are.
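To make the "1000 base pairs upstream" and "ratio of intron to exon" ideas concrete, here is a minimal Python sketch. It is not part of the Genome Browser. The function names, the 0-based half-open coordinate convention, and the example coordinates are all assumptions made for illustration. Note that for a gene on the minus strand, the 5' upstream region lies after the transcript end in chromosome coordinates.

```python
from typing import List, Tuple


def promoter_interval(tx_start: int, tx_end: int, strand: str,
                      upstream: int = 1000) -> Tuple[int, int]:
    """Return the (start, end) interval lying `upstream` bases 5' of a gene.

    Coordinates are assumed to be 0-based and half-open on the chromosome.
    """
    if strand == "+":
        # Upstream of a plus-strand gene is to the left of its start.
        return (max(0, tx_start - upstream), tx_start)
    if strand == "-":
        # Upstream of a minus-strand gene is to the right of its end.
        return (tx_end, tx_end + upstream)
    raise ValueError("strand must be '+' or '-'")


def intron_exon_ratio(exons: List[Tuple[int, int]]) -> float:
    """Return intron bases divided by exon bases for one transcript.

    `exons` is a list of non-overlapping (start, end) half-open intervals.
    """
    exons = sorted(exons)
    exon_bases = sum(end - start for start, end in exons)
    span = exons[-1][1] - exons[0][0]  # first exon start to last exon end
    intron_bases = span - exon_bases
    return intron_bases / exon_bases


# Made-up coordinates for a hypothetical three-exon transcript.
example_exons = [(1000, 1200), (5000, 5300), (9000, 9500)]
print(promoter_interval(1000, 9500, "+"))  # upstream interval, plus strand
print(promoter_interval(1000, 9500, "-"))  # upstream interval, minus strand
print(intron_exon_ratio(example_exons))
```

Running this on the exon coordinates of the different mRNAs for the same gene is a quick way to see how much the answer depends on which transcript you believe.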
- Now look for transcription factors in these genomic sequences.
Use the TRANSFAC database. You can search it for transcription factor binding sites using several different programs.
TESS is the simplest:
Or try a fancier search with a program called Alibaba2.
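To get a feel for what a binding-site search does, the following Python sketch scans a sequence on both strands for a consensus pattern written in IUPAC nucleotide codes. It is a deliberately simple stand-in, not the method used by TESS or Alibaba2, which work from curated site collections and scoring schemes. The function names are hypothetical, and the `TATAWAW` consensus and the test sequence are used only as an illustration.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, consensus: str) -> List[Tuple[int, str, str]]:
    """Find all (possibly overlapping) matches to an IUPAC consensus.

    Returns (forward-strand start, strand, matched text) tuples, sorted by
    position. A palindromic site is reported once on each strand.
    """
    seq = seq.upper()
    body = "".join(IUPAC[base] for base in consensus.upper())
    # The lookahead lets the regex report overlapping matches.
    pattern = re.compile("(?=(" + body + "))")
    k = len(consensus)

    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the reverse-strand position to forward-strand coordinates.
        hits.append((len(seq) - m.start() - k, "-", m.group(1)))
    return sorted(hits)


# Made-up test sequence containing one TATA-like element.
for hit in find_sites("GGGTATAAAAGGG", "TATAWAW"):
    print(hit)
```

An exact consensus match like this finds many sites by chance in any long sequence, which is one reason the hits reported by real search programs also need to be treated with caution.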
- The next step would be to try to come up with your own ideas about bits of DNA that these genes share that have regulatory significance. For starters, you should have more than 3 genes. There are some programs out there that can do this, but be wary of the results. They will give an answer regardless of whether or not it is at all true. Be particularly cautious about assuming that genes that show a common gene expression pattern or are members of the same metabolic pathway are co-regulated by the same transcription factors. There could be several transcription factors that work in concert on different subsets of genes.
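As a rough illustration of both the idea and the warning, the Python sketch below lists the short words (k-mers) present in every sequence of a set, then repeats the count on shuffled copies of the same sequences. This is a naive approach chosen for clarity, not what published motif-finding programs do. The function names, gene labels, and sequences are invented for the example.

```python
import random
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seqs: Dict[str, str], k: int) -> Set[str]:
    """Return the k-mers that occur in every sequence."""
    sets = [kmers(s, k) for s in seqs.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


def shuffled_background(seqs: Dict[str, str], k: int,
                        trials: int = 100, seed: int = 0) -> List[int]:
    """Count shared k-mers after shuffling each sequence, once per trial.

    Shuffling keeps base composition but destroys any real motif, so these
    counts estimate how many shared words appear by chance alone.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        shuffled = {}
        for name, s in seqs.items():
            bases = list(s.upper())
            rng.shuffle(bases)
            shuffled[name] = "".join(bases)
        counts.append(len(shared_kmers(shuffled, k)))
    return counts


# Made-up "promoter" fragments for three hypothetical genes.
promoters = {
    "geneA": "ACGTGACGTT",
    "geneB": "TTGACGTAAC",
    "geneC": "GGTGACGTCC",
}
print(shared_kmers(promoters, 6))
print(shared_kmers(promoters, 2))
print(shuffled_background(promoters, 2, trials=5))
```

With a small k, even shuffled sequences share plenty of words, so a shared word only becomes interesting when it is clearly rarer in the shuffled controls than in the real sequences.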