Genome Sequence Exercise

An easy one to get you started

Nature Genetics has published an excellent supplement issue on genome sequence analysis called "A user's guide to the human genome."

Do Exercise #7

The rest of this issue is also worth a look. In your own time, check it out.
User's Guide to the Human Genome

Now something more challenging to make you think

A member of our School faculty is trying to design a DNA chip with mouse promoter sequences on it. He has chosen about 2,000 genes that are relevant to developmental processes. For many of these genes it is far from trivial to determine the "true" transcription start site, or even to make a reasonable best guess at it.

Here are GenBank accession numbers for 3 genes that have proven to be difficult.

AK017771
XM_130232
NM_008314

Can you find these genes on the UCSC Genome Browser (for Mouse)? It may require getting the DNA sequence from GenBank and using the BLAT alignment tool.
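BLAT can report its hits in PSL format, where tStart and tEnd are 0-based, half-open coordinates on the genomic plus strand regardless of which strand the query aligned to. The minimal sketch below assumes nucleotide BLAT output (a single-character strand column) and reports the genomic position of each query's 5'-most aligned base as a candidate start site. It does not correct for unaligned 5' bases (qStart > 0), which you should check by eye in the browser. The names `FivePrimeEnd` and `five_prime_ends` are hypothetical.

```python
# Sketch: find the 5' end of each BLAT alignment from PSL lines.
# Assumes 21-column PSL from nucleotide BLAT (strand is '+' or '-').
from typing import Iterable, Iterator, NamedTuple


class FivePrimeEnd(NamedTuple):
    query: str     # PSL column 10 (qName), e.g. a GenBank accession
    chrom: str     # PSL column 14 (tName)
    strand: str    # PSL column 9
    position: int  # 0-based genomic coordinate of the transcript's 5'-most aligned base
    matches: int   # PSL column 1, useful for ranking multiple hits


def five_prime_ends(psl_lines: Iterable[str]) -> Iterator[FivePrimeEnd]:
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the psLayout header and anything that is not a PSL record.
        if len(fields) != 21 or not fields[0].isdigit():
            continue
        strand = fields[8]
        t_start, t_end = int(fields[15]), int(fields[16])
        # On '+' the 5' end is the left edge; on '-' it is the right edge
        # (t_end is exclusive, so the last aligned base is t_end - 1).
        position = t_start if strand == "+" else t_end - 1
        yield FivePrimeEnd(fields[9], fields[13], strand, position, int(fields[0]))
```

With several hits per accession, sorting by `matches` is one simple way to pick the primary locus before inspecting it.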

We have also been using the RIKEN database of full-length mouse cDNA clones. If there is a longer RIKEN sequence, then where is the true promoter?
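One simple rule of thumb is to take the most 5' of the candidate transcript ends at a locus. The small helper below applies that rule. "Upstream" means a lower coordinate on the plus strand and a higher one on the minus strand. It is only a heuristic: a longer clone may instead come from an alternative promoter, so it is not necessarily the start site used in your tissue of interest. The function name and the example transcript labels are hypothetical.

```python
# Heuristic sketch: pick the most upstream 5' end among aligned transcripts.
from typing import Dict, Tuple


def most_upstream_tss(ends: Dict[str, int], strand: str) -> Tuple[str, int]:
    """Return (transcript name, 0-based coordinate) of the 5'-most end.

    `ends` maps a transcript label (e.g. a RefSeq or RIKEN accession) to the
    genomic coordinate of its 5' end; all must be on the same strand.
    """
    if not ends:
        raise ValueError("need at least one transcript 5' end")
    if strand == "+":
        name = min(ends, key=ends.get)  # leftmost is upstream on '+'
    elif strand == "-":
        name = max(ends, key=ends.get)  # rightmost is upstream on '-'
    else:
        raise ValueError(f"unknown strand {strand!r}")
    return name, ends[name]
```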

Choose a "gene model" that makes the most sense to you for each of these genes and extract 500 bases of genomic sequence directly upstream of your chosen transcription start site (the start of the first exon).
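The only subtle part of this extraction is strand. For a minus-strand gene, "upstream" lies at higher genomic coordinates, and the result has to be reverse-complemented so that it reads 5' to 3' on the gene's own strand. The sketch below assumes that you have the chromosome sequence as a Python string and a 0-based TSS coordinate. Note that the UCSC browser displays 1-based positions, so subtract 1 when copying a position from the browser. The function names are hypothetical.

```python
# Sketch: strand-aware extraction of the region just upstream of a TSS.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, tss: int, strand: str, length: int = 500) -> str:
    """Return up to `length` bases immediately upstream of the 0-based `tss`,
    written 5'->3' on the gene's strand. The TSS base itself is excluded, and
    the region is clipped at the chromosome ends."""
    if strand == "+":
        start = max(0, tss - length)
        return chrom_seq[start:tss]
    if strand == "-":
        # Upstream of a minus-strand gene is to the right on the plus strand.
        end = min(len(chrom_seq), tss + 1 + length)
        return reverse_complement(chrom_seq[tss + 1:end])
    raise ValueError(f"unknown strand {strand!r}")
```

Using the same function for the mouse and human orthologs keeps both promoter regions in the same orientation, which matters for the alignment step.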

Once you have chosen a promoter region for each of these three mouse genes, go to the human genome, find the orthologs of these genes, and again extract your best guess at the promoter region (500 bp upstream of the first exon).

Now for each pair of orthologs, try to align the promoter regions. Can they be aligned?
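For two ~500 bp regions, a textbook global alignment (Needleman-Wunsch with a linear gap penalty) is one way to ask whether the promoters align end to end. The sketch below is a swapped-in illustration, not a tool named in this exercise. The scores (match +1, mismatch -1, gap -2) are arbitrary assumptions.

```python
# Sketch: Needleman-Wunsch global alignment with linear gap cost.
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[int, str, str]:
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to rebuild one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x: str, y: str) -> float:
    """Identical, non-gap columns as a percentage of alignment length."""
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x) if x else 0.0
```

Any two sequences produce some alignment. To judge whether the score means anything, compare it with scores obtained after shuffling one of the sequences.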

Search for transcription factor binding sites (TESS).
Search TESS
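Tools such as TESS search a promoter against curated collections of known binding sites. A common underlying idea is the position weight matrix. Per-position base counts become log-odds scores, and every window on both strands is scored. The sketch below shows only that general idea. It is not TESS's own scoring, and the matrix and threshold in any example are made up.

```python
# Sketch: log-odds position weight matrix scan on both strands.
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts: List[Dict[str, int]], pseudocount: float = 1.0,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2(observed / background)."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def scan(seq: str, matrix: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (forward-strand start, strand, score) for windows >= threshold."""
    seq = seq.upper()  # UCSC sequence may be soft-masked in lowercase
    width = len(matrix)
    hits = []
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(col[b] for col, b in zip(matrix, window))
            if score >= threshold:
                # Map minus-strand windows back to forward-strand coordinates.
                pos = i if strand == "+" else len(seq) - i - width
                hits.append((pos, strand, score))
    return hits
```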

Search for consensus eukaryotic promoter elements.
TFSiteScan
Neural Network Promoter Prediction
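A consensus-element search can be as simple as turning an IUPAC consensus into a regular expression and checking where it falls relative to the TSS. The sketch below uses TATAWAWR, one commonly cited TATA-box consensus. It also assumes a window of -40 to -20 around the expected position of roughly -30. Both choices are assumptions, and many promoters have no TATA box at all. It scans only the sense strand, because the TATA box is described in a fixed orientation relative to the TSS. The input is assumed to end at the base just before the TSS, so its last base is position -1.

```python
# Sketch: find IUPAC consensus elements at positions relative to the TSS.
import re
from typing import List, Optional, Tuple

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}


def consensus_to_regex(consensus: str) -> str:
    return "".join(IUPAC[c] for c in consensus.upper())


def find_elements(promoter: str, consensus: str,
                  window: Optional[Tuple[int, int]] = None) -> List[Tuple[int, str]]:
    """Return (offset relative to TSS, matched text) for each sense-strand hit.
    Offsets run from -len(promoter) to -1; `window` is an inclusive range."""
    # A lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    seq = promoter.upper()
    hits = []
    for m in pattern.finditer(seq):
        offset = m.start() - len(seq)
        if window is None or window[0] <= offset <= window[1]:
            hits.append((offset, m.group(1)))
    return hits
```

For example, `find_elements(region, "TATAWAWR", window=(-40, -20))` lists candidate TATA boxes in the usual position.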

Can you think of a better way to find promoters for large sets of genes?
Lecture Notes by Shifra Ben-Dor


Stuart Brown - RCR
Last modified: Tue Apr 12 11:58:36 EDT 2005