R Exercise

A sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and has, or is conjectured to have, biological significance. In studying the regulation of gene transcription, splicing, and similar processes, one important step is to identify sequence motifs.

Identify enriched sequence motifs in biological sequence sets

Function:
  1. Generate all possible DNA/RNA/peptide strings of a given length, for example all possible 6-mers, 7-mers, etc.
  2. Read DNA/RNA/peptide sequences into R
  3. Calculate the occurrence frequency of the strings in the background (reference) and foreground (test) sequences
  4. Determine the over-representation/enrichment p-value
  5. *Genome-scale analysis by looping over entries in string database
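
The steps above can be sketched in base R roughly as follows. This is only one possible approach, not the exercise's reference solution. The function names (all_kmers, read_fasta, count_hits, motif_enrichment) are hypothetical, the toy sequences are made up for illustration, and several choices are assumptions: sequences are in FASTA format, a string counts once per sequence that contains it, enrichment is scored with a one-sided hypergeometric test over the pooled foreground and background sets, and p-values are adjusted with the Benjamini-Hochberg method because many strings are tested at once.

```r
# Step 1: all strings of length k over an alphabet (4^k strings for DNA).
all_kmers <- function(k, alphabet = c("A", "C", "G", "T")) {
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  do.call(paste0, grid)
}

# Step 2: read a FASTA file into a character vector, one element per record.
# Assumes header lines start with ">" and sequences may span several lines.
read_fasta <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)
  seqs <- tapply(lines[!is_header], record[!is_header], paste0, collapse = "")
  toupper(as.vector(seqs))
}

# Step 3: number of sequences that contain each string at least once.
count_hits <- function(kmers, seqs) {
  vapply(kmers, function(m) sum(grepl(m, seqs, fixed = TRUE)), integer(1))
}

# Steps 4 and 5: test every string of length k for over-representation
# in the foreground (fg) relative to the background (bg).
motif_enrichment <- function(fg, bg, k) {
  kmers <- all_kmers(k)
  fg_hits <- count_hits(kmers, fg)
  bg_hits <- count_hits(kmers, bg)
  with_motif <- fg_hits + bg_hits
  without_motif <- length(fg) + length(bg) - with_motif
  # P(X >= fg_hits) when length(fg) sequences are drawn from the pooled set.
  p <- phyper(fg_hits - 1, with_motif, without_motif, length(fg),
              lower.tail = FALSE)
  res <- data.frame(kmer = kmers, fg_hits = fg_hits, bg_hits = bg_hits,
                    p_value = p, p_adjusted = p.adjust(p, method = "BH"),
                    row.names = NULL, stringsAsFactors = FALSE)
  res[order(res$p_value), ]
}

# Toy input (made up): GACGT is planted in every foreground sequence.
fg <- c("ACGTGACGT", "TTGACGTA", "GACGTCC", "AAGACGT")
bg <- c("AAAAAAA", "CCCCTTT", "GGGTTTA", "TTTTCCA", "ACACACA", "GACGTTT")
res <- motif_enrichment(fg, bg, k = 5)
head(res, 3)
```

With real data, fg and bg would come from read_fasta() applied to the test and reference files. Counting every string with grepl() is simple but slow for long strings or large sequence sets, so a genome-scale run would likely need a faster counting method.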
Example Code

Sample Sequences
Stuart Brown - RCR
Last modified: Fri Apr 11 11:15:32 EDT 2008