R Exercise
A sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and has, or is conjectured to have, biological significance. In studying the regulation of gene transcription, splicing, and similar processes, one important step is to identify sequence motifs.
Identify enriched sequence motifs in sets of biological sequences
Function:
- Generate all possible DNA/RNA/peptide strings of a given length, for example, all possible 6-mers, 7-mers, etc.
- Read DNA/RNA/peptide sequences into R
- Calculate the occurrence frequency of the strings in the background reference and foreground test sequences
- Determine the over-representation/enrichment p-value
- *Genome-scale analysis by looping over entries in a string database
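
The steps listed above can be sketched in base R roughly as follows. This is a minimal illustration, not the exercise's own example code. The function names (all_kmers, read_fasta, count_kmers, kmer_enrichment) are hypothetical, the input is assumed to be FASTA files that start with a ">" header line, and the foreground and background sets are assumed to be disjoint. The p-value uses a one-sided hypergeometric test, which is one common choice. It treats overlapping words as independent, so the values should be read as approximate.

```r
# Generate all strings of length k over an alphabet (default: DNA).
all_kmers <- function(k, alphabet = c("A", "C", "G", "T")) {
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  # expand.grid varies the first column fastest; reversing the columns
  # gives lexicographic order (AA, AC, AG, ...).
  do.call(paste0, unname(as.list(rev(grid))))
}

# Read a FASTA file into a named character vector (one string per record).
# Assumes the file starts with a ">" header and every record has sequence lines.
read_fasta <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  ids <- sub("^>", "", lines[is_header])
  group <- cumsum(is_header)
  out <- vapply(split(lines[!is_header], group[!is_header]),
                paste, character(1), collapse = "")
  names(out) <- ids[as.integer(names(out))]
  out
}

# Count occurrences of each k-mer across a set of sequences.
# Overlapping occurrences are counted; words containing letters outside
# the k-mer alphabet (for example N) are ignored.
count_kmers <- function(seqs, kmers) {
  k <- nchar(kmers[1])
  counts <- setNames(integer(length(kmers)), kmers)
  for (s in toupper(seqs)) {
    n <- nchar(s)
    if (n < k) next
    starts <- seq_len(n - k + 1)
    words <- substring(s, starts, starts + k - 1)
    counts <- counts + as.integer(table(factor(words, levels = kmers)))
  }
  counts
}

# One-sided hypergeometric test for over-representation in the foreground.
# For each k-mer: P(X >= fg count), drawing n_fg words from the pooled
# foreground + background words.
kmer_enrichment <- function(fg_counts, bg_counts) {
  n_fg <- sum(fg_counts)
  n_bg <- sum(bg_counts)
  total <- fg_counts + bg_counts
  p <- phyper(fg_counts - 1, total, n_fg + n_bg - total, n_fg,
              lower.tail = FALSE)
  data.frame(kmer = names(fg_counts),
             fg = unname(fg_counts), bg = unname(bg_counts),
             p_value = unname(p),
             p_adj = p.adjust(unname(p), method = "BH"),  # multiple-testing correction
             row.names = NULL)
}

# Toy usage with made-up sequences (not the exercise's sample data).
kmers <- all_kmers(4)
fg <- count_kmers(c("ACGTACGTAA", "TTACGTACGT"), kmers)
bg <- count_kmers(c("AAAAAAAAAA", "CCCCGGGGTT"), kmers)
res <- kmer_enrichment(fg, bg)
head(res[order(res$p_value), ])
```

For a genome-scale run, the same calls could be wrapped in a loop over each entry of a sequence database. For long peptide words, generating every possible string quickly becomes impractical (20^k strings), so counting only the observed words may be preferable.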
Example Code
Sample Sequences
Stuart Brown - RCR
Last modified: Fri Apr 11 11:15:32 EDT 2008