DNA and Protein Patterns Exercise

Start with the sequence cecos.seq that I have linked to this page.

If this is a new or unknown sequence, where do you start an analysis?

OK, so most of you know by now that a BLAST or FASTA search is the best place to start any analysis of an unknown sequence, but let's skip that for now and think about pattern recognition. Does this gene match any known patterns?

The first problem is that this is an unknown chunk of DNA. Does it have any genes in it? Let's do a quick check for open reading frames.

Try searching this sequence for ORFs using the ORF-Finder online service: http://www.ncbi.nlm.nih.gov/gorf/gorf.html
This server has a built-in ability to run a BLAST search with the protein translation of each putative ORF.
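
To see roughly what an ORF search does, here is a minimal Python sketch. It is not the ORF-Finder algorithm, just a simple illustration under some assumptions: an ORF is taken to run from an ATG to the first in-frame stop codon, the standard stop codons are used, and the function name find_orfs and the min_codons cutoff are made up for this example. It also assumes you have already loaded the sequence into a plain string of A, C, G and T.

```python
# Standard stop codons; complement table for building the reverse strand.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=30):
    """Scan all six reading frames for ATG...stop open reading frames.

    Returns a list of (strand, frame, start, end, n_codons) tuples.
    start/end are 0-based positions on the scanned strand (for the
    "-" strand they refer to the reverse complement), end is exclusive
    and includes the stop codon; n_codons counts the ATG but not the stop.
    min_codons is an illustrative cutoff, not a value from the exercise.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3, n_codons))
                    start = None  # look for the next ATG in this frame
    return orfs


# Tiny made-up example: one short ORF in frame 2 of the forward strand.
demo = "CC" + "ATG" + "AAA" * 5 + "TAA" + "GG"
print(find_orfs(demo, min_codons=3))
```

A long ORF is only a hint that a stretch might be coding, which is why the server offers to send each translation on to BLAST.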

It would be a lot faster to translate the DNA in all 6 reading frames and make a BLAST search with all of them at once. That is exactly what the "Translated BLAST Searches" are for.

Use BLASTX to translate your DNA query sequence in all six reading frames for comparison to a protein database.
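
If you want to see what "all six reading frames" means in practice, here is a small Python sketch of the translation step only (it does no database search). It assumes the standard genetic code and a plain A/C/G/T string; the function names are made up for this example, and the frame labels +1 to +3 and -1 to -3 are chosen to resemble the way BLASTX output labels frames.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame (+1..+3, -1..-3) to a protein string."""
    dna = dna.upper()
    frames = {}
    for sign, seq in ((1, dna), (-1, reverse_complement(dna))):
        for offset in range(3):
            frames[sign * (offset + 1)] = translate(seq[offset:])
    return frames


# Made-up 9-base example; "*" marks a stop codon.
for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

Each of the six protein strings is a candidate query, and a translated search compares all of them against the protein database in one go.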

Isn't that an easier way to find the coding sequence in a stretch of DNA? Too bad this doesn't work for all new sequences that you find in the lab.

Take the protein sequence generated by BLASTX and use it for a domain search using the Pfam database.
Compare with the result that you get using the ProDom database and look at a multiple alignment of related genes.

If you were new to the study of this gene, this information would probably be valuable.