1) Read this essay on "Profile Analysis" by John Devereux (founder of the GCG company)
[http://www.le.ac.uk/cc/mol/GCG/profileanalysis.html]
2) Create a multiple alignment with Clustal from the protein sequences listed below, then find the most conserved 50-amino-acid region using Boxshade or PlotCon.
XYLB_PSEPU
Adh1_Pethy
P32771
P93629
P08319
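As a cross-check on what you read off the Boxshade or PlotCon output, the idea of a "most conserved window" can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the alignment is supplied as a list of equal-length gapped strings, conservation is taken as the fraction of sequences sharing the most common residue in each column, and the function names are hypothetical. PlotCon itself uses a similarity-matrix score, so the exact window may differ.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column, the fraction of sequences sharing the most common residue.

    Gap characters ('-' and '.') are never counted as the conserved residue,
    but gapped sequences still count in the denominator.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [row[col].upper() for row in alignment if row[col] not in "-."]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(alignment))
    return scores


def most_conserved_window(alignment, width=50):
    """Return (start, mean_score) of the best window; start is 0-based."""
    scores = column_conservation(alignment)
    if width > len(scores):
        raise ValueError("window is wider than the alignment")
    window_sum = sum(scores[:width])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(scores) - width + 1):
        # Slide the window one column to the right.
        window_sum += scores[start + width - 1] - scores[start - 1]
        if window_sum > best_sum + 1e-12:  # keep the first window on ties
            best_start, best_sum = start, window_sum
    return best_start, best_sum / width


# Toy alignment (not the ADH sequences), with a 4-column window.
toy = ["ACDEFGHIKL", "AC-EFGHIRL", "TCDEFGHIKM"]
start, mean_score = most_conserved_window(toy, width=4)
```

For the real exercise, pass the rows of your Clustal alignment and leave `width` at 50.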
3) Use your multiple alignment to create a Profile using the EMBOSS Prophecy program.
Now use the EMBOSS Prophet program to search the set of sequences below for similar motifs.
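To see what a profile search does in principle, here is a minimal Python sketch of a position-specific scoring matrix built from a gap-free aligned block and slid along a sequence. It is not the algorithm that Prophecy or Prophet implement (those handle gaps and offer several profile types); the uniform background, the pseudocount, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_block, pseudocount=1.0):
    """Build log2-odds scores (one dict per column) from a gap-free block.

    Assumes a uniform background frequency of 1/20 for every amino acid.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(len(aligned_block[0])):
        counts = Counter(
            row[col].upper() for row in aligned_block if row[col].upper() in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {
                aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
                for aa in AMINO_ACIDS
            }
        )
    return profile


def best_hit(profile, sequence):
    """Slide the profile along the sequence without gaps.

    Returns (best_score, start) with a 0-based start position.
    Unknown residue codes receive the lowest score in that column.
    """
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        score = sum(
            col.get(sequence[start + i].upper(), min(col.values()))
            for i, col in enumerate(profile)
        )
        if score > best[0]:
            best = (score, start)
    return best


# Toy block and toy sequences, chosen only to illustrate the scan.
toy_block = ["GHEWSG", "GHEYSG", "GHEWAG"]
toy_profile = build_profile(toy_block)
hit_score, hit_start = best_hit(toy_profile, "MKALVGHEWSGTVV")
decoy_score, _ = best_hit(toy_profile, "MARTKQTARKST")
```

A sequence containing the motif should score clearly higher than one that does not, which is the same contrast to look for in the Prophet output.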
4) Use the EMBOSS tools ehmmbuild and ehmmsearch to create a profile Hidden Markov Model and search the set of sequences below (a mini-database).
Compare the performance of the Profile vs HMM methods.
[Given sufficient CPU power, you can search entire genomes (proteomes) or complete protein databases with novel profiles.]
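One way to compare the two methods is to rank the mini-database by score for each method and see where the sequences you expect to be dehydrogenases end up. Scores from the two programs are on different scales, so the sketch below compares ranks rather than raw values. The sequence IDs and scores are made-up placeholders, not real program output, and the function names are hypothetical.

```python
def positive_ranks(scores, expected_positives):
    """Return {sequence_id: 1-based rank} for the expected positives,
    ranking all sequences from highest to lowest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {
        seq_id: ordered.index(seq_id) + 1
        for seq_id in expected_positives
        if seq_id in scores
    }


def cleanly_separated(scores, expected_positives):
    """True if every expected positive outscores every other sequence."""
    pos = [s for seq_id, s in scores.items() if seq_id in expected_positives]
    neg = [s for seq_id, s in scores.items() if seq_id not in expected_positives]
    return bool(pos) and (not neg or min(pos) > max(neg))


# Placeholder scores for illustration only; substitute the values you
# read from your own Prophet and ehmmsearch results.
placeholder_profile_scores = {"seqA": 41.0, "seqB": 12.5, "seqC": 30.0, "seqD": 9.0}
placeholder_hmm_scores = {"seqA": 120.3, "seqB": -5.0, "seqC": 2.0, "seqD": 3.5}
expected = {"seqA", "seqC"}

profile_ranks = positive_ranks(placeholder_profile_scores, expected)
hmm_ranks = positive_ranks(placeholder_hmm_scores, expected)
profile_clean = cleanly_separated(placeholder_profile_scores, expected)
hmm_clean = cleanly_separated(placeholder_hmm_scores, expected)
```

A method that places all expected positives above every unrelated sequence separates the mini-database cleanly; one that lets an unrelated sequence outrank a true dehydrogenase does not.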
5) Create a web logo for your motif, first from the set of aligned query sequences that were used to build the profile (known ADH genes),
and then a second web logo from the matching regions in the set of sequences below.
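The height of each stack in a sequence logo reflects the information content of that alignment column, measured in bits. The Python sketch below computes that quantity for protein columns as log2(20) minus the Shannon entropy of the observed residues. It is a simplified illustration: it ignores gaps and omits the small-sample correction that logo programs normally apply, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content of one alignment column in bits.

    Gap characters ('-' and '.') are ignored; an all-gap column scores 0.
    """
    residues = [c.upper() for c in column if c not in "-."]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(alignment):
    """Information content for every column of an equal-length alignment."""
    return [
        column_information([row[i] for row in alignment])
        for i in range(len(alignment[0]))
    ]


# Toy alignment: column 1 is fully conserved, column 2 is split evenly.
heights = logo_heights(["GA", "GC", "GA", "GC"])
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, and a column split evenly between two residues is exactly one bit lower. Comparing these heights between your two logos shows which positions of the motif stay conserved in the database matches.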
>gi|38605526|sp|Q9X0G8.1|PRMA_THEMA RecName: Full=Ribosomal protein L11 methyltransferase; Short=L11 Mtase
MRFKELILPLKIEEEELVEKFYEEGFFNFAIEEDKKGKKVLKIYLREGEPLPDFLKDWEIVDEKITTPKD
WIVELEPFEIVEGIFIDPTEKINRRDAIVIKLSPGVAFGTGLHPTTRMSVFFLKKYLKEGNTVLDVGCGT
GILAIAAKKLGASRVVAVDVDEQAVEVAEENVRKNDVDVLVKWSDLLSEVEGTFDIVVSNILAEIHVKLL
EDVNRVTHRDSMLILSGIVDRKEDMVKRKASEHGWNVLERKQEREWVTLVMKRS
>gi|153937364|ref|YP_001387526.1| NADH-dependent butanol dehydrogenase [Clostridium botulinum A str. Hall]
MERFTLPRDLYFGEGSLEALKTLKGKKAVVVVGGGSMKRFGFLDKVQSYLNEADIEVKLIEGVEPDPSVE
TVMNGAAVMREFEPDLIVSIGGGSPIDAAKAMWIFYEYPDFTFEQAVVPFGIPELRQKARFVAIPSTSGT
ATEVTAFSVITDYKKKIKYPLADFNLTPDIAIVDPDLAQTMPAKLTAHTGMDALTHAIEAYVAGLRSVFS
DPLAMQAIVMVKEYLVKSYNGNKEARGQMHLAQCLAGMAFSNALLGITHSMAHKTGAVFHIPHGCANAIF
LPYVIQYNSESCGERYATIAKKLGLAGENQDELVKSLIEMIREMNKTMNIPLNLKEYGITEEDFKENVKY
ISHNAVLDACTGSNPREINDETMEKLFACTYYGEDVTF
>gi|416966|sp|Q03132.3|ERYA2_SACER RecName: Full=Erythronolide synthase, modules 3 and 4; AltName: Full=ORF 2; AltName: Full=6-deoxyerythronolide B synthase II; AltName: Full=DEBS 2
MTDSEKVAEYLRRATLDLRAARQRIRELESDPIAIVSMACRLPGGVNTPQRLWELLREGGETLSGFPTDR
GWDLARLHHPDPDNPGTSYVDKGGFLDDAAGFDAEFFGVSPREAAAMDPQQRLLLETSWELVENAGIDPH
SLRGTATGVFLGVAKFGYGEDTAAAEDVEGYSVTGVAPAVASGRISYTMGLEGPSISVDTACSSSLVALH
LAVESLRKGESSMAVVGGAAVMATPGVFVDFSRQRALAADGRSKAFGAGADGFGFSEGVTLVLLERLSEA
RRNGHEVLAVVRGSALNQDGASNGLSAPSGPAQRRVIRQALESCGLEPGDVDAVEAHGTGTALGDPIEAN
ALLDTYGRDRDADRPLWLGSVKSNIGHTQAAAGVTGLLKVVLALRNGELPATLHVEEPTPHVDWSSGGVA
LLAGNQPWRRGERTRRARVSAFGISGTNAHVIVEEAPEREHRETTAHDGRPVPLVVSARTTAALRAQAAQ
IAELLERPDADLAGVGLGLATTRARHEHRAAVVASTREEAVRGLREIAAGAATADAVVEGVTEVDGRNVV
FLFPGQGSQWAGMGAELLSSSPVFAGKIRACDESMAPMQDWKVSDVLRQAPGAPGLDRVDVVQPVLFAVM
VSLAELWRSYGVEPAAVVGHSQGEIAAAHVAGALTLEDAAKLVVGRSRLMRSLSGEGGMAAVALGEAAVR
ERLRPWQDRLSVAAVNGPRSVVVSGEPGALRAFSEDCAAEGIRVRDIDVDYASHSPQIERVREELLETTG
DIAPRPARVTFHSTVESRSMDGTELDARYWYRNLRETVRFADAVTRLAESGYDAFIEVSPHPVVVQAVEE
AVEEADGAEDAVVVGSLHRDGGDLSAFLRSMATAHVSGVDIRWDVALPGAAPFALPTYPFQRKRYWLQPA
APAAASDELAYRVSWTPIEKPESGNLDGDWLVVTPLISPEWTEMLCEAINANGGRALRCEVDTSASRTEM
AQAVAQAGTGFRGVLSLLSSDESACRPGVPAGAVGLLTLVQALGDAGVDAPVWCLTQGAVRTPADDDLAR
PAQTTAHGFAQVAGLELPGRWGGVVDLPESVDDAALRLLVAVLRGGGRAEDHLAVRDGRLHGRRVVRASL
PQSGSRSWTPHGTVLVTGAASPVGDQLVRWLADRGAERLVLAGACPGDDLLAAVEEAGASAVVCAQDAAA
LREALGDEPVTALVHAGTLTNFGSISEVAPEEFAETIAAKTALLAVLDEVLGDRAVEREVYCSSVAGIWG
GAGMAAYAAGSAYLDALAEHHRARGRSCTSVAWTPWALPGGAVDDGYLRERGLRSLSADRAMRTWERVLA
AGPVSVAVADVDWPVLSEGFAATRPTALFAELAGRGGQAEAEPDSGPTGEPAQRLAGLSPDEQQENLLEL
VANAVAEVLGHESAAEINVRRAFSELGLDSLNAMALRKRLSASTGLRLPASLVFDHPTVTALAQHLRARL
VGDADQAAVRVVGAADESEPIAIVGIGCRFPGGIGSPEQLWRVLAEGANLTTGFPADRGWDIGRLYHPDP
DNPGTSYVDKGGFLTDAADFDPGFFGITPREALAMDPQQRLMLETAWEAVERAGIDPDALRGTDTGVFVG
MNGQSYMQLLAGEAERVDGYQGLGNSASVLSGRIAYTFGWEGPALTVDTACSSSLVGIHLAMQALRRGEC
SLALAGGVTVMSDPYTFVDFSTQRGLASDGRCKAFSARADGFALSEGVAALVLEPLSRARANGHQVLAVL
RGSAVNQDGASNGLAAPNGPSQERVIRQALAASGVPAADVDVVEAHGTGTELGDPIEAGALIATYGQDRD
RPLRLGSVKTNIGHTQAAAGAAGVIKVVLAMRHGMLPRSLHADELSPHIDWESGAVEVLREEVPWPAGER
PRRAGVSSFGVSGTNAHVIVEEAPAEQEAARTERGPLPFVLSGRSEAVVAAQARALAEHLRDTPELGLTD
AAWTLATGRARFDVRAAVLGDDRAGVCAELDALAEGRPSADAVAPVTSAPRKPVLVFPGQGAQWVGMARD
LLESSEVFAESMSRCAEALSPHTDWKLLDVVRGDGGPDPHERVDVLQPVLFSIMVSLAELWRAHGVTPAA
VVGHSQGEIAAAHVAGALSLEAAAKVVALRSQVLRELDDQGGMVSVGASRDELETVLARWDGRVAVAAVN
GPGTSVVAGPTAELDEFFAEAEAREMKPRRIAVRYASHSPEVARIEDRLAAELGTITAVRGSVPLHSTVT
GEVIDTSAMDASYWYRNLRRPVLFEQAVRGLVEQGFDTFVEVSPHPVLLMAVEETAEHAGAEVTCVPTLR
REQSGPHEFLRNLLRAHVHGVGADLRPAVAGGRPAELPTYPFEHQRFWPRPHRPADVSALGVRGAEHPLL
LAAVDVPGHGGAVFTGRLSTDEQPWLAEHVVGGRTLVPGSVLVDLALAAGEDVGLPVLEELVLQRPLVLA
GAGALLRMSVGAPDESGRRTIDVHAAEDVADLADAQWSQHATGTLAQGVAAGPRDTEQWPPEDAVRIPLD
DHYDGLAEQGYEYGPSFQALRAAWRKDDSVYAEVSIAADEEGYAFHPVLLDAVAQTLSLGALGEPGGGKL
PFAWNTVTLHASGATSVRVVATPAGADAMALRVTDPAGHLVATVDSLVVRSTGEKWEQPEPRGGEGELHA
LDWGRLAEPGSTGRVVAADASDLDAVLRSGEPEPDAVLVRYEPEGDDPRAAARHGVLWAAALVRRWLEQE
ELPGATLVIATSGAVTVSDDDSVPEPGAAAMWGVIRCAQAESPDRFVLLDTDAEPGMLPAVPDNPQLALR
GDDVFVPRLSPLAPSALTLPAGTQRLVPGDGAIDSVAFEPAPDVEQPLRAGEVRVDVRATGVNFRDVLLA
LGMYPQKADMGTEAAGVVTAVGPDVDAFAPGDRVLGLFQGAFAPIAVTDHRLLARVPDGWSDADAAAVPI
AYTTAHYALHDLAGLRAGQSVLIHAAAGGVGMAAVALARRAGAEVLATAGPAKHGTLRALGLDDEHIASS
RETGFARKFRERTGGRGVDVVLNSLTGELLDESADLLAEDGVFVEMGKTDLRDAGDFRGRYAPFDLGEAG
DDRLGEILREVVGLLGAGELDRLPVSAWELGSAPAALQHMSRGRHVGKLVLTQPAPVDPDGTVLITGGTG
TLGRLLARHLVTEHGVRHLLLVSRRGADAPGSDELRAEIEDLGASAEIAACDTADRDALSALLDGLPRPL
TGVVHAAGVLADGLVTSIDEPAVEQVLRAKVDAAWNLHELTANTGLSFFVLFSSAASVLAGPGQGVYAAA
NESLNALAALRRTRGLPAKALGWGLWAQASEMTSGLGDRIARTGVAALPTERALALFDSALRRGGEVVFP
LSINRSALRRAEFVPEVLRGMVRAKLRAAGQAEAAGPNVVDRLAGRSESDQVAGLAELVRSHAAAVSGYG
SADQLPERKAFKDLGFDSLAAVELRNRLGTATGVRLPSTLVFDHPTPLAVAEHLRDRLFAASPAVDIGDR
LDELEKALEALSAEDGHDDVGQRLESLLRRWNSRRADAPSTSAISEDASDDELFSMLDQRFGGGEDL
>gi|161790|gb|AAC37189.1| histone H3
MARTKQTARKSTGAKAPRKQLASKAARKSAPATGGIKKPHRFRPGTVALREIRKYQKSTDLLIRKLPFQR
LVRDIAHEFKAELRFQSSAVLALQEAAEAYLVGLFEDTNLCAIHARRVTIMTKDMQLARRIRGERF
>gi|81318496|sp|Q6L743.1|DOIAD_STRKN RecName: Full=2-deoxy-scyllo-inosamine dehydrogenase; Short=DOIA dehydrogenase
MKALVFHSPEKATFEQRDVPTPRPGEALVHIAYNSICGSDLSLYRGVWHGFGYPVVPGHEWSGTVVEING
ANGHDQSLVGKNVVGDLTCACGNCAACGRGTPVLCENLQELGFTKDGACAEYMTIPVDNLRPLPDALSLR
SACQVEPLAVALNAVSIAGVAPGDRVAVMGAGGIGLMLMQVARHLGGEVTVVSEPVAERRAVAGQLGATE
LCSAEPGQLAELVARRPELTPDVVLEASGYPAALQEAIEVVRPGGRIGLIGYRVEETGPMSPQHIAVKAL
TLRGSLGPGGRFDDAVELLAKGDDIAVEPLLSHEFGLADYATALDLALSRTNGNVRSFFNLRD