1) Read this essay on "Profile Analysis" by John Devereux (founder of the GCG company):

 [http://www.le.ac.uk/cc/mol/GCG/profileanalysis.html]

 

2) Create a multiple alignment with Clustal using the following protein sequences, then find the most conserved 50-amino-acid region using Boxshade or PlotCon.

 

XYLB_PSEPU 

Adh1_Pethy

P32771

P93629

P08319
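
To make "most conserved region" concrete, here is a minimal Python sketch of a sliding-window scan over an alignment. It is not how PlotCon or Boxshade work internally: it scores each column simply as the fraction of sequences that share the most common residue, and the function names and the toy alignment in the comments are made up for illustration. It assumes you have already read your Clustal alignment into a list of equal-length strings with "-" for gaps.

```python
from collections import Counter


def column_conservation(column):
    """Fraction of sequences sharing the most common residue in one column.

    Gaps ("-") are never counted as the shared residue, so gappy columns
    score low. This is a simple identity measure, not a matrix-based score.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    top_count = Counter(residues).most_common(1)[0][1]
    return top_count / len(column)


def most_conserved_window(aligned, width=50):
    """Return (start, mean_score) for the best window of `width` columns.

    `start` is a 0-based alignment column. Ties go to the leftmost window.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    if width > length:
        raise ValueError("window is wider than the alignment")
    scores = [column_conservation([seq[i] for seq in aligned])
              for i in range(length)]
    best_start, best_sum = 0, sum(scores[:width])
    for start in range(1, length - width + 1):
        window_sum = sum(scores[start:start + width])
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start, best_sum / width


# Toy usage with a made-up 3-sequence alignment and a 3-column window:
# most_conserved_window(["ACDEFGHIK", "ACDQFGHIK", "TCD-FGHLK"], width=3)
```

For the real exercise you would call it with width=50 and compare the window it reports with what you see in the PlotCon plot.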

 

3) Use your multiple alignment to create a profile using the EMBOSS prophecy program.

Then use the EMBOSS prophet program to search the set of sequences below for similar motifs.
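
If it helps to see what a profile is doing, the following Python sketch builds a very simplified position-specific scoring matrix from an ungapped block of aligned sequences and slides it along a target sequence. This is an illustration only, not the algorithm used by the EMBOSS programs: it assumes a uniform background frequency, a flat pseudocount, and no gaps in the match, and all names here (build_pssm, best_hit, the toy motif) are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build log-odds scores (base 2) for each alignment column.

    Assumes a uniform background of 1/20 per residue; gap characters
    and unknown symbols in the alignment are ignored.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for i in range(len(aligned[0])):
        counts = Counter(seq[i] for seq in aligned if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def best_hit(pssm, sequence):
    """Slide the matrix along `sequence` without gaps.

    Returns (start, score) of the best-scoring window (0-based start),
    or None if the sequence is shorter than the matrix.
    """
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids score 0.
        score = sum(col.get(res, 0.0) for col, res in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Toy usage with a made-up 4-residue motif:
# pssm = build_pssm(["GHEW", "GHEF", "GHEW"])
# best_hit(pssm, "AAAGHEWAAA")
```

A real profile search also allows insertions and deletions with position-specific gap penalties, which this sketch leaves out.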

 

4) Use the EMBOSS tools ehmmbuild and ehmmsearch to create a profile Hidden Markov Model (HMM) and search the set of sequences below (a mini-database).

Compare the performance of the profile and HMM methods.
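
One way to put numbers on the comparison is to decide which sequences in the mini-database you expect to match (the dehydrogenases) and then compute precision and recall for each method's hit list. The Python sketch below shows the arithmetic; the identifiers in the usage comment are placeholders, not real results, and the helper name is hypothetical.

```python
def precision_recall(hits, expected):
    """Return (precision, recall) for a list of reported hit IDs.

    precision = correct hits / all reported hits
    recall    = correct hits / all expected matches
    """
    hits = set(hits)
    expected = set(expected)
    correct = hits & expected
    precision = len(correct) / len(hits) if hits else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    return precision, recall


# Placeholder usage (made-up IDs, not actual search output):
# expected = ["seqA", "seqB"]
# profile_hits = ["seqA", "seqC"]
# hmm_hits = ["seqA", "seqB"]
# precision_recall(profile_hits, expected)
# precision_recall(hmm_hits, expected)
```

Fill in the hit lists from your own search output, using whatever score cutoff you chose for each method, and note how the numbers change as you move the cutoff.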

 

[Given sufficient CPU power, you can search entire genomes (proteomes) or complete protein databases with novel profiles.]

 

 

5) Create a web logo for your motif, first using the set of aligned query sequences that were used to build the profile (known ADH genes), and then a different web logo using the matching regions from the set of sequences below.
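
The height of each stack in a sequence logo reflects the information content of that alignment column. As a rough sketch of the calculation, the Python below computes log2(20) minus the Shannon entropy of each column. It deliberately omits the small-sample correction that logo tools commonly apply, so the values will not match a logo server's output exactly; the function names and toy input are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content in bits: log2(alphabet_size) - Shannon entropy.

    Gap characters ("-" and ".") are ignored. An all-gap column gets 0.
    No small-sample correction is applied.
    """
    residues = [c for c in column if c not in "-."]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n)
                   for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(aligned):
    """Information content for every column of an equal-length alignment."""
    return [column_information([seq[i] for seq in aligned])
            for i in range(len(aligned[0]))]


# Toy usage with a made-up 2-column alignment: the first column is
# fully conserved, the second has four different residues.
# logo_heights(["GA", "GC", "GD", "GE"])
```

Running this on both sets of aligned regions gives a quick numeric check on which positions stay conserved between the query sequences and the database matches.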

 

 

>gi|38605526|sp|Q9X0G8.1|PRMA_THEMA RecName: Full=Ribosomal protein L11 methyltransferase; Short=L11 Mtase

MRFKELILPLKIEEEELVEKFYEEGFFNFAIEEDKKGKKVLKIYLREGEPLPDFLKDWEIVDEKITTPKD

WIVELEPFEIVEGIFIDPTEKINRRDAIVIKLSPGVAFGTGLHPTTRMSVFFLKKYLKEGNTVLDVGCGT

GILAIAAKKLGASRVVAVDVDEQAVEVAEENVRKNDVDVLVKWSDLLSEVEGTFDIVVSNILAEIHVKLL

EDVNRVTHRDSMLILSGIVDRKEDMVKRKASEHGWNVLERKQEREWVTLVMKRS

>gi|153937364|ref|YP_001387526.1| NADH-dependent butanol dehydrogenase [Clostridium botulinum A str. Hall]

MERFTLPRDLYFGEGSLEALKTLKGKKAVVVVGGGSMKRFGFLDKVQSYLNEADIEVKLIEGVEPDPSVE

TVMNGAAVMREFEPDLIVSIGGGSPIDAAKAMWIFYEYPDFTFEQAVVPFGIPELRQKARFVAIPSTSGT

ATEVTAFSVITDYKKKIKYPLADFNLTPDIAIVDPDLAQTMPAKLTAHTGMDALTHAIEAYVAGLRSVFS

DPLAMQAIVMVKEYLVKSYNGNKEARGQMHLAQCLAGMAFSNALLGITHSMAHKTGAVFHIPHGCANAIF

LPYVIQYNSESCGERYATIAKKLGLAGENQDELVKSLIEMIREMNKTMNIPLNLKEYGITEEDFKENVKY

ISHNAVLDACTGSNPREINDETMEKLFACTYYGEDVTF

>gi|416966|sp|Q03132.3|ERYA2_SACER RecName: Full=Erythronolide synthase, modules 3 and 4; AltName: Full=ORF 2; AltName: Full=6-deoxyerythronolide B synthase II; AltName: Full=DEBS 2

MTDSEKVAEYLRRATLDLRAARQRIRELESDPIAIVSMACRLPGGVNTPQRLWELLREGGETLSGFPTDR

GWDLARLHHPDPDNPGTSYVDKGGFLDDAAGFDAEFFGVSPREAAAMDPQQRLLLETSWELVENAGIDPH

SLRGTATGVFLGVAKFGYGEDTAAAEDVEGYSVTGVAPAVASGRISYTMGLEGPSISVDTACSSSLVALH

LAVESLRKGESSMAVVGGAAVMATPGVFVDFSRQRALAADGRSKAFGAGADGFGFSEGVTLVLLERLSEA

RRNGHEVLAVVRGSALNQDGASNGLSAPSGPAQRRVIRQALESCGLEPGDVDAVEAHGTGTALGDPIEAN

ALLDTYGRDRDADRPLWLGSVKSNIGHTQAAAGVTGLLKVVLALRNGELPATLHVEEPTPHVDWSSGGVA

LLAGNQPWRRGERTRRARVSAFGISGTNAHVIVEEAPEREHRETTAHDGRPVPLVVSARTTAALRAQAAQ

IAELLERPDADLAGVGLGLATTRARHEHRAAVVASTREEAVRGLREIAAGAATADAVVEGVTEVDGRNVV

FLFPGQGSQWAGMGAELLSSSPVFAGKIRACDESMAPMQDWKVSDVLRQAPGAPGLDRVDVVQPVLFAVM

VSLAELWRSYGVEPAAVVGHSQGEIAAAHVAGALTLEDAAKLVVGRSRLMRSLSGEGGMAAVALGEAAVR

ERLRPWQDRLSVAAVNGPRSVVVSGEPGALRAFSEDCAAEGIRVRDIDVDYASHSPQIERVREELLETTG

DIAPRPARVTFHSTVESRSMDGTELDARYWYRNLRETVRFADAVTRLAESGYDAFIEVSPHPVVVQAVEE

AVEEADGAEDAVVVGSLHRDGGDLSAFLRSMATAHVSGVDIRWDVALPGAAPFALPTYPFQRKRYWLQPA

APAAASDELAYRVSWTPIEKPESGNLDGDWLVVTPLISPEWTEMLCEAINANGGRALRCEVDTSASRTEM

AQAVAQAGTGFRGVLSLLSSDESACRPGVPAGAVGLLTLVQALGDAGVDAPVWCLTQGAVRTPADDDLAR

PAQTTAHGFAQVAGLELPGRWGGVVDLPESVDDAALRLLVAVLRGGGRAEDHLAVRDGRLHGRRVVRASL

PQSGSRSWTPHGTVLVTGAASPVGDQLVRWLADRGAERLVLAGACPGDDLLAAVEEAGASAVVCAQDAAA

LREALGDEPVTALVHAGTLTNFGSISEVAPEEFAETIAAKTALLAVLDEVLGDRAVEREVYCSSVAGIWG

GAGMAAYAAGSAYLDALAEHHRARGRSCTSVAWTPWALPGGAVDDGYLRERGLRSLSADRAMRTWERVLA

AGPVSVAVADVDWPVLSEGFAATRPTALFAELAGRGGQAEAEPDSGPTGEPAQRLAGLSPDEQQENLLEL

VANAVAEVLGHESAAEINVRRAFSELGLDSLNAMALRKRLSASTGLRLPASLVFDHPTVTALAQHLRARL

VGDADQAAVRVVGAADESEPIAIVGIGCRFPGGIGSPEQLWRVLAEGANLTTGFPADRGWDIGRLYHPDP

DNPGTSYVDKGGFLTDAADFDPGFFGITPREALAMDPQQRLMLETAWEAVERAGIDPDALRGTDTGVFVG

MNGQSYMQLLAGEAERVDGYQGLGNSASVLSGRIAYTFGWEGPALTVDTACSSSLVGIHLAMQALRRGEC

SLALAGGVTVMSDPYTFVDFSTQRGLASDGRCKAFSARADGFALSEGVAALVLEPLSRARANGHQVLAVL

RGSAVNQDGASNGLAAPNGPSQERVIRQALAASGVPAADVDVVEAHGTGTELGDPIEAGALIATYGQDRD

RPLRLGSVKTNIGHTQAAAGAAGVIKVVLAMRHGMLPRSLHADELSPHIDWESGAVEVLREEVPWPAGER

PRRAGVSSFGVSGTNAHVIVEEAPAEQEAARTERGPLPFVLSGRSEAVVAAQARALAEHLRDTPELGLTD

AAWTLATGRARFDVRAAVLGDDRAGVCAELDALAEGRPSADAVAPVTSAPRKPVLVFPGQGAQWVGMARD

LLESSEVFAESMSRCAEALSPHTDWKLLDVVRGDGGPDPHERVDVLQPVLFSIMVSLAELWRAHGVTPAA

VVGHSQGEIAAAHVAGALSLEAAAKVVALRSQVLRELDDQGGMVSVGASRDELETVLARWDGRVAVAAVN

GPGTSVVAGPTAELDEFFAEAEAREMKPRRIAVRYASHSPEVARIEDRLAAELGTITAVRGSVPLHSTVT

GEVIDTSAMDASYWYRNLRRPVLFEQAVRGLVEQGFDTFVEVSPHPVLLMAVEETAEHAGAEVTCVPTLR

REQSGPHEFLRNLLRAHVHGVGADLRPAVAGGRPAELPTYPFEHQRFWPRPHRPADVSALGVRGAEHPLL

LAAVDVPGHGGAVFTGRLSTDEQPWLAEHVVGGRTLVPGSVLVDLALAAGEDVGLPVLEELVLQRPLVLA

GAGALLRMSVGAPDESGRRTIDVHAAEDVADLADAQWSQHATGTLAQGVAAGPRDTEQWPPEDAVRIPLD

DHYDGLAEQGYEYGPSFQALRAAWRKDDSVYAEVSIAADEEGYAFHPVLLDAVAQTLSLGALGEPGGGKL

PFAWNTVTLHASGATSVRVVATPAGADAMALRVTDPAGHLVATVDSLVVRSTGEKWEQPEPRGGEGELHA

LDWGRLAEPGSTGRVVAADASDLDAVLRSGEPEPDAVLVRYEPEGDDPRAAARHGVLWAAALVRRWLEQE

ELPGATLVIATSGAVTVSDDDSVPEPGAAAMWGVIRCAQAESPDRFVLLDTDAEPGMLPAVPDNPQLALR

GDDVFVPRLSPLAPSALTLPAGTQRLVPGDGAIDSVAFEPAPDVEQPLRAGEVRVDVRATGVNFRDVLLA

LGMYPQKADMGTEAAGVVTAVGPDVDAFAPGDRVLGLFQGAFAPIAVTDHRLLARVPDGWSDADAAAVPI

AYTTAHYALHDLAGLRAGQSVLIHAAAGGVGMAAVALARRAGAEVLATAGPAKHGTLRALGLDDEHIASS

RETGFARKFRERTGGRGVDVVLNSLTGELLDESADLLAEDGVFVEMGKTDLRDAGDFRGRYAPFDLGEAG

DDRLGEILREVVGLLGAGELDRLPVSAWELGSAPAALQHMSRGRHVGKLVLTQPAPVDPDGTVLITGGTG

TLGRLLARHLVTEHGVRHLLLVSRRGADAPGSDELRAEIEDLGASAEIAACDTADRDALSALLDGLPRPL

TGVVHAAGVLADGLVTSIDEPAVEQVLRAKVDAAWNLHELTANTGLSFFVLFSSAASVLAGPGQGVYAAA

NESLNALAALRRTRGLPAKALGWGLWAQASEMTSGLGDRIARTGVAALPTERALALFDSALRRGGEVVFP

LSINRSALRRAEFVPEVLRGMVRAKLRAAGQAEAAGPNVVDRLAGRSESDQVAGLAELVRSHAAAVSGYG

SADQLPERKAFKDLGFDSLAAVELRNRLGTATGVRLPSTLVFDHPTPLAVAEHLRDRLFAASPAVDIGDR

LDELEKALEALSAEDGHDDVGQRLESLLRRWNSRRADAPSTSAISEDASDDELFSMLDQRFGGGEDL

>gi|161790|gb|AAC37189.1| histone H3

MARTKQTARKSTGAKAPRKQLASKAARKSAPATGGIKKPHRFRPGTVALREIRKYQKSTDLLIRKLPFQR

LVRDIAHEFKAELRFQSSAVLALQEAAEAYLVGLFEDTNLCAIHARRVTIMTKDMQLARRIRGERF

>gi|81318496|sp|Q6L743.1|DOIAD_STRKN RecName: Full=2-deoxy-scyllo-inosamine dehydrogenase; Short=DOIA dehydrogenase

MKALVFHSPEKATFEQRDVPTPRPGEALVHIAYNSICGSDLSLYRGVWHGFGYPVVPGHEWSGTVVEING

ANGHDQSLVGKNVVGDLTCACGNCAACGRGTPVLCENLQELGFTKDGACAEYMTIPVDNLRPLPDALSLR

SACQVEPLAVALNAVSIAGVAPGDRVAVMGAGGIGLMLMQVARHLGGEVTVVSEPVAERRAVAGQLGATE

LCSAEPGQLAELVARRPELTPDVVLEASGYPAALQEAIEVVRPGGRIGLIGYRVEETGPMSPQHIAVKAL

TLRGSLGPGGRFDDAVELLAKGDDIAVEPLLSHEFGLADYATALDLALSRTNGNVRSFFNLRD