Phylogenetics Exercise

We cannot hope to cover the application of phylogenetics to molecular data thoroughly in this class. The subject requires at least a course of its own, if not a series of courses. The options available just within the PHYLIP program are at least as complex as everything we have learned so far in GCG. The PAUP program adds even more options.

This exercise takes you through the process of creating a list of sequences, aligning them, and then using DISTANCES and GROWTREE to do the most basic form of phylogenetic analysis. After that, we will go once through PAUP using one set of default settings. Phylogenetic programs generate trees as output, and the results must be written in some kind of graphics format before you can view them. Use the SETPLOT command to create a PNG output file, then bring it to your desktop with FTP and view it in a web browser or any graphics viewer on your computer.

LOOKUP

We will start by making a list of similar sequences using FASTA. Fetch the human rab11b gene (rb11b_human).

Now use FASTA to locate all of the similar genes in SwissProt (sw:*).

Look at the aligned sequences in the FASTA output file. Note that they are all rab11 genes from various species. Now use PILEUP to make a multiple alignment of the sequences directly from the FASTA output file. (You will probably need to limit your alignment to about 20 sequences.)

Look at the r11b_human.msf text file created by PileUp. Are all of the sequences about the same length? You can remove any that are much shorter or much longer than the rest and re-run PILEUP on the shorter list of sequences.
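If you would rather not judge the lengths by eye, the check above can be sketched in a few lines of Python. This is only an illustration: the sequence names, the lengths, and the 20% tolerance are made-up assumptions, and the lengths are meant to be counts of residues without gap characters.

```python
from statistics import median

def filter_by_length(seqs, tolerance=0.2):
    """Keep sequences whose length is within `tolerance` of the median length.

    `seqs` maps a name to an ungapped sequence string. The 0.2 default
    (20% either side of the median) is an arbitrary choice for this sketch.
    """
    mid = median(len(s) for s in seqs.values())
    return {name: s for name, s in seqs.items()
            if abs(len(s) - mid) <= tolerance * mid}

# Hypothetical data: three full-length proteins, one fragment, one long outlier.
example = {
    "seq_a": "M" * 218,
    "seq_b": "M" * 216,
    "seq_c": "M" * 95,    # fragment, much shorter than the rest
    "seq_d": "M" * 220,
    "seq_e": "M" * 410,   # much longer than the rest
}

kept = filter_by_length(example)
print(sorted(kept))
```

The median is used instead of the mean so that one very long or very short sequence does not shift the reference length.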

Build a Phylogenetic Tree

Once you have a nice-looking multiple alignment file in the MSF format created by PileUp, you are ready for phylogenetic analysis. We will start with the simple program DISTANCES to calculate the number of amino acid differences between the sequences in the alignment, and then use GROWTREE to create a tree from this data. Let's keep this simple and just use all the defaults for these programs.

Send the figure to me as an e-mail attachment.
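To make the distance step less of a black box, here is a minimal Python sketch of the idea: count the fraction of aligned positions at which two sequences differ. This is not the GCG DISTANCES code, and it applies no correction for multiple substitutions. The short sequences and their names are invented, and skipping any column that contains a gap is an assumption made for this sketch.

```python
from itertools import combinations

GAP_CHARS = set("-.~")  # characters treated as gaps in this sketch

def p_distance(a, b):
    """Fraction of differing residues, over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared

# Hypothetical aligned sequences (all the same length, gaps as "-").
alignment = {
    "seq_a": "MGTRDDEYDY",
    "seq_b": "MGTRDDEYDY",
    "seq_c": "MGARDEEYD-",
    "seq_d": "MATKDDEFDY",
}

# One distance per pair of sequences, keyed by the two names.
matrix = {(m, n): p_distance(alignment[m], alignment[n])
          for m, n in combinations(alignment, 2)}

for (m, n), d in matrix.items():
    print(f"{m} {n} {d:.3f}")
```

A tree-building program then works from a table like `matrix` rather than from the sequences themselves.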

PAUP

You can create more accurate phylogenetic trees using the program PAUP (Phylogenetic Analysis Using Parsimony).

Again, use the r11b_human.msf alignment as your starting point. Always use the "Heuristic tree search" option (or else the computation will take forever). The Parsimony and Distance methods can give slightly (or dramatically) different results, but I'm not qualified to guess which is "correct" for any given set of sequences. When in doubt, go with the default. We won't be studying all the complexities of Maximum Likelihood in this class.
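To see why a non-heuristic search "takes forever", it helps to count how many trees there are to examine. The number of distinct unrooted, fully bifurcating trees for n sequences is the standard result (2n-5)!! = 1 x 3 x 5 x ... x (2n-5). The short Python sketch below just evaluates that formula; the function name is my own, and it says nothing about how PAUP actually searches.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa labelled sequences.

    Evaluates the double factorial (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5).
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 sequences")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd numbers 3, 5, ..., 2n - 5
        count *= k
    return count

for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

With the roughly 20 sequences used in this exercise, the count is on the order of 10^20, which is why scoring every possible tree is not practical.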

You will also need to run the PaupDisplay program to create a graphic output. The output is saved as a file in your directory: rb11_human.paupdisplay. Send me this file by e-mail.

ClustalW & Phylodendron

You can also make Neighbor Joining trees (distance method) using Clustal. We have clustalw installed on mcrcr0, or you can install ClustalX on any Mac or Windows PC. It is also available on the web: http://www.ebi.ac.uk/clustalw/

The procedure is to load your sequences as a multi-sequence Fasta file, make a multiple alignment, then make a phylogenetic tree. On the EBI webserver, first make your alignment with the "Phylogenetic Tree Type" pulldown menu set to "none." Then copy the alignment and paste it back into the box for sequences, and choose the "nj" option for "Tree Type"; also choose the "on" options for "Correct Dist." and "Ignore Gaps". Many versions of Clustal produce a tree that is just a set of numbers. This can then be built into a graphic by a program like GrowTree, or the webserver called Phylodendron.

http://iubio.bio.indiana.edu/treeapp/treeprint-form.html
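The "set of numbers" is a tree written as nested parentheses with branch lengths, in what is usually called Newick format (I am assuming that is what your version of Clustal writes; check the file). The Python sketch below reads such a string and prints it as an indented outline, just to show that the text really does describe a tree. The tree string and sequence names are invented, and the parser handles only plain names and branch lengths, not quoted labels or comments.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # step past "("
            children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # step past ","
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # step past ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos != len(text):
        raise ValueError("unexpected text after tree")
    return root

def leaf_names(node):
    """Names of the sequences at the tips, in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [n for child in node["children"] for n in leaf_names(child)]

def outline(node, depth=0):
    """One indented line per node; internal nodes usually have no name."""
    label = node["name"] or "(internal)"
    if node["length"] is not None:
        label += f" [{node['length']}]"
    lines = ["  " * depth + label]
    for child in node["children"]:
        lines += outline(child, depth + 1)
    return lines

# Hypothetical tree: seq_a and seq_b are each other's closest relatives.
tree = parse_newick("((seq_a:0.01,seq_b:0.02):0.05,seq_c:0.10,seq_d:0.12);")
print("\n".join(outline(tree)))
```

A drawing program such as GrowTree or Phylodendron does the same reading step and then lays the branches out graphically, scaled by the branch lengths.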

*Note - you can convert a FASTA output file into a set of sequences in Fasta format using the GCG command tofasta. For the rab11 search, the command looks like this:

 tofasta @rb11_human.fasta
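In case the Fasta format itself is unfamiliar: each sequence starts with a header line beginning with ">" followed by its name, and the residues follow on one or more lines. The Python sketch below reads that layout into a dictionary. It is only an illustration of the format, with invented names and sequences, and is not a description of what tofasta writes in every detail.

```python
def read_fasta(text):
    """Read multi-sequence Fasta text into a dict of name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line has no sequence name")
            name = fields[0]  # first word after ">" is taken as the name
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

# Hypothetical two-sequence file; seq_a is split over two lines.
example_text = """>seq_a first example protein
MGTRDD
EYDYLF
>seq_b second example protein
MGARDEEYD
"""

sequences = read_fasta(example_text)
for seq_name, residues in sequences.items():
    print(seq_name, len(residues))
```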