
Cladistic methods

*** Cladistic methods of phylogenetic analysis are based on the explicit assumption that a set of sequences evolved from a common ancestor by a process of mutation and selection without mixing (hybridization or other horizontal gene transfers).

*** Many of these methods work best if a specific tree, or at least an ancestral sequence, is already known so that comparisons can be made between a finite number of alternate trees rather than calculating all possible trees for a given set of sequences.
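
To give a feel for why comparing a limited set of candidate trees is attractive, the sketch below counts how many distinct bifurcating trees exist for a given number of sequences. It uses the standard counting formulas, (2n-5)!! for unrooted and (2n-3)!! for rooted trees; the function names are illustrative and not taken from any particular package.

```python
def double_factorial(k: int) -> int:
    """Return k * (k-2) * (k-4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_taxa: int) -> int:
    """Number of distinct unrooted bifurcating trees for n_taxa labelled tips."""
    if n_taxa < 3:
        raise ValueError("need at least 3 sequences")
    return double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa: int) -> int:
    """Number of distinct rooted bifurcating trees for n_taxa labelled tips."""
    if n_taxa < 2:
        raise ValueError("need at least 2 sequences")
    return double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    for n in (4, 6, 8, 10):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Four sequences allow only 3 unrooted trees, but ten sequences already allow more than two million, which is why an exhaustive search quickly becomes impractical.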

A. Parsimony

*** Parsimony is the most popular method for reconstructing ancestral relationships.

*** Parsimony allows the use of all known evolutionary information in tree building.
(In contrast, distance methods compress all of the individual differences between pairs of sequences into a single number.)

*** Parsimony involves evaluating all possible trees and giving each a score based on the number of evolutionary changes needed to explain the observed data.

*** The most parsimonious tree is the one that requires the fewest evolutionary changes for all sequences to derive from a common ancestor.

*** This is easiest to explain by example:

Consider four sequences: ATCG, TTCG, ATCC, and TCCG.

  • Imagine a tree that branches once at the first position, thus grouping ATCG and ATCC on one branch, and TTCG and TCCG on the other branch.

  • Then each branch sub-divides into two sub-branches for a total of 3 nodes in the tree (Fig. 1).

  • Counting backward from the bottom, each of the four sequences is separated from the root by two nodes, so, counting one change per node, the sum of the changes is 4 × 2 = 8.

  • This is a more parsimonious tree than one that first divides ATCC on its own branch, then splits off ATCG, and finally divides TTCG from TCCG (Fig. 2).

  • This tree also has three nodes, but when all of the distances back to the root are summed (1 for ATCC, 2 for ATCG, and 3 each for TTCG and TCCG), the total is 1 + 2 + 3 + 3 = 9.
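
The counting in this example can be reproduced with a few lines of code. This is only a sketch of the simplified score used above (the number of nodes between each sequence and the root, summed over all sequences), not a full parsimony algorithm, which would count character-state changes site by site. The nested-tuple tree representation and the names `fig1` and `fig2` are assumptions made for illustration.

```python
def leaf_depths(tree, depth=0):
    """Yield (sequence, number of nodes between that sequence and the root, root included)."""
    if isinstance(tree, str):          # a tip: one observed sequence
        yield tree, depth
    else:                              # an internal node: a tuple of subtrees
        for child in tree:
            yield from leaf_depths(child, depth + 1)


def node_count_score(tree):
    """Sum, over all sequences, of the nodes separating each sequence from the root."""
    return sum(depth for _, depth in leaf_depths(tree))


# Fig. 1: one split into two pairs, each of which then sub-divides.
fig1 = (("ATCG", "ATCC"), ("TTCG", "TCCG"))

# Fig. 2: ATCC splits off first, then ATCG, then TTCG from TCCG.
fig2 = ("ATCC", ("ATCG", ("TTCG", "TCCG")))

if __name__ == "__main__":
    print("Fig. 1 score:", node_count_score(fig1))
    print("Fig. 2 score:", node_count_score(fig2))
```

Under this count the Fig. 1 tree scores 8 and the Fig. 2 tree scores 9, matching the totals given in the example.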


         Fig. 1                  Fig. 2


B. Maximum Likelihood

*** The method of maximum likelihood attempts to reconstruct a phylogeny using an explicit model of evolution.

*** When the assumed model of evolution holds, no other method will perform as well or provide as much information about the tree.

*** Unfortunately, this is computationally difficult, so the model of evolution must be a simple one. Even with simple models of evolutionary change the computational task is enormous, which makes this the slowest of these methods.

*** As an example, a typical simple model of evolutionary change considers a single site in a sequence of nucleotides:

  • Let all sites be selectively neutral and let them spontaneously mutate at a constant rate per gamete per generation.

  • Let the mutation rates to and from each nucleotide be equal.

  • Generations are assumed to be discrete and the evolution of each site is assumed to be independent of all other sites.
(This may seem like a lot of assumptions, but in reality most other methods will not work very well without them either.)

*** This method works best when it is used to test (or improve on) an existing tree.

*** Given a particular tree, each branch point (node) is calculated individually for each base in the sequence, starting at the tips (the observed sequences) and working toward the root.

*** First, calculate the most probable ancestor for each neighboring pair of sequences joined by a branch of the tree.

*** Then the sequences at these nodes are used to calculate the most probable ancestral sequences at the next level of branch points, and this continues until a single most probable ancestral sequence is computed at the "root" of the tree.

*** Given this model, an explicit statement can be made about the probability of change from one nucleotide to another within a specified time period.
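
As an illustration of such an explicit statement, the sketch below gives the change probabilities for a model in which mutation rates to and from each nucleotide are equal. This is commonly called the Jukes-Cantor model; the text above does not name it, so treat that identification as an assumption. The sketch also treats time as continuous rather than in discrete generations, and `rate` (the rate of change to each specific other nucleotide) and `time` are illustrative parameter names.

```python
import math


def jc_transition_probability(from_base: str, to_base: str,
                              rate: float, time: float) -> float:
    """Probability that from_base is observed as to_base after the given time.

    rate is the rate of change to EACH specific other nucleotide,
    so the total rate of leaving a base is 3 * rate.
    """
    decay = math.exp(-4.0 * rate * time)
    if from_base == to_base:
        return 0.25 + 0.75 * decay     # no net change at this site
    return 0.25 - 0.25 * decay         # change to one specific other base


if __name__ == "__main__":
    for t in (0.0, 10.0, 100.0, 1000.0):
        same = jc_transition_probability("A", "A", rate=0.001, time=t)
        diff = jc_transition_probability("A", "G", rate=0.001, time=t)
        print(t, round(same, 4), round(diff, 4))
```

At time zero the base is certain to be unchanged, and over very long times every base becomes equally likely (probability 1/4), whatever the starting base was.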

*** Since each nucleotide site evolves independently, the likelihood of the phylogeny can be calculated separately for each site. The product of the likelihoods for each site provides the overall likelihood of the observed data.
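
A minimal sketch of this per-site calculation follows. It works from the tips toward the root, sums over every possible base at each node, and then combines the sites (adding logarithms, which is equivalent to multiplying the likelihoods). Several things here are assumptions for illustration: the equal-rates (Jukes-Cantor) change probabilities, branch lengths expressed as expected substitutions per site (3 × rate × time for a per-substitution rate), equal base frequencies at the root, the arbitrary branch length of 0.1, and the list-of-(subtree, branch length) tree representation.

```python
import math

BASES = "ACGT"


def jc_prob(i: str, j: str, t: float) -> float:
    """Probability of base i becoming base j along a branch of length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay


def conditional_likelihoods(node, site: int) -> dict:
    """Map each base b to P(observed tips below node | node has base b), for one site."""
    if isinstance(node, str):                      # tip: the observed base is certain
        return {b: 1.0 if node[site] == b else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:                          # internal node: combine its children
        child_l = conditional_likelihoods(child, site)
        for b in BASES:
            result[b] *= sum(jc_prob(b, c, t) * child_l[c] for c in BASES)
    return result


def site_likelihood(tree, site: int) -> float:
    """Likelihood of one site, averaging over equally likely bases at the root."""
    root = conditional_likelihoods(tree, site)
    return sum(0.25 * root[b] for b in BASES)


def log_likelihood(tree, n_sites: int) -> float:
    """Sum of per-site log-likelihoods (the log of the product over sites)."""
    return sum(math.log(site_likelihood(tree, s)) for s in range(n_sites))


# The two-pair tree for the four example sequences, every branch set to 0.1.
example_tree = [([("ATCG", 0.1), ("ATCC", 0.1)], 0.1),
                ([("TTCG", 0.1), ("TCCG", 0.1)], 0.1)]

if __name__ == "__main__":
    for s in range(4):
        print("site", s, site_likelihood(example_tree, s))
    print("log-likelihood:", log_likelihood(example_tree, 4))
```

Working with log-likelihoods avoids the numerical underflow that a direct product over many sites would cause.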

*** To maximize the likelihood, different parameter values are tried until a set of branch lengths/mutation rates is found that provides the highest likelihood of observing the actual sequences.
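
The simplest case of this parameter search is a single branch length separating two sequences. The sketch below tries a grid of candidate distances and keeps the one with the highest likelihood. It assumes the equal-rates (Jukes-Cantor) model with distance measured in expected substitutions per site; the grid search, step size, and function names are illustrative choices, and real programs use more efficient numerical optimizers over many parameters at once.

```python
import math


def pair_log_likelihood(seq_a: str, seq_b: str, d: float) -> float:
    """Log-likelihood of two aligned sequences separated by distance d (d > 0)."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay       # site unchanged
    p_diff = 0.25 - 0.25 * decay       # site changed to one specific other base
    n_diff = sum(a != b for a, b in zip(seq_a, seq_b))
    n_same = len(seq_a) - n_diff
    # 0.25 is the probability of the starting base under equal base frequencies.
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)


def best_distance(seq_a: str, seq_b: str,
                  step: float = 0.001, max_d: float = 3.0) -> float:
    """Return the grid value of d that gives the highest log-likelihood."""
    candidates = [step * k for k in range(1, int(max_d / step) + 1)]
    return max(candidates, key=lambda d: pair_log_likelihood(seq_a, seq_b, d))


if __name__ == "__main__":
    d_hat = best_distance("ATCG", "TTCG")
    # Closed-form estimate for comparison; p is the fraction of differing sites.
    p = 1 / 4
    closed_form = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
    print(d_hat, closed_form)
```

For two sequences the grid result can be checked against the known closed-form estimate, d = -(3/4) ln(1 - 4p/3); for a full tree no such shortcut exists and the search must be done numerically.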

*** Finally, many different tree topologies are searched to find the best one.



Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu