Computer Software for Phylogenetics
Given the lack of consensus among evolutionary biologists about basic principles for phylogenetic analysis, it is not surprising that a wide array of computer software is available for this purpose.
Different theoretical models, different algorithms, different computer platforms, and different interfaces lead to a bewildering array of products from which to choose.
GCG offers two distinct sets of programs for phylogenetic analysis.
The first is a very simple distance-based clustering program called DISTANCES (discussed in the section above about UPGMA analysis).
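The clustering idea behind UPGMA can be sketched in a few lines of Python. This is a minimal illustration, not GCG's DISTANCES code: it assumes a symmetric distance matrix supplied as a dict keyed by taxon pairs, and the function name `upgma` and the merged-cluster labels are hypothetical. At each step it joins the closest pair of clusters, places the new node halfway between them, and replaces the pair with a size-weighted average distance.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by UPGMA and return the tree as a Newick string.

    names: list of taxon labels.
    dist: dict mapping (a, b) pairs to distances; pair order is ignored.
    """
    d = {frozenset(pair): v for pair, v in dist.items()}
    # Active clusters: label -> (Newick text, number of leaves, node height)
    clusters = {n: (n, 1, 0.0) for n in names}
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2  # new node sits halfway between them
        text_a, size_a, h_a = clusters.pop(a)
        text_b, size_b, h_b = clusters.pop(b)
        label = f"{a}+{b}"  # hypothetical label for the merged cluster
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((label, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        newick = f"({text_a}:{height - h_a:g},{text_b}:{height - h_b:g})"
        clusters[label] = (newick, size_a + size_b, height)
    [(newick, _, _)] = clusters.values()
    return newick + ";"

dist = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
        ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
print(upgma(["A", "B", "C", "D"], dist))
```

For this toy matrix the sketch prints `(D:5,(C:3,(A:1,B:1):2):2);`. Because every leaf ends up the same distance from the root, UPGMA implicitly assumes a constant rate of evolution (a molecular clock).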
For more complex cladistic analysis, GCG version 9.1 (November 1997) now includes the complete PAUP program (Phylogenetic Analysis Using Parsimony).
Starting with a set of aligned sequences, PAUP can search for phylogenetic trees that are optimal according to parsimony, distance, or maximum likelihood criteria, using heuristic, branch-and-bound, or exhaustive tree-searching methods.
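To make the parsimony criterion concrete: for a fixed tree, the parsimony score is the minimum number of character changes that tree requires. The sketch below uses Fitch's algorithm on a rooted binary tree written as nested tuples; that representation and the names `fitch_score` and `tree_length` are assumptions for illustration, not how PAUP stores or scores trees. A real search evaluates this score for very many candidate trees and keeps the shortest.

```python
def fitch_score(tree, states):
    """Return (possible ancestral states, number of changes) for one column.

    tree: a taxon name (leaf) or a 2-tuple of subtrees (internal node).
    states: dict mapping taxon name -> observed base at this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_score(left, states)
    right_set, right_changes = fitch_score(right, states)
    common = left_set & right_set
    if common:
        # The children can share a state: no extra change needed here.
        return common, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1

def tree_length(tree, alignment):
    """Sum Fitch changes over all columns of an alignment (name -> sequence)."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

alignment = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTT"}
print(tree_length((("A", "B"), ("C", "D")), alignment))
print(tree_length((("A", "C"), ("B", "D")), alignment))
```

Here the grouping ((A,B),(C,D)) needs 4 changes and ((A,C),(B,D)) needs 6, so the first tree is preferred under parsimony.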
All program functions of PAUP have been divided between two GCG programs:
- PAUPSEARCH, which calculates trees
- PAUPDISPLAY, which produces graphical versions of PAUPSEARCH tree files
PAUPSEARCH is the single most complex GCG program, with approximately 80 optional command-line parameters, some of which have many settings.
PAUP can also reconstruct a neighbor-joining tree or perform a bootstrap analysis.
PAUP can consume huge amounts of computer time.
- The exhaustive or branch-and-bound searches simply cannot be done for more than about 10 sequences of moderate length.
- Maximum likelihood analysis requires at least 60 times more computation than parsimony and distance methods.
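The reason exhaustive searches break down so quickly is combinatorial: the number of distinct unrooted, fully resolved trees for n sequences is (2n - 5)!! = 3 × 5 × ... × (2n - 5). A short Python check of that standard formula (the function name is hypothetical):

```python
def unrooted_tree_count(n):
    """Number of unrooted binary trees with n labelled leaves: (2n - 5)!!."""
    if n < 3:
        return 1
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count

for n in (4, 10, 20):
    print(n, unrooted_tree_count(n))
```

This prints 3 trees for 4 sequences, 2027025 for 10, and 221643095476699771875 (about 2.2 × 10^20) for 20, which is why exhaustive searching is limited to small data sets.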
In order to permit a wider range of phylogenetic analyses, the RCR has also acquired the PHYLIP package.
I am familiar with some other phylogeny programs for Macs and PCs. While this is not an endorsement, within their limited range of functions, they work.
- MacClade (commercial software, Mac) by Wayne P. Maddison and David R. Maddison [http://phylogeny.arizona.edu/macclade/macclade.html]
- PAUP (Mac, DOS, UNIX) by David Swofford
- MEGA (DOS) by Sudhir Kumar, Koichiro Tamura, and Masatoshi Nei (who, with Naruya Saitou, developed the neighbor-joining algorithm)
- NTSYS (DOS) by F. James Rohlf
Joseph Felsenstein (author of PHYLIP) maintains a comprehensive list of phylogeny programs at http://evolution.genetics.washington.edu/phylip/software.html
PHYLIP
PHYLIP, like GCG, is a collection of many programs that perform specific functions, including UPGMA, neighbor-joining, parsimony, and maximum likelihood algorithms, as well as tools to manipulate DNA sequences, calculate distance matrices, and draw tree figures.
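As an illustration of the distance-matrix step, here is a minimal sketch of one classic DNA distance, the Jukes-Cantor correction of the observed proportion p of differing sites, d = -(3/4) ln(1 - (4/3) p), of the kind PHYLIP's DNADIST computes. It is not PHYLIP code; the function names are hypothetical, and skipping any column with a gap in either sequence is an assumption of this sketch.

```python
import math

def p_distance(s1, s2):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    diffs = sum(a != b for a, b in pairs)
    return diffs / len(pairs)

def jukes_cantor(s1, s2):
    """Jukes-Cantor corrected distance: -3/4 * ln(1 - 4/3 * p)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        return math.inf  # correction undefined: sequences look saturated
    return -0.75 * math.log(1 - 4 * p / 3)

def distance_matrix(seqs):
    """Pairwise Jukes-Cantor distances for a dict name -> aligned sequence."""
    names = list(seqs)
    return {
        (a, b): jukes_cantor(seqs[a], seqs[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

print(distance_matrix({"x": "ACGTACGTAC", "y": "ACGTACGTAA"}))
```

With one difference in ten sites, p = 0.1 and the corrected distance is about 0.107: the correction adds a little to account for multiple substitutions at the same site.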
PHYLIP programs can be used on the RCR Alpha by typing "PHYLIP" and then the name of the program desired.
You can also obtain a version of PHYLIP that runs on a Mac or PC, but it can be quite slow for large data sets.
PHYLIP documentation is available on the RCR's PHYLIP web pages: http://www.med.nyu.edu/rcr/rcr/phylip/
PHYLIP (Phylogeny Inference Package) by Joseph Felsenstein is a free package of programs for inferring phylogenies and carrying out certain related tasks.
At present it contains 31 programs, which carry out different algorithms on different kinds of data. The programs in the package are:
Programs for molecular sequence data
- PROTPARS: Protein parsimony
- DNAPARS: Parsimony method for DNA
- DNAMOVE: Interactive DNA parsimony
- DNAPENNY: Branch and bound for DNA
- DNACOMP: Compatibility for DNA
- DNAINVAR: Phylogenetic invariants
- DNAML: Maximum likelihood method
- DNAMLK: DNA ML with molecular clock
- DNADIST: Distances from sequences
- PROTDIST: Distances from proteins
- RESTML: ML for restriction sites
- SEQBOOT: Bootstraps sequence data sets
- COALLIKE: Coalescent likelihoods from sampled phylogeny estimates
Programs for distance matrix data
- FITCH: Fitch-Margoliash and least-squares methods
- KITSCH: Fitch-Margoliash and least-squares methods with evolutionary clock
- NEIGHBOR: Neighbor-joining and UPGMA methods
Programs for gene frequencies and continuous characters
- CONTML: Maximum likelihood method
- GENDIST: Computes genetic distances
- CONTRAST: Computes contrasts and correlations for comparative method studies
Programs for discrete state data (0/1)
- MIX: Wagner, Camin-Sokal, and mixed parsimony criteria
- MOVE: Interactive Wagner, C-S, mixed parsimony program
- PENNY: Finds all most parsimonious trees by branch-and-bound
- DOLLOP, DOLMOVE, DOLPENNY: Same as the preceding three programs, but for the Dollo and polymorphism parsimony criteria
- CLIQUE: Compatibility method
- FACTOR: Re-codes multistate characters
Programs for plotting trees and consensus trees
- DRAWGRAM: Draws cladograms and phenograms on screens, plotters and printers
- DRAWTREE: Draws unrooted phylogenies on screens, plotters and printers
- CONSENSE: Majority-rule and strict consensus trees
- RETREE: Reroots, changes names and branch lengths, and flips trees
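A consensus program such as CONSENSE summarizes many trees (for example, trees inferred from SEQBOOT bootstrap replicates) by counting how often each group of taxa appears. Below is a minimal sketch of the majority-rule and strict rules, assuming rooted trees that have already been reduced to their sets of non-trivial clusters (frozensets of taxon names); this representation and the function names are assumptions, not CONSENSE's input format.

```python
from collections import Counter

def majority_rule(trees):
    """Clusters present in more than half of the trees, with their frequency.

    trees: list of sets; each set holds the non-trivial clusters
    (frozensets of taxon names) of one rooted tree.
    """
    counts = Counter(cluster for tree in trees for cluster in tree)
    n = len(trees)
    return {c: k / n for c, k in counts.items() if k > n / 2}

def strict_consensus(trees):
    """Clusters present in every tree."""
    return set(trees[0]).intersection(*trees[1:])

trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AB"), frozenset("ABC")},
]
print(majority_rule(trees))
print(strict_consensus(trees))
```

Here {A,B} appears in all three trees and {A,B,C} in two of three, so both survive majority rule, while only {A,B} is in the strict consensus. Clusters that each occur in more than half of the trees are always mutually compatible, so the majority-rule set always describes a single tree.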
Conclusions
Given this huge variety of methods for computing phylogenies, how can the biologist determine what is the "correct" method for analyzing a given data set?
Published papers that attempt to address phylogenetic issues generally make use of many different algorithms and data sets in order to support their conclusions.
In some cases different methods of analysis can work synergistically.
Neighbor-joining produces just one tree, which can help to validate the result of a parsimony or maximum likelihood analysis if that tree is also among the candidate trees it finds.
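Because neighbor-joining is deterministic and fast, it is cheap to run alongside a longer search. The sketch below follows the standard Saitou and Nei procedure (pick the pair minimizing the Q criterion, estimate their branch lengths, update distances to the new node) on a small dict-based matrix. The function name, input format, and internal-node labels are assumptions for illustration; ties are broken by taking the first pair found, and negative branch lengths, which NJ can produce on real data, are not corrected.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree and return it as Newick.

    names: list of at least three taxon labels.
    dist: dict mapping (a, b) pairs to distances; pair order is ignored.
    """
    d = {frozenset(pair): v for pair, v in dist.items()}

    def D(a, b):
        return d[frozenset((a, b))]

    nodes = {n: n for n in names}  # label -> Newick text of that subtree
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(D(i, k) for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        li = D(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = D(i, j) - li
        u = f"{i}|{j}"  # hypothetical label for the new internal node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (D(i, k) + D(j, k) - D(i, j)) / 2
        nodes[u] = f"({nodes.pop(i)}:{li:g},{nodes.pop(j)}:{lj:g})"
    # With three nodes left, the three branch lengths are fixed directly.
    a, b, c = nodes
    la = (D(a, b) + D(a, c) - D(b, c)) / 2
    lb = (D(a, b) + D(b, c) - D(a, c)) / 2
    lc = (D(a, c) + D(b, c) - D(a, b)) / 2
    return f"({nodes[a]}:{la:g},{nodes[b]}:{lb:g},{nodes[c]}:{lc:g});"

dist = {("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 4,
        ("B", "C"): 8, ("B", "D"): 5, ("C", "D"): 5}
print(neighbor_joining(["A", "B", "C", "D"], dist))
```

The example distances were generated from a tree with branch lengths A:2, B:3, C:4, D:1 and an internal edge of 1, and the sketch recovers it exactly as `(C:4,D:1,(A:2,B:3):1);`. Unlike UPGMA, neighbor-joining does not assume a molecular clock.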
Resampling methods such as the bootstrap, which repeats the analysis on many data sets built by sampling alignment positions at random with replacement, can give an indication of the robustness of a given conclusion.
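The resampling step of a standard (Felsenstein) bootstrap is simple to sketch: each replicate is a new alignment of the same length, built by drawing columns at random with replacement, which is the job SEQBOOT performs in PHYLIP. The function below is a hypothetical illustration, not SEQBOOT's implementation; each replicate would then be analysed separately and the resulting trees summarized with a consensus method.

```python
import random

def bootstrap_replicates(alignment, n_reps, seed=None):
    """Yield bootstrap replicates of an alignment (dict name -> aligned sequence).

    Each replicate keeps every sequence but samples alignment columns with
    replacement, so some sites appear several times and others not at all.
    """
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "GCGTTA"}
for rep in bootstrap_replicates(alignment, 3, seed=1):
    print(rep)
```

Passing a fixed `seed` makes the replicates reproducible. Because whole columns are drawn, the characters of each sequence at a given position always stay together, preserving the site-by-site comparison between taxa.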
Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center Comments to: browns02@mcrcr.med.nyu.edu