Frequently Asked Questions

PHYLIP Documentation

(Phylogeny Inference Package)

version 3.5c
by Joseph Felsenstein



(c) Copyright 1986-1993 by Joseph Felsenstein and the University of Washington.
Written by Joseph Felsenstein. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.







"It doesn't work! IT DOESN'T WORK!! It says 'can't find infile'" Actually, it's working just fine. Many of the programs look for an input file called "infile", and if one of that name is not present in the current directory, they then ask you to type in the name of the input file. That's all that it's doing.


"The program reads my data file and then says it's out of memory!" This is what tends to happen if there is a problem with the format of the data file, so that the programs get confused and think they need to set aside memory for 1,000,000 species or so. The result is a "memory allocation error". Check the data file format against the documentation: make sure that the data files have not been saved in the format of your word processor (such as Microsoft Word) but in a "flat ASCII" or "text only" mode.


"Consense gives wierd branch lengths! How do I get more reasonable ones?" Consense gives branch lengths which are simply the numbers of replicates that support the branch. This is not a good reflection of how long those branches are estimated to be. The best way to put better branch lengths on a consensus tree is to use it as a User Tree in a program that will estimate branch lengths for it. You may need to convert it to being an unrooted tree, using Retree, first. If the original program you were using was a parsimony program, which does not estimate branch lengths, you may instead have to make some distances between your species (using, for example, DnaDist), and use Fitch to put branch lengths on the user tree.


"DRAWTREE (or DRAWGRAM) doesn't work: it can't find the font file!" Six font files, called FONT1 through FONT6, are distributed with the executables (and with the source code too). The program looks for a copy of one of them called "fontfile". If you haven't made such a copy called "fontfile" it then asks you for the name of the font file. If they are in the current directory, just type one of "font1" through "font6".


"DNAML won't read the input tree created by DNAPARS!" That's because the DNAPARS tree file is a rooted tree, and DNAML wants an unrooted tree. Try using RETREE to change the file to be an unrooted tree file."


"Can DrawGram draw a scale beside the tree? Print the branch lengths as numbers?" It can't do either of these. Doing so would make the program more complex, and it is not obvious how to fit the branch length numbers into a tree that has many very short internal branches. If you want these scales or numbers, choose an output plot file format (such as PICT or PCX) that can be read by a drawing program such as Freehand, Adobe Illustrator, CorelDraw, or MacDraw. Then you can add the scales and branch length numbers yourself by hand. Note the menu option in DrawTree and DrawGram that specifies the tree size to be a given number of centimeters per unit branch length.


"What file format do I use for the sequences?"

"How do I use the programs? I can't find any documentation!"

These are discussed in the documentation files. Do you have them? They are in a separate archive from the executables (they are in the Sources and Documentation archives, which you should definitely fetch). Input file formats are discussed in MAIN.DOC, in SEQUENCE.DOC, DISTANCE.DOC, CONTCHAR.DOC, DISCRETE.DOC, and the documentation files for the individual programs.


"Where can I find out how to infer phylogenies?

There are no books yet, but three review articles that may help are:


"Does the Windows version work under Windows95?" It didn't until mid-November, 1995 but it does now. We have posted a revised version of the executables, compiled with a more recent version of the Watcom C++ compiler, that run well under Windows95, Windows 3.1, and maybe even Windows 3.0. If you fetched the Windows executables before these new versions were posted do fetch them again.


"If I copied PHYLIP from a friend without you knowing, should I try to keep you from finding out?" No. It is to your advantage and mine for you to let me know. If you did not get PHYLIP "officially" from me or from someone authorized by me, but copied a friend's version, you are not in my database of users. You may also have an old version which has since been substantially improved. I don't mind you "bootlegging" PHYLIP (it's free anyway), but you should realize that you may have copied an outdated version. If you are reading this Web page, you can get the latest version just as quickly over Internet. It will help both of us if you get onto my mailing list. If you are on it, then I will give your name to other nearby users when they ask for the names of nearby users, and they are urged to contact you and update your copy. (I benefit by getting a better feel for how many distributions there have been, and having a better mailing list to use to give other users local people to contact). Use the registration form which can be accessed through our Web registration page.


"How do I make a citation to the PHYLIP package in the paper I am writing?" One way is like this:
Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.
or if the editor for whom you are writing insists that the citation must be to a printed publication, you could cite a notice for version 3.2 published in Cladistics:
Felsenstein, J. 1989. PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.
For a while a printed version of the PHYLIP documentation was available and one could cite that. This is no longer true. Other than that, citation is difficult, because I have never written a paper announcing PHYLIP! My 1985b paper in Evolution on the bootstrap method contains a one-paragraph Appendix describing the availability of this package, and that can also be cited as a reference for the package, although the package has been distributed since 1980 and the bootstrap paper dates from 1985. A paper on PHYLIP is needed mostly to give people something to cite, as word-of-mouth, references in other people's papers, and electronic newsgroup postings have spread the word about PHYLIP's existence quite effectively.


"How do I bootstrap? Why has DNABOOT disappeared?" DNABOOT, BOOT, and DOLBOOT, the previous parsimony-based bootstrap programs, have been removed from the package as there is now a more general way of bootstrapping. It involves running SEQBOOT to make multiple bootstrapped data sets out of your one data set, then running one of the tree-making programs with the Multiple data sets option to analyze them all, then running CONSENSE to make a majority rule consensus tree from the resulting tree file. Read the documentation of SEQBOOT to get further information. Before, only parsimony methods could be bootstrapped. With this new system almost any of the tree-making methods in the package can be bootstrapped. It is somewhat more tedious but you will find it much more rewarding.


"How do I specify a multi-species outgroup with your parsimony programs?" It's not a feature but is not too hard to do in many of the programs. In parsimony programs like MIX, for which the W (Weights) and A (Ancestral states) options are available, and weights can be larger than 1, all you need to do is:

In programs like DNAPARS, you cannot use this method as weights of sites cannot be greater than 1. But you do an analogous trick, by adding a largish number of extra sites to the data, with one nucleotide state ("A") for the ingroup and another ("G") for the outgroup. You will then have to use RETREE to manually reroot the tree in the desired place.
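
The DNAPARS trick can be sketched in a few lines of Python. This is only an illustration: the function name add_outgroup_sites, the dictionary representation of the data, and the default of 20 extra sites are assumptions for the example, since "largish" is not given a number above.

```python
def add_outgroup_sites(alignment, outgroup, n_extra=20):
    """Append n_extra fake sites: "A" for ingroup species, "G" for outgroup species.

    alignment: dict mapping species name -> sequence string.
    outgroup: set of species names that make up the outgroup.
    n_extra: how many fake sites to add (20 is an arbitrary illustrative value).
    """
    ingroup_pad = "A" * n_extra
    outgroup_pad = "G" * n_extra
    return {
        name: seq + (outgroup_pad if name in outgroup else ingroup_pad)
        for name, seq in alignment.items()
    }
```

Remember that the number of sites given on the first line of the data file must be increased by the same amount when the padded sequences are written out.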


"How do I force certain groups to remain monophyletic in your parsimony programs?" By the same method as in the previous question, using multiple fake characters, any number of groups of species can be forced to be monophyletic. In MOVE, DOLMOVE, and DNAMOVE you can specify whatever outgroups you want without going to this trouble.


"How can I reroot one of the trees written out by PHYLIP?" Use the program RETREE. But keep in mind whether the tree inferred by the original program was already rooted, or whether you are free to reroot it.


"Why doesn't NEIGHBOR read my DNA sequences correctly?" Because it wants to have as input a distance matrix, not sequences. You have to use DNADIST to make the distance matrix first.


"What do I do about deletions and insertions in my sequences?" The molecular sequence programs will accept sequences that have gaps (the "-" character). They do various things with them, mostly not optimal. DNAPARS counts "gap" as if it were a fifth nucleotide state (in addition to A, C, G, and T). Each site counts one change when a gap arises or disappears. The disadvantage of this treatment is that a long gap will be overweighted, with one event per gapped site. So a gap of 10 nucleotides will count as being as much evidence as 10 single site nucleotide substitutions. If there are not overlapping gaps, one way to correct this is to recode the first site in the gap as "-" but make all the others be "?" so the gap only counts as one event. Other programs such as DNAML and DNADIST count gaps as equivalent to unknown nucleotides (or unknown amino acids) on the grounds that we don't know what would be there if something were there. This completely leaves out the information from the presence or absence of the gap itself, but does not bias the gapped sequence to be close to or far from other gapped or ungapped sequences.


"Why don't your parsimony programs print out branch lengths?" The long answer is that it is because there are problems defining the branch lengths. If you look closely at the reconstructions of the states of the hypothetical ancestral nodes for almost any data set and almost any parsimony method you will find some ambiguous states on those nodes. There is then usually an ambiguity as to which branch the change is actually on. Other parsimony programs resolve this in one or another arbitrary fashion, sometimes with the user specifying how (for example, methods that push the changes up the tree as far as possible or down it as far as possible). I have preferred to leave it to the user to do this. Few programs available from others currently correct the branch lengths for multiple changes of state that may have overlain each other. One possible way to get branch lengths with nucleotide sequence data is to take the tree topology that you got, use RETREE to convert it to be unrooted, prepare a distance matrix from your data using DNADIST, and then use FITCH with that tree as User Tree and see what branch lengths it estimates.

(The short answer is that we're working on including branch lengths in the next major release of PHYLIP. This will involve using a nice solution of the ambiguity problem by David and Wayne Maddison.)


"Why can't your programs handle unordered multistate characters?" Well, they can if they are 4-state characters whose states are A, C, G, and T (or U) because then one can use the DNA sequence parsimony programs. But in general the discrete characters parsimony programs can only handle two states, 0 and 1. This is mostly because I have not yet had time to modify them to do so -- the modifications would have to be extensive. Ultimately I hope to get these done, but in the meantime the best I can do is suggest that you either use one of the excellent parsimony programs produced by others (PAUP or Hennig86, for example) or if you have four or fewer states recode your states to look like nucleotides and use the parsimony programs in the molecular sequence section of PHYLIP.


"Where can I get a printed version of the PHYLIP documents?" For the moment, you can only get a printed version by printing it yourself. For versions 3.1 to 3.3 a printed version was sold by Christopher Meacham and Tom Duncan, then at the University Herbarium of the University of California at Berkeley. But they have had to discontinue this as it was too much work. You should be able to print out the documentation files on almost any printer and make yourself a printed version of whichever of them you need.


"Why have I been dropped from your newsletter mailing list?" You haven't. The newsletter was dropped. It simply was too hard to mail it out to such a large mailing list. The last issue of the newsletter was Number 9 in May, 1987. The Listserver News Bulletins that we tried for a while have also been dropped as too hard to keep up to date. I am hoping that this World Wide Web site will take their place.


"How many copies of PHYLIP have been distributed?" On 21 November, 1996 we reached 4,000 registered installations worldwide. A year earlier it was 3,000, so that we are registering 1,000 new users per year at the moment. Of course there are many more people who have got copies from friends. PHYLIP is the most widely distributed phylogeny package. (This situation may reverse itself rapidly once PAUP* is released in 1997. During the years it was in distribution, PAUP was ahead in phylogenies published, and the availability of distance and likelihood methods in PAUP* may make it very popular.) In recent years magnetic tape distribution and e-mail distribution of PHYLIP has disappeared, and there has been a big decrease of diskette distributions (down to fewer than one per week). But all this has been more than offset by, first, an explosion of distributions by anonymous ftp over Internet, and then a bigger explosion of World Wide Web distributions and registrations (about 4 registrations per day at the moment).