
Multiple Alignment


Introduction

*** In theory, computing an optimal alignment between two sequences is computationally straightforward (Smith-Waterman algorithm), but aligning a large number of sequences with the same method is computationally infeasible.
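For two sequences the dynamic-programming recurrence is short and runs in time proportional to the product of the sequence lengths. A minimal sketch of the Smith-Waterman local alignment score, assuming a simple match/mismatch scheme and a linear gap penalty instead of a real substitution matrix (the parameter values are illustrative, not GCG defaults):

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty)."""
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 floor lets a local alignment start fresh at any cell
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8: the shared "ACGT"
```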

*** The cost grows exponentially with the number of sequences: extending the same dynamic-programming approach to N sequences of length L requires a table of roughly L^N cells.
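A quick way to see the blow-up is to count table cells. This sketch assumes equal-length sequences and counts only cells; in a full N-dimensional alignment each cell must also consider 2^N - 1 predecessor moves, so the real work grows even faster.

```python
def dp_cells(length: int, n_seqs: int) -> int:
    """Cells in the dynamic-programming table for n_seqs sequences of equal length."""
    # One table axis per sequence, each with length + 1 positions
    # (the extra position is the empty prefix).
    return (length + 1) ** n_seqs


for n in (2, 3, 5, 10):
    print(f"{n:>2} sequences of 300 residues: {dp_cells(300, n):.2e} cells")
```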

*** Instead, most available multiple alignment programs use an incremental (progressive) method: they make pairwise alignments, then add new sequences one at a time to the aligned groups.

*** I will discuss in detail only PILEUP, the multiple alignment program that is built into the GCG package.

*** A similar methodology is used by the CLUSTALW program [Thompson, J.D., Higgins, D.G. and Gibson, T.J. Nucleic Acids Research, 1994], which is also available on the RCR Alpha.

*** A very thorough mathematical discussion of multiple alignment by Georg Fuellen (University of Bielefeld, Department of Computer Science and Biotechnology) can be found on the web at http://merlin.mbcr.bcm.tmc.edu:8001/bcd/Curric/MulAli/mulali.html



The PILEUP Algorithm

*** PILEUP estimates the best alignment for a group of sequences using a progressive pairwise approach.

*** First, similarity scores are calculated between all pairs of sequences to be aligned, and the sequences are clustered into a dendrogram (tree structure) by UPGMA (unweighted pair-group method using arithmetic averages).
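GCG documents PileUp's clustering strategy as UPGMA: repeatedly join the two closest clusters and set the new cluster's distance to every other cluster to the size-weighted average of its members' distances. A minimal sketch, assuming a precomputed distance matrix (PILEUP derives its values from pairwise similarity scores; the distances and the function name `upgma` here are hypothetical):

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster labels into a nested-tuple dendrogram by UPGMA.

    dist maps frozenset({x, y}) -> distance between clusters x and y.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of sequences in it
    d = dict(dist)
    while len(clusters) > 1:
        # Join the closest pair of current clusters
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (x, y)
        nx, ny = clusters.pop(x), clusters.pop(y)
        # Size-weighted average distance from the new cluster to each remaining one
        for z in clusters:
            d[frozenset((merged, z))] = (
                nx * d[frozenset((x, z))] + ny * d[frozenset((y, z))]
            ) / (nx + ny)
        clusters[merged] = nx + ny
    return next(iter(clusters))


# Hypothetical distances (smaller = more similar)
dist = {frozenset(p): v for p, v in {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}.items()}
print(upgma(["A", "B", "C", "D"], dist))  # ('D', ('C', ('A', 'B')))
```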

*** Then the most similar pairs of sequences are aligned and averages (similar to consensus sequences) are calculated for the aligned pairs.
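One way to picture an "average" of an aligned group is a column-by-column profile of residue frequencies, from which a consensus can be read off. This is an illustrative stand-in, assuming gapped, equal-length aligned strings; it is not PILEUP's internal representation, and the names `profile` and `consensus` are hypothetical.

```python
from collections import Counter


def profile(aligned):
    """Per-column residue frequencies for a group of aligned, equal-length sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    return [
        {res: n / len(aligned) for res, n in Counter(s[i] for s in aligned).items()}
        for i in range(length)
    ]


def consensus(prof):
    """Most frequent symbol in each column (ties go to the first symbol seen)."""
    return "".join(max(col, key=col.get) for col in prof)


pair = ["ACG-T", "ACGAT"]
print(profile(pair)[3])                               # {'-': 0.5, 'A': 0.5}
print(consensus(profile(["ACGTT", "ACGAT", "TCGAT"])))  # ACGAT
```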

*** The final multiple alignment is performed by a series of progressive, pairwise alignments between sequences and clusters of sequences, according to the branching order in the dendrogram.
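The branching order of the dendrogram fixes the sequence of group-versus-group alignments. A minimal sketch that only lists those merge steps, innermost branches first, assuming the tree is written as nested tuples of sequence names (a hypothetical representation); a real program would perform a pairwise alignment of the two groups at each step.

```python
def merge_order(tree):
    """List the pairwise merges implied by a nested-tuple dendrogram, innermost first."""
    steps = []

    def visit(node):
        if isinstance(node, str):  # a leaf: a single sequence
            return [node]
        left, right = node
        left_members, right_members = visit(left), visit(right)
        steps.append((left_members, right_members))  # align group against group
        return left_members + right_members

    visit(tree)
    return steps


tree = ("D", ("C", ("A", "B")))
for step, (left, right) in enumerate(merge_order(tree), 1):
    print(f"step {step}: align {left} with {right}")
```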




*** Since the alignment is calculated progressively, the order of the input sequences can affect the final alignment. In addition, anything that affects the calculation of the dendrogram, such as a different comparison matrix or different gap weights, will also affect the multiple alignment.
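A small sketch of why gap weights matter: changing the gap penalty can change which sequence scores as most similar, and therefore which sequences are joined first in the dendrogram. It assumes a plain Needleman-Wunsch global score with match +1, mismatch -1 and a linear gap penalty; PILEUP itself uses a comparison matrix with separate gap creation and extension weights, and the sequences here are hypothetical.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    # prev[j] = best score aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


query = "ACGTACGT"
candidates = {"s1": "ACGAACGA", "s2": "ACGACGT"}  # hypothetical sequences
for gap in (-1, -4):
    scores = {name: nw_score(query, seq, gap=gap) for name, seq in candidates.items()}
    print(gap, scores, "closest:", max(scores, key=scores.get))
```

With gap=-1 the one-residue deletion (s2) scores 6 against 4 for the two-mismatch s1; with gap=-4, s1 wins 4 to 3, so the pair that would be clustered first changes.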

*** PILEUP has an option to output a figure of the dendrogram it created. We will talk about working with GCG graphics at the end of this lecture.

*** It is usually a good idea to look at the dendrogram to make sure that the order of alignment makes sense; it can help you catch misnamed sequences, for instance.

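*** That tree may look like a phylogenetic tree, but it isn't: it is a clustering of pairwise similarity scores, used only to decide the order in which sequences are aligned.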



Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu