The Smith-Waterman Algorithm
If your database search results from FASTA and BLAST are not satisfactory, and you feel that a more sensitive search is needed (perhaps to build an exhaustive set of distantly related proteins in a protein family), then try a Smith-Waterman search.
For those of you who are mathematically inclined and want to understand how dynamic programming works, there is an excellent textbook on the web called Pairwise Sequence Alignment by Robert Giegerich and David Wheeler at the Global Network Academy, Virtual School of Natural Sciences.
This is one chapter of the VSNS Biocomputing Hypertext Coursebook, which contains several other interesting tutorials.
All I'm going to say about how it works is this:
Dynamic Programming is a very general programming technique. It is applicable when a large search space can be structured into a succession of stages, such that:
- the initial stage contains trivial solutions to sub-problems
- each partial solution in a later stage can be calculated by recurring on a fixed number of partial solutions in an earlier stage
- the final stage contains the overall solution
Here are some figures that summarize how it works.
A certain similarity to the dot plot is obvious.
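To make the stages concrete, here is a minimal Python sketch of a Smith-Waterman local alignment. It assumes a simple scoring scheme chosen only for illustration (match +2, mismatch -1, and a linear gap penalty of -1); real protein searches use a substitution matrix such as BLOSUM62 and affine gap penalties, and the function name smith_waterman is hypothetical, not the GCG program. Like a dot plot, the matrix has one cell per pair of residues; the first row and column are the trivial stage, each cell is computed from three earlier cells, and the best local alignment is traced back from the highest-scoring cell.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not GCG defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Initial stage: row 0 and column 0 stay 0 (aligning against an empty prefix).
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell recurs on a fixed number (three) of earlier cells;
            # the 0 lets a local alignment start fresh at any position.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Traceback from the best cell until the score drops to 0.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("GATTACA", "GATACA")
print(score)
print(top)
print(bottom)
```

With these toy scores, comparing GATTACA with GATACA gives a score of 11 (six matches and one gap); which of the two equivalent gap positions is reported depends on the order in which the traceback checks its three moves.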
The rigorous Smith-Waterman dynamic programming algorithm for calculating similarity is now available for database searching within GCG v.10. The program is called SSEARCH. Use it with care: it is *VERY SLOW* and uses a *LOT OF COMPUTER POWER*.
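Why a Smith-Waterman database search is so slow can be seen in a small sketch of what such a search has to do: fill a complete query-by-subject matrix for every sequence in the database, then rank the scores. This is a hypothetical illustration, not how SSEARCH is implemented; the function names, the toy database, and the scoring values (match +2, mismatch -1, linear gap -1) are assumptions. Only the score is computed here, so two matrix rows are kept in memory instead of the whole matrix.

```python
def sw_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, keeping only two rows of the DP matrix."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0] * (len(subject) + 1)
        for j, s_char in enumerate(subject, start=1):
            s = match if q == s_char else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Score the query against every (name, sequence) entry.

    Returns hits ranked by score and the total number of matrix cells filled.
    """
    hits = []
    cells = 0
    for name, seq in database:
        hits.append((sw_score(query, seq), name))
        cells += len(query) * len(seq)  # every cell is filled, with no shortcuts
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits, cells


# Hypothetical toy database, for illustration only.
toy_db = [("seq1", "GATTACA"), ("seq2", "TTTTTTT"), ("seq3", "CAGATTTACAG")]
hits, cells = search("GATTACA", toy_db)
print(hits)
print(cells)
```

The work grows as query length times total database length. For a hypothetical 300-residue query against a database of 10 million residues, that is 300 × 10,000,000 = 3 × 10^9 cells, each needing several additions and comparisons, whereas FASTA and BLAST only examine regions around short word matches.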
However, several optimized versions of this search method are available over the internet.
The EMBL offers a service known as BLITZ, which actually runs an algorithm called MPsrch on a dedicated MasPar massively parallel supercomputer.
- Sequences can be submitted via a WWW page and results are returned by e-mail.
- BLITZ can only be used to search protein sequences against the SwissProt database.
Smith-Waterman searches can also be run on an e-mail server known as the BIOCCELERATOR at the Weizmann Institute of Science. The address of the mail server is:
bicserv@sgbcd.weizmann.ac.il
Available databases are:
- GenBank Release 88.0 (3/95)
- EMBL Release 42.0 (3/95)
- PIR-Protein Release 44.0 (4/95)
- SWISS-PROT Release 31.0 (2/95)
Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu