
Protein Analysis

*** Proteins are polypeptides - long linear chains composed of mixed polymers of 20 amino acids. These linear polymers fold upon themselves to generate a shape characteristic of each different protein, and this shape, along with the different chemical properties of the 20 amino acids, determines the function of the protein.

*** Proteins are self-assembling, which means that all of the information necessary to determine the final 3-dimensional structure of the molecule is encoded in its sequence. Thus, in theory, knowing the sequence of a protein one could infer its function.

*** Like DNA, proteins have primary, secondary, tertiary, and quaternary structure.

  • The primary structure is the sequence itself - the order of amino acids.

  • The secondary structure refers to local structural elements such as hairpins, helices, beta-pleated sheets, etc.

  • Tertiary structure is the final shape of the complete molecule such as a globular enzyme or a linear actin molecule.

  • Quaternary structure refers to complex structures composed of multiple sub-units, each of which is a distinct polypeptide (encoded by its own gene) or structures composed of mixtures of protein, DNA, and/or RNA (such as ribosomes, histones, or RNA Polymerase II).

*** The tertiary structure of a protein is produced by the folding of a peptide chain back on itself. This folding can occur as rotation around the chemical bonds within the constituent amino acids as well as the bonds that join the amino acids to each other.

*** The number of possible folding patterns for even a small polypeptide chain is astronomically large - effectively infinite for any practical search.
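*** To give a rough sense of that scale, here is a back-of-the-envelope sketch in Python. It is not a measurement: the figure of 3 backbone conformations per residue and the sampling rate of one conformation per picosecond are both assumptions chosen only for illustration, and the function names are hypothetical.

```python
# Rough count of backbone conformations for a small polypeptide.
# ASSUMPTIONS (illustrative only, not measured values):
#   - each residue can adopt 3 distinct backbone conformations
#   - 1e12 conformations can be sampled per second

STATES_PER_RESIDUE = 3
SAMPLES_PER_SECOND = 1e12
SECONDS_PER_YEAR = 3600 * 24 * 365


def conformation_count(n_residues, states=STATES_PER_RESIDUE):
    """Number of distinct chain conformations under the toy model."""
    return states ** n_residues


def years_to_enumerate(n_residues):
    """Years needed to visit every conformation once at the assumed rate."""
    return conformation_count(n_residues) / SAMPLES_PER_SECOND / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, f"{conformation_count(n):.2e}", f"{years_to_enumerate(n):.2e}")
```

Even under these generous assumptions, the count grows exponentially with chain length, which is why an exhaustive search of folding patterns is not a realistic prediction strategy.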

*** It is not an exaggeration to state that the ability to exactly predict protein structures and, from that, protein function would revolutionize medicine, pharmacology, chemistry and ecology.

*** We are not nearly there yet, but there are some useful tools available.




Amino Acid Analysis

*** Each amino acid in a protein has characteristic properties of charge, pKa, and a hydrophobic/hydrophilic tendency.

*** Since similar amino acids tend to cluster, a simple graph of hydrophobicity can be useful in identifying secondary structure elements.
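*** A hydrophobicity graph of this kind can be sketched in a few lines of Python. The example below averages the Kyte-Doolittle hydropathy scale over a sliding window and prints a text bar for each window. The window size of 9, the function names, and the test peptide are assumptions made for illustration; this is not how any particular GCG or Web tool is implemented.

```python
# Sliding-window hydropathy profile using the Kyte-Doolittle scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    seq = sequence.upper()
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def print_profile(sequence, window=9, width=40):
    """Print one text bar per window; longer bars are more hydrophobic."""
    for i, value in enumerate(hydropathy_profile(sequence, window)):
        centre = i + window // 2 + 1  # 1-based residue at the window centre
        bar = "#" * round((value + 4.5) / 9.0 * width)
        print(f"{centre:4d} {value:6.2f} {bar}")


if __name__ == "__main__":
    # Made-up peptide for illustration only, not a real protein.
    print_profile("MKKSEDRLLIVVALLGIAFLSTDKKEE", window=9)
```

Stretches of long bars mark hydrophobic clusters; short bars mark charged or polar stretches. A larger window smooths the curve, while a smaller one shows more local detail.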

*** In addition to the chemical properties of single amino acids, certain patterns of amino acids are known to form alpha-helices, beta-sheets, and turns.

*** GCG has a complete set of protein analysis tools.

*** The GCG programs PEPTIDESTRUCTURE, ISOELECTRIC, MOMENT, and HELICALWHEEL all address aspects of the chemical properties of amino acids in proteins.

*** The GCG program PEPPLOT applies a large number of these chemical and simple pattern analyses and produces a nice summary output.


*** Similar amino acid analysis tools are available on the Web.

*** The Protein Hydrophobicity Server at the Weizmann Institute of Science, Israel

*** SAPS Statistical Analysis of Protein Sequences: including composition, charge, hydrophobic and transmembrane segments, cysteine spacings, repeats and periodicity


*** The Macintosh program MacVector has a similar set of protein analysis tools.




Secondary Structure

Protein secondary structure is generally considered within the very limited framework of three basic shapes: alpha helix, beta sheet, and hairpin turns. Programs that attempt to predict structure assign each amino acid to one of these three categories - generally based on a moving window that considers the chemical properties of neighboring amino acids.
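The moving-window idea can be illustrated with a deliberately crude Python sketch. The residue groupings below are rough, qualitative categories in the spirit of Chou-Fasman propensities; they are not the published parameter tables, and the window size, tie-breaking order, and function name are assumptions made for this example. Real predictors use weighted scores and are considerably more careful.

```python
# Toy three-state assignment with a moving window.
# ASSUMPTION: these sets are coarse, qualitative groupings for illustration,
# not the published Chou-Fasman or Garnier parameters.
HELIX_FORMERS = set("AELM")
STRAND_FORMERS = set("VIYFWT")
TURN_FORMERS = set("GPNDS")


def predict_three_state(sequence, window=7):
    """Label each residue H (helix), E (strand) or T (turn).

    Each residue is scored by counting helix-, strand- and turn-favouring
    residues in the window centred on it (truncated at the chain ends).
    Ties are broken in the order H, E, T; a window containing none of the
    listed residues defaults to T.
    """
    seq = sequence.upper()
    half = window // 2
    states = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        counts = {
            "H": sum(aa in HELIX_FORMERS for aa in segment),
            "E": sum(aa in STRAND_FORMERS for aa in segment),
            "T": sum(aa in TURN_FORMERS for aa in segment),
        }
        best = max(counts, key=counts.get)
        states.append(best if counts[best] > 0 else "T")
    return "".join(states)


if __name__ == "__main__":
    # Made-up peptide for illustration only.
    peptide = "MAELLKAALEGNPDGSVTVYVIFKW"
    print(peptide)
    print(predict_three_state(peptide))
```

The output is a string of the same length as the input, one state per residue. Because the label at each position depends only on its neighbors within the window, long-range contacts that shape real structures are invisible to this kind of method.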

*** The GCG programs PEPPLOT and PEPTIDESTRUCTURE provide Chou-Fasman and Garnier-based predictions of protein secondary structure.

*** Many tools available on the Web claim to predict the secondary structure of proteins. Careful judgement is required when interpreting the output of custom algorithms.

*** PREDATOR is a secondary structure prediction program that can predict secondary structure for a single sequence or for a set of related sequences. The mean prediction accuracy of PREDATOR is 68% for a single sequence and 75% for a set of related sequences.

*** SSPRED is a three-state secondary structure prediction routine. Polypeptides are compared to the entire SwissProt database, and based on the structure of similar proteins, local regions are assigned to one of three structures: helical, strand, or coil/loop regions.

*** SSCP computes predictions for the content of helix, strand, and coil for a given protein using the amino acid composition as the only input information.
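*** The kind of input such a composition-based method starts from is simple to compute. The Python sketch below calculates the fraction of each of the 20 standard amino acids in a sequence. It shows only the input vector; the statistical model SSCP applies to it is not reproduced here, and the function name is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each of the 20 standard amino acids.

    Non-standard symbols (for example X) count toward the sequence length
    but are not reported, so the fractions may then sum to less than 1.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Made-up peptide for illustration only.
    for aa, fraction in composition("MKKSEDRLLIVVALLGIAFLSTDKKEE").items():
        if fraction > 0:
            print(aa, f"{fraction:.3f}")
```

Note that composition discards the order of residues entirely, so a method built on it can estimate how much helix, strand, and coil a protein contains but not where those elements lie.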

*** STRIDE is a program to recognize secondary structural elements in proteins from their atomic coordinates. It utilizes both hydrogen bond energy and main chain dihedral angles.

*** The BCM Search Launcher provides access to a large collection of other secondary structure prediction tools.



"Super-Secondary" Structure

There are some common structural motifs found in many proteins that have come to be known as "super-secondary" structures. They can reliably be identified by a combination of hydrophobicity, secondary structure, and motif-based methods. The motifs that are usually grouped in this class are membrane-spanning domains, signal peptides, coiled coils, and helix-turn-helix domains.
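As one example of the hydrophobicity-based approach, candidate membrane-spanning segments can be flagged with the commonly cited Kyte-Doolittle rule of thumb: a 19-residue window whose mean hydropathy exceeds about 1.6. The Python sketch below applies only that rule. The function name and the merging of overlapping windows are choices made for this illustration, and it is not the algorithm used by TMpred, SOSUI, or any other specific server.

```python
# Candidate membrane-spanning segments from a hydropathy threshold.
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
KD = dict(zip(
    "ARNDCQEGHILKMFPSTWYV",
    (1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
     3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2),
))


def candidate_tm_segments(sequence, window=19, threshold=1.6):
    """Return 1-based inclusive (start, end) ranges covered by windows
    whose mean hydropathy exceeds the threshold.

    Overlapping or adjacent qualifying windows are merged into one range.
    Sequences shorter than the window yield an empty list.
    """
    seq = sequence.upper()
    scores = [KD[aa] for aa in seq]
    segments = []
    for i in range(len(seq) - window + 1):
        if sum(scores[i:i + window]) / window > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)  # extend previous range
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Made-up sequence: a hydrophobic core flanked by charged residues.
    print(candidate_tm_segments("K" * 10 + "L" * 20 + "K" * 10))
```

Because whole windows are reported, the returned ranges run a few residues past the hydrophobic core on each side. A threshold rule like this only nominates candidates; it says nothing about membrane topology, and buried hydrophobic cores of soluble proteins can give false positives.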

*** GCG has several programs that identify these "super-secondary" structure elements:
  • COILSCAN identifies coiled-coil domains
  • HTHSCAN identifies helix-turn-helix motifs
  • SPSCAN finds secretory signal peptides


*** Several Web servers also predict these structures:

*** Predict Protein server at the EMBL Heidelberg

*** SOSUI at the Tokyo Univ. of Ag. & Tech., Japan

*** TMpred (transmembrane prediction) at ISREC (Swiss Institute for Experimental Cancer Research)

*** COILS (coiled coil prediction) at ISREC

*** SignalP (signal peptides) at the Tech. Univ. of Denmark



Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu