FASTA Format
FASTA format is a compact and simple method of storing DNA and protein sequences as text files that can be read by virtually all molecular biology programs including (finally!) GCG version 9.
A sequence in FASTA format begins with a single-line description (or header), followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters.
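The layout described above can be read and written with a few lines of code. The following is a minimal Python sketch, not a reference implementation: the function names `parse_fasta` and `format_fasta`, the 60-character wrap width, and the choices to skip blank lines and to ignore sequence data that appears before any header are all assumptions made for illustration. The two records in the usage example are made up.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from lines of FASTA text."""
    description: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are skipped
        if line.startswith(">"):  # ">" in the first column marks a header
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            # Sequence data may span many lines; drop any embedded whitespace.
            chunks.append("".join(line.split()))
        # assumption: sequence data before the first header is ignored
    if description is not None:
        yield description, "".join(chunks)


def format_fasta(description: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record with the sequence wrapped at `width` characters."""
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + description] + wrapped) + "\n"


# Hypothetical two-record input, used only to exercise the functions.
example = """>seq1 hypothetical test record
ACGTACGTAC
GTACGT
>seq2 another hypothetical record
MKV
"""
records = list(parse_fasta(example.splitlines()))
rewritten = format_fasta("x", "A" * 130)
```

Here `records` holds one `(description, sequence)` tuple per record, with the sequence lines of each record joined into a single string. Wrapping at 60 characters in `format_fasta` keeps every sequence line under the recommended 80-character limit.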
An example sequence in FASTA format is:
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes; lower-case letters are equivalent to upper-case. Some, but not all, programs that accept "FASTA format" recognize a hyphen or dash (-) to represent a gap of indeterminate length and an asterisk (*) as an additional symbol. In protein sequences the asterisk conventionally marks a translation stop, while unknown or ambiguous residues are normally written as X for amino acids (as in the example above) or N for nucleotides.
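Because lower-case and upper-case letters are equivalent and only some programs accept the hyphen and asterisk, it can help to normalize and check a sequence before passing it to another program. The Python sketch below is one way to do that, under stated assumptions: the function name `normalize_sequence` and the `allow_extra` switch are hypothetical, the nucleotide set is the standard IUPAC code list, and the protein set (the 20 standard residues plus B, Z, X and U) is an assumption about what a given program accepts, so adjust it to the target tool.

```python
# IUPAC nucleotide codes, including the ambiguity codes.
NUCLEOTIDE_CODES = set("ACGTURYKMSWBDHVN")
# Assumption: 20 standard amino acids plus B, Z, X and U.
PROTEIN_CODES = set("ABCDEFGHIKLMNPQRSTUVWXYZ")
# Symbols that some, but not all, FASTA-reading programs accept.
EXTRA_SYMBOLS = set("-*")


def normalize_sequence(sequence: str, alphabet: set, allow_extra: bool = True) -> str:
    """Upper-case a sequence and reject characters outside the alphabet.

    When allow_extra is False, "-" and "*" are rejected as well, which
    mimics a stricter program that does not recognize them.
    """
    seq = sequence.upper()  # lower-case letters are equivalent to upper-case
    allowed = (alphabet | EXTRA_SYMBOLS) if allow_extra else alphabet
    bad = sorted(set(seq) - allowed)
    if bad:
        raise ValueError("unexpected characters in sequence: " + "".join(bad))
    return seq


protein = normalize_sequence("elrlrYCAPxx", PROTEIN_CODES)
gapped = normalize_sequence("acgt-n", NUCLEOTIDE_CODES)
```

Calling `normalize_sequence("AC-GT", NUCLEOTIDE_CODES, allow_extra=False)` raises `ValueError`, which is how the sketch models a program that does not recognize the gap symbol.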
Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu