
Finding sequences in the Databanks

FETCH

***If you know the exact locus name or accession number of a sequence, GCG provides a simple tool called FETCH.
Keep in mind that FETCH retrieves sequences from the RCR's local copies of GenBank and other databases. Sometimes the names and accession numbers in these local copies are annoyingly inconsistent with those reported in journal articles or by BLAST/ENTREZ searches of the NCBI's databases.
***Just type FETCH and the accession number (or locus name), and in a few seconds the complete database entry for that sequence (sequence plus full annotation) will be copied to your directory.

*** EXAMPLE: > FETCH HSBRCA13P

*** You can also use FETCH to retrieve multiple sequences by giving a "file of filenames" containing a list of sequence accession numbers (or locus names). However, you must use the "@" sign with the name of your list file, for example:

 > FETCH @seqnames.list

*** FETCH can also retrieve all of the sequences found in a FASTA search, because the output of the FASTA search can be treated as a list file, e.g.:

 > FETCH @frag.fasta
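
If your accession numbers are already in a spreadsheet or a script, the list file can be generated rather than typed by hand. The following is only a minimal Python sketch, not part of GCG: the function name write_list_file is hypothetical, and the ".." dividing line (with comment text above it) is an assumption about the list-file layout that you should check against the local GCG documentation, so it can be switched off.

```python
from pathlib import Path


def write_list_file(names, path="seqnames.list", divider=True):
    """Write one sequence name per line and return the FETCH command to type.

    Hypothetical helper for illustration; it does not run FETCH itself.
    """
    # Drop blank entries and stray whitespace around each name.
    cleaned = [n.strip() for n in names if n.strip()]
    if not cleaned:
        raise ValueError("no sequence names given")

    lines = []
    if divider:
        # Assumption: text above a ".." line is treated as a comment
        # and the names to retrieve follow it.
        lines.append("Sequences to retrieve with FETCH")
        lines.append("..")
    lines.extend(cleaned)

    Path(path).write_text("\n".join(lines) + "\n")
    # The "@" tells FETCH that the argument is a list file, not a sequence name.
    return f"FETCH @{path}"
```

Calling write_list_file(["HSBRCA13P"]) writes seqnames.list in the current directory and returns the string "FETCH @seqnames.list", which you would then type at the prompt.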

*** FETCH can also be used to retrieve multiple sequences using the VMS wildcard characters "*" and "?" to specify any group of characters and any single character respectively.

For example, most of the human sequences in the EST section of GenBank could be retrieved with the command FETCH gb_est:*hum*.
***WARNING: This example would retrieve FAR TOO MANY SEQUENCES TO FIT IN YOUR DIRECTORY!!

***However, because database entries are annotated inaccurately and inconsistently, this type of general retrieval command will get many sequences that you do not want and miss some that you do. It is much better to use a tool such as LOOKUP, which searches sequence annotation to find a group of related sequences.
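
To see why name-based wildcards both over-select and under-select, the matching rule can be imitated outside GCG. This is only an illustrative Python sketch: it uses the standard fnmatch module, whose "*" and "?" behave as described above (fnmatch also understands "[...]" sets, which is not claimed for FETCH). The locus names are made up for the demonstration, and ignoring upper/lower case is an assumption.

```python
from fnmatch import fnmatchcase


def wildcard_select(names, pattern):
    """Return the names matched by a pattern using "*" and "?" wildcards."""
    # Assumption: matching ignores case, so compare everything in upper case.
    return [n for n in names if fnmatchcase(n.upper(), pattern.upper())]


# Hypothetical locus names, invented for this example only.
names = [
    "HUMEXAMPLE1",  # a human entry whose name contains HUM
    "HSEXAMPLE2",   # a human entry named without HUM: it will be missed
    "MMCHUMEX3",    # a non-human entry that happens to contain HUM
]

print(wildcard_select(names, "*hum*"))       # "*" = any group of characters
print(wildcard_select(names, "HS?XAMPLE2"))  # "?" = exactly one character
```

The first pattern picks up MMCHUMEX3 by accident and skips HSEXAMPLE2, which is the behaviour to expect whenever the selection depends on how entries happen to be named.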

***FETCH can also be used in "interactive mode". This is GCG's attempt to make the interface slightly more friendly. If you just type FETCH at the $ prompt without specifying a sequence name or accession number, FETCH will ask you:
Fetch copies GCG sequences or data files from the GCG database
into your directory or displays them on your terminal screen.

 FETCH what sequence(s) ?
However, for some obscure reason, FETCH does not work as well interactively: certain sequence names are not found when entered at the "FETCH what sequence(s) ?" prompt. So it is better to always use FETCH in "command line mode" as shown above.



Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu