Finding sequences in the Databanks
FETCH
If you know the exact locus name or accession number of a sequence, GCG provides a simple tool called FETCH.
Keep in mind that FETCH retrieves sequences from the RCR's local copies of GenBank and other databases. The names and accession numbers in these local copies are sometimes inconsistent with those reported in journal articles or returned by BLAST/ENTREZ searches of the NCBI's databases.
Just type FETCH and the accession number (or locus name), and in a few seconds the complete database entry for that sequence (sequence plus full annotation) will be copied to your directory.
EXAMPLE:
> FETCH HSBRCA13P
You can also use FETCH to retrieve multiple sequences by giving a "file of filenames" containing a list of sequence accession numbers (or locus names). However, you must use the "@" sign with the name of your list file, for example:
> FETCH @seqnames.list
FETCH can also retrieve all of the sequences found in a FASTA search. The output of the FASTA search can be treated as a list file, e.g.:
> FETCH @frag.fasta
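As a rough illustration of the "file of filenames" idea, the Python sketch below builds the text of a list file and the matching FETCH command. The one-name-per-line layout is an assumption, since this page does not describe the exact list-file format GCG expects; check the local GCG documentation before relying on it. The helper names `list_file_text` and `fetch_command` and the name HUMEST001 are hypothetical.

```python
from pathlib import Path


def list_file_text(names):
    """Build list-file contents: one accession number or locus name per line.

    Assumption: GCG accepts a plain one-name-per-line layout.
    """
    cleaned = [n.strip() for n in names if n.strip()]
    return "\n".join(cleaned) + "\n"


def fetch_command(list_path):
    """Build the FETCH command; the "@" marks the argument as a list file."""
    return f"FETCH @{list_path}"


if __name__ == "__main__":
    # HSBRCA13P is the locus name used in the example above;
    # HUMEST001 is a hypothetical name added for illustration.
    Path("seqnames.list").write_text(list_file_text(["HSBRCA13P", "HUMEST001"]))
    print(fetch_command("seqnames.list"))
```

The printed command is what you would then type at the $ prompt on the GCG system.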
FETCH can also be used to retrieve multiple sequences using the VMS wildcard characters "*" and "?" to specify any group of characters and any single character respectively.
For example, most of the human sequences in the EST section of GenBank could be retrieved with the command
> FETCH gb_est:*hum*
[WARNING: This example would retrieve FAR TOO MANY SEQUENCES TO FIT IN YOUR DIRECTORY!!]
However, because of inaccuracies and inconsistencies in the annotation of database entries, this type of general retrieval command will get many sequences that you do not want and miss some that you do. It is much better to use a custom tool such as LOOKUP, which searches sequence annotation to find a group of related sequences.
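To see why name-based wildcards both over-match and under-match, here is a minimal Python sketch that imitates the "*" and "?" rules described above using the standard `fnmatch` module. It is not how FETCH is implemented. It covers only the name part of a specification (not a database prefix such as gb_est:), it assumes matching is case-insensitive, and `fnmatch` also gives special meaning to square brackets, which the page does not mention. All names except HSBRCA13P are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_select(names, pattern):
    """Return the names matching a pattern.

    '*' matches any run of characters and '?' matches exactly one.
    Assumption: matching ignores case, so both sides are lowercased.
    """
    pat = pattern.lower()
    return [n for n in names if fnmatchcase(n.lower(), pat)]


# HSBRCA13P is the locus name used earlier on this page;
# the other three names are hypothetical.
names = ["HSBRCA13P", "HUMEST001", "MMHUMLIKE1", "T12345"]
hits = wildcard_select(names, "*hum*")
print(hits)
```

In this toy list the pattern misses HSBRCA13P, whose name does not contain "hum", and picks up MMHUMLIKE1 simply because its name does. That is the kind of mismatch that makes an annotation-based search more reliable.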
FETCH can also be used in "interactive mode". This is GCG's attempt to make the interface slightly more friendly. If you just type FETCH at the $ prompt and do not specify a sequence name or accession number, FETCH will ask you:
Fetch copies GCG sequences or data files from the GCG database into your directory or displays them on your terminal screen.
FETCH what sequence(s) ?
However, for some obscure reason, FETCH does not work as well interactively. Certain sequence names are not found when entered at the "FETCH what sequence(s) ?" prompt. So it is better to always use FETCH in "command line mode" as shown above.
Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center Comments to: browns02@mcrcr.med.nyu.edu