1) Find a gene by text-based query
Find the protein sequence for the human voltage-gated calcium
channel alpha subunit.
Use ENTREZ at NCBI (http://www.ncbi.nlm.nih.gov/).
Choose protein from the "Search" pulldown menu.
Type your query terms into the search field. Start with calcium
channel and click the "Go" button.
This finds too many records, so narrow the search with
another keyword. In the search field, type alpha and click "Go" again.
The "History" function is useful for
combining queries.
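The same narrowing can be scripted. The sketch below only builds a query URL for NCBI's E-utilities ESearch service, joining the keywords with AND; it does not contact the server. The function name esearch_url and the optional organism filter are illustrative assumptions, not part of the exercise, and the web interface remains the intended route.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (the scripted counterpart of the Entrez web pages).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(terms, db="protein", organism=None, retmax=20):
    """Build an ESearch URL that requires every keyword in `terms`.

    esearch_url is a hypothetical helper name; db, term, retmax and
    usehistory are ESearch parameters.
    """
    query = " AND ".join(terms)
    if organism:
        # Assumption: restrict hits to one species with the [Organism] field tag.
        query += f' AND "{organism}"[Organism]'
    params = {"db": db, "term": query, "retmax": retmax, "usehistory": "y"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


# First search: "calcium channel"; narrowed search: add "alpha".
broad = esearch_url(["calcium channel"])
narrow = esearch_url(["calcium channel", "alpha"], organism="Homo sapiens")
print(broad)
print(narrow)
```

Opening either URL in a browser returns an XML list of matching record IDs; usehistory=y asks the server to store the result set so that later requests can reuse it.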
The top of the window now shows your current query,
followed by some grey fields that allow you to customize the display.
This is where the real power of ENTREZ becomes evident. Each
protein sequence is linked to all corresponding DNA sequences, as well as to
all similar protein sequences (pre-computed with BLAST). Proteins are also
linked to all MEDLINE references that mention that sequence. These linked
sequences and references have their own links, so from virtually any starting
point, you can expand your search horizontally to learn about entire families
of related database sequences.
Follow some of these links to find the abstract of the following
article, and save it as a text file:
De Jongh, 1990. Subunits of purified calcium channels. Alpha 2 and
delta are encoded by the same gene. J Biol Chem 265, 14738-41 (1990)
[90368635]
2) Much of bioinformatics work involves building data sets. When you want to explore the relationships among various genes, the first step usually involves obtaining sequences for these genes from public databases. If your set of potentially interesting genes is large, then it is very helpful to make use of database "batch query" features.
Obtain from NCBI a single FASTA file that contains the protein sequences for the following accession numbers:
[Hint: you can paste multiple accession numbers into the Search field, or upload a text file.]
NP_000093
NP_523403
NP_476907
NP_726797
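For larger sets, the batch query can also be made from a script. This is a minimal sketch using NCBI's E-utilities EFetch service, which accepts a comma-separated list of accessions and can return all records as one FASTA text. The helper names (efetch_fasta_url, download_fasta, count_fasta_records) and the output file name are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
ACCESSIONS = ["NP_000093", "NP_523403", "NP_476907", "NP_726797"]


def efetch_fasta_url(accessions, db="protein"):
    """Build one EFetch URL that requests every accession as FASTA text."""
    params = {
        "db": db,
        "id": ",".join(accessions),  # batch query: comma-separated IDs
        "rettype": "fasta",
        "retmode": "text",
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return f"{EUTILS}/efetch.fcgi?{urlencode(params, safe=',')}"


def download_fasta(accessions, out_path):
    """Fetch the records over the network and save them to a single file."""
    with urlopen(efetch_fasta_url(accessions)) as response:
        data = response.read()
    with open(out_path, "wb") as out:
        out.write(data)


def count_fasta_records(fasta_text):
    """Count sequences by counting header lines, which start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


url = efetch_fasta_url(ACCESSIONS)
print(url)
# To actually download (requires network access):
# download_fasta(ACCESSIONS, "proteins.fasta")
```

After downloading, count_fasta_records gives a quick check that the file holds one record per accession requested.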
3) Systems biology is largely a study of patterns of
regulation of gene expression. In a microarray experiment, the following
Affymetrix probe-set IDs correspond to genes that were found to be significantly
up-regulated in tumor vs. normal cells. Obtain 1000 bases of genomic sequence
upstream of the first exon for all of these genes so that we can search for
common patterns of transcription factor binding sites.
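"Upstream" depends on the strand the gene lies on, which is easy to get wrong when working from raw coordinates. The sketch below shows the arithmetic, assuming 0-based, half-open transcript coordinates of the kind used in UCSC gene tables (txStart, txEnd, strand). The function names are hypothetical, and the code does not clip at the end of a chromosome.

```python
def upstream_region(tx_start, tx_end, strand, length=1000):
    """Return (start, end) of the `length` bases upstream of a transcript.

    Coordinates are assumed 0-based and half-open. On the minus strand the
    transcript begins at tx_end, so upstream lies at higher coordinates.
    """
    if strand == "+":
        return max(0, tx_start - length), tx_start
    if strand == "-":
        return tx_end, tx_end + length
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string (needed for minus-strand genes)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Made-up coordinates, for illustration only.
print(upstream_region(5000, 9000, "+"))
print(upstream_region(5000, 9000, "-"))
print(reverse_complement("AACG"))
```

For a minus-strand gene, the sequence cut from the reference at these coordinates must be reverse-complemented so that it reads 5' to 3' toward the gene. Tools that return upstream sequence directly normally handle this for you.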
Where can you find a database that contains both Affy IDs and genome sequence? Look at the UCSC Genome Browser. How can you make a batch query for Affy IDs in this tool? Study the Table Browser interface: http://genome.ucsc.edu/cgi-bin/hgTables
[One more hint: all of these Affy IDs come from chip type
Affy U133 Plus2]
210340_s_at
210340_s_at
206148_at
207288_at
203624_at
203624_at
203624_at
206779_s_at
201029_s_at
201029_s_at
201028_s_at
201909_at
207246_at
233178_at
217049_x_at
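Several probe-set IDs appear more than once in the list above. Before pasting the list into a batch query form, it may help to remove the repeats while keeping the original order. A minimal sketch follows; the helper name unique_in_order is an assumption made for illustration.

```python
# The probe-set IDs exactly as listed in the exercise, repeats included.
PROBE_IDS = """
210340_s_at 210340_s_at 206148_at 207288_at 203624_at
203624_at 203624_at 206779_s_at 201029_s_at 201029_s_at
201028_s_at 201909_at 207246_at 233178_at 217049_x_at
"""


def unique_in_order(ids):
    """Drop repeated IDs while keeping first-seen order."""
    # dict preserves insertion order, so its keys are the unique IDs in order.
    return list(dict.fromkeys(ids))


probe_ids = PROBE_IDS.split()
unique_ids = unique_in_order(probe_ids)

print(f"{len(probe_ids)} IDs listed, {len(unique_ids)} unique")
print("\n".join(unique_ids))  # one ID per line, ready to paste or save
```

The printed list has one ID per line, so it can be pasted into a text box or saved as a file for upload.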