1) Find a gene by text-based query

 

Find the protein sequence for the human voltage-gated calcium channel alpha subunit.

 

Use ENTREZ at NCBI (http://www.ncbi.nlm.nih.gov/). Choose Protein from the "Search" pulldown menu.

Type your query terms into the search field. Start with calcium channel and click the "Go" button.

 

Too many records are found, so narrow the search with another search keyword. In the search field, type alpha and click "Go" again.

The "History" function is useful for combining queries.
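The same narrowing can be expressed as a single combined query. The following is a minimal sketch, not part of the original exercise: it assumes the NCBI E-utilities esearch endpoint and adds a Homo sapiens[Organism] filter as an extra restriction; the helper names combine_terms and esearch_url are hypothetical. It only builds the query string and URL, and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: the public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def combine_terms(terms):
    """Join search terms with AND, much like combining History entries."""
    return " AND ".join(f"({t})" for t in terms)


def esearch_url(term, db="protein", retmax=20):
    """Build (but do not send) an esearch request URL for the given term."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# The two keywords from the exercise plus an assumed organism filter.
query = combine_terms(["calcium channel", "alpha", "Homo sapiens[Organism]"])
url = esearch_url(query)
print(query)
print(url)
```

The combined query string can also be pasted directly into the ENTREZ search field.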

 

The top of the window now shows your current query, followed by some grey fields that let you customize the display.

 

This is where the real power of ENTREZ becomes evident. Each protein sequence is linked to all corresponding DNA sequences, as well as to all similar protein sequences (pre-computed with BLAST). Proteins are also linked to all MEDLINE references that mention that sequence. These linked sequences and references have their own links, so from virtually any starting point, you can expand your search horizontally to learn about entire families of related database sequences.

 

Follow some of these links to find the abstract of the following article, and save it as a text file:
De Jongh (1990). Subunits of purified calcium channels. Alpha 2 and delta are encoded by the same gene. J Biol Chem 265, 14738-41. [90368635]

 

 

2) Much of bioinformatics work involves building data sets. When you want to explore the relationships among various genes, the first step usually involves obtaining sequences for these genes from public databases. If your set of potentially interesting genes is large, it is very helpful to make use of database "batch query" features.

 

Obtain from NCBI a single FASTA file that contains the protein sequences for the following accession numbers:

 

[Hint: you can paste multiple accession numbers into the Search field, or upload a text file.]

            NP_000093

            NP_523961

            NP_523749

            NP_523748

            NP_523403

            NP_728191

            NP_476907

            NP_726797
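As an alternative to pasting the list into the Search field, a batch request can be built programmatically. This is a rough sketch under the assumption that the NCBI E-utilities efetch endpoint is used with rettype=fasta; the helper name efetch_fasta_url is hypothetical. The code only constructs the URL and does not download anything.

```python
from urllib.parse import urlencode

# Assumption: the public NCBI E-utilities fetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accession numbers copied from the exercise.
ACCESSIONS = """
NP_000093 NP_523961 NP_523749 NP_523748
NP_523403 NP_728191 NP_476907 NP_726797
""".split()


def efetch_fasta_url(accessions):
    """Build one request URL that asks for all accessions as FASTA text."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),  # comma-separated batch of IDs
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


fasta_url = efetch_fasta_url(ACCESSIONS)
print(fasta_url)
```

Opening the printed URL in a browser, or fetching it with urllib.request.urlopen, should return all of the sequences in one FASTA response that can be saved as a single file.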

 

 

3) Systems biology is largely a study of patterns of regulation of gene expression. In a microarray experiment, the following Affymetrix probe-set IDs correspond to genes found to be significantly up-regulated in tumor vs. normal cells. Obtain 1000 bases of genomic sequence upstream of the first exon for each of these genes so that we can search for common patterns of transcription factor binding sites.

 

Where can you find a database that contains both Affy IDs and genome sequence? Look at the UCSC Genome Browser.  How can you make a batch query for Affy IDs in this tool? Study the Table Browser interface: http://genome.ucsc.edu/cgi-bin/hgTables
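What "1000 bases upstream" means depends on the strand of the gene, which is worth keeping in mind when checking the output. The sketch below assumes UCSC-style gene table coordinates (0-based, half-open, with txStart less than txEnd on both strands); the function name upstream_region and the example coordinates are hypothetical and are not taken from any real gene.

```python
def upstream_region(tx_start, tx_end, strand, length=1000):
    """Return (start, end) of the region upstream of a transcript.

    Coordinates are assumed to be 0-based and half-open, with
    tx_start < tx_end regardless of strand.
    """
    if strand == "+":
        # Transcription begins at tx_start; upstream lies to the left.
        return max(0, tx_start - length), tx_start
    if strand == "-":
        # Transcription begins at tx_end; upstream lies to the right.
        # The extracted sequence must then be reverse-complemented.
        return tx_end, tx_end + length
    raise ValueError(f"unknown strand: {strand!r}")


# Hypothetical transcript coordinates, for illustration only.
print(upstream_region(5000, 9000, "+"))
print(upstream_region(5000, 9000, "-"))
```

The Table Browser can apply this logic itself when sequence output is requested, so the function is mainly a way to sanity-check the coordinates it reports.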

 

[One more hint: all of these Affy IDs come from chip type Affy U133 Plus2]

 

210340_s_at

210340_s_at

206148_at

207288_at

203624_at

203624_at

203624_at

206779_s_at

201029_s_at

201029_s_at

201028_s_at

201909_at

207246_at

233178_at

217049_x_at
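Several probe-set IDs appear more than once in the list above. Before pasting identifiers into a batch query form, it can help to collapse the repeats. A minimal sketch that keeps the first occurrence of each ID and preserves the original order (the variable names are hypothetical):

```python
# Probe-set IDs copied from the list above, repeats included.
PROBE_IDS = """
210340_s_at 210340_s_at 206148_at 207288_at 203624_at
203624_at 203624_at 206779_s_at 201029_s_at 201029_s_at
201028_s_at 201909_at 207246_at 233178_at 217049_x_at
""".split()

# dict keys are unique and keep insertion order, so this removes
# repeats while preserving the order of first appearance.
unique_ids = list(dict.fromkeys(PROBE_IDS))

# One ID per line, ready to paste into an identifier list box.
print("\n".join(unique_ids))
```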