
Lecture 7: Using the Internet for Molecular Biology Research

A. What is the Internet?

*** Over the past decade, the single biggest change in the way molecular biologists conduct research must be attributed to the Internet.

*** It is now routine for every newly determined sequence to be immediately BLAST searched against the latest version of the GenBank database.

*** Researchers on opposite sides of the globe are in daily contact by e-mail.

*** Abstracts of journal articles are freely available over the Internet before the journal issues are even printed.

*** The Internet is not one thing. People talk about "connecting to the Internet", but this is an oversimplification.

*** The Internet is actually a large number of different computer networks that are all linked together, often using the electronic equivalents of chewing gum and wire coat hangers.

*** Each of these constituent networks is owned by a different organization (e.g., America Online or New York University), government (e.g., Brazil), or corporation (e.g., MCI), each with its own private agenda, but all agree that the open exchange of information is in everyone's best interests.

*** Since no one "owns" the Internet, it is very difficult to force any kind of policy change on it. Each sub-network is free to unilaterally implement policy changes that affect its own users, but it cannot control users on other sub-networks.




B. "The network is the computer"
(Scott McNealy, Sun Microsystems)

*** This quote sums up the future of molecular biology computing.

*** As sequence databases grow ever larger (at an accelerating rate), individuals and research institutions will be less able to afford the computer power to perform the rapid searches that biologists require.

*** The only solution will be greater dependence on Internet access to centralized computing facilities.

*** Today the NCBI (National Center for Biotechnology Information at the National Library of Medicine, Bethesda, Maryland) provides free public BLAST similarity searches of GenBank and ENTREZ searches of sequence annotation.

*** These services can only grow in popularity as more and more small and mid-sized universities and research institutes abandon the effort to maintain local copies of the enormous databases.

*** Until recently there was a clear distinction between applications that ran entirely on the desktop machine (word processing, spreadsheets, etc.) and applications that relied on the network (e-mail, WWW browsers, USENET News readers).

*** However, many traditional "stand-alone" programs have added network functions. Examples include a scheduling program that compares your personal schedule over the network to a master schedule for your research group, or a database program that allows other users to remotely access (or even modify) your data.

*** The next step in this process is not entirely clear, but many computer manufacturers are suggesting that the next generation of desktop computers may be simpler machines called "Network Computers" (NCs).

*** All applications and all data will reside on network servers, and the NCs will simply provide access to these applications.

*** These NCs might even lack an operating system, bringing us full circle back to the age of mainframes and dumb terminals.

*** NCs will be much cheaper than high-powered personal computers, will require much less maintenance, and will not need to be frequently upgraded.

*** Each individual user's data (and favorite applications and personalized settings) will reside in a protected (and frequently backed up) "virtual" space on the servers rather than on one fragile desktop machine.

*** No more data loss due to hard disk crashes, and access to your data from any computer in the world with an Internet connection or modem.

*** Software upgrades will be installed centrally, and users will have access to as much power as they need for any computational task.

*** With Network Computers, the distinction between your computer, your local network, and the Internet will all but disappear. Instant access to data anywhere in the world will be assumed whenever you sit down at a computer.



C. E-mail: The most basic Internet function is still the most important

*** The modern Internet grew out of a system designed to carry e-mail between government and university defense researchers.

*** E-mail is still the lowest common denominator of the Internet.

*** Even the most technically backward and internally closed network systems provide gateways for the transfer of e-mail messages to the rest of the world.

*** Even extremely Internet-savvy scientists still consider e-mail to be their most important Internet function.

*** There is simply no substitute for the speed and quality of information that you can get from an e-mail message to a colleague or renowned authority.


*** In addition to personal mail messages, there are many other Internet resources that can be accessed with only an e-mail connection. For example, NCBI provides BLAST searches, text-based ENTREZ database queries, and a sequence retrieval service via e-mail.

Retrieve E-Mail Server: retrieve@ncbi.nlm.nih.gov

Send a message with the word HELP as the text of the message to receive the instructions for using the Retrieve server, which performs searches of GenBank and other databases (listed in the help documentation). The Retrieve server allows you to obtain sequences by accession number, locus, author, keyword, etc.

To receive instructions for obtaining dbEST or dbSTS records, send a message with the following in the body of the message:

        datalib dbest [or dbsts]
        help

Query E-Mail Server: query@ncbi.nlm.nih.gov

Send a message with the word HELP as the text of the message to receive the instructions for using the Query server, which uses the Entrez search engine to provide integrated access to nucleotide, protein, and structure databases, as well as a molecular biology subset of MEDLINE. This server enables you to retrieve records of interest from a target domain, related records ("neighbors") from the same domain, or associated records ("links") from other domains. Searches are done by accession number, author, keyword, and other text terms.



BLAST E-Mail Server: blast@ncbi.nlm.nih.gov

Send a message with the word HELP as the text of the message to receive the instructions for using the BLAST server, which performs sequence similarity searches against GenBank and other databases.
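
The help requests described above are ordinary plain-text messages, so they can be composed by a short script as well as by hand. The following is a minimal Python sketch, not an NCBI tool. The server addresses are the ones quoted in this lecture and may no longer be active. The sender address and the function name build_help_request are hypothetical, and the sketch assumes the dbEST request goes to the Retrieve address, under which the text lists it. It only builds the messages and does not send anything.

```python
from email.message import EmailMessage

# Server addresses exactly as listed in the lecture text.
NCBI_SERVERS = {
    "retrieve": "retrieve@ncbi.nlm.nih.gov",
    "query": "query@ncbi.nlm.nih.gov",
    "blast": "blast@ncbi.nlm.nih.gov",
}


def build_help_request(server, sender, body="HELP"):
    """Compose (but do not send) a plain-text request for one server.

    `sender` is a placeholder address; `body` defaults to the single
    word HELP, which the text says returns the usage instructions.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = NCBI_SERVERS[server]
    msg.set_content(body)
    return msg


# A HELP request for the BLAST server.
help_msg = build_help_request("blast", "researcher@example.edu")

# The two-line dbEST request from the text (assumed to go to Retrieve).
dbest_msg = build_help_request(
    "retrieve", "researcher@example.edu", body="datalib dbest\nhelp"
)

print(help_msg["To"])
print(dbest_msg.get_content())
```

To actually deliver such a message you would hand it to your own mail system, for example with the standard smtplib module and your site's mail host.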



D. Mailing lists

*** Another interesting use of e-mail is mailing lists (also known as "listservs").

*** These lists are open-ended discussions devoted to a huge range of topics, from sports teams to TV shows to your favorite software development environment.

*** Each list is hosted on a single computer (the mailserver). Everyone involved with the list sends mail to a single address, and the host computer then forwards a copy of every message to everyone who has subscribed to the list.

*** For popular lists, this can mean receiving dozens of e-mail messages every day - and hundreds if you have been away from your computer for a week or more.

*** Some mailservers also offer a daily or weekly digest of all messages so that you can get one giant message rather than many individual ones.
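
The forwarding and digest behavior described above can be sketched as a toy model in Python. This is an illustration only, not how any particular mailserver package is implemented. The class name MailingList, its fields, and all addresses are hypothetical, and "delivery" is just a list of (recipient, text) pairs rather than real mail.

```python
from dataclasses import dataclass, field


@dataclass
class MailingList:
    """Toy model of a mailserver: one list address, many subscribers."""

    address: str
    subscribers: set = field(default_factory=set)          # get every message
    digest_subscribers: set = field(default_factory=set)   # get one bundle
    pending: list = field(default_factory=list)            # posts awaiting digest

    def post(self, sender, text):
        """Forward a post to each regular subscriber and queue it for the digest."""
        self.pending.append(f"From {sender}: {text}")
        return [(recipient, text) for recipient in sorted(self.subscribers)]

    def send_digest(self):
        """Bundle all queued posts into one message per digest subscriber."""
        if not self.pending:
            return []
        bundle = "\n".join(self.pending)
        self.pending.clear()
        return [(recipient, bundle) for recipient in sorted(self.digest_subscribers)]


lst = MailingList("seq-users@example.org")
lst.subscribers.update({"ann@example.edu", "bob@example.edu"})
lst.digest_subscribers.add("carol@example.edu")

deliveries = lst.post("ann@example.edu", "Which BLAST matrix do you use?")
lst.post("bob@example.edu", "BLOSUM62 for most searches.")
digest = lst.send_digest()

print(len(deliveries), "individual copies sent for the first post")
print(digest[0][1])
```

Regular subscribers receive one copy per post, while the digest subscriber receives a single message containing both posts. This is the trade-off between immediacy and mailbox clutter that the digest option offers.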




E. Internet Newsgroups

*** Newsgroups are open discussion forums similar to mailing lists, but rather than getting all of the messages from all discussion participants in your e-mail every day, the messages are held on a separate server.

*** Most Internet access accounts (including all NYU accounts) provide access to a News server in addition to an e-mail server.

*** There are over 3000 different discussion groups (known as USENET Newsgroups) accessible to most Internet users throughout the world and each sub-network has some of its own local groups as well.

*** At NYU Medical Center, you have access to 151 NYU local groups including 6 NYU Med groups - notable among these is news:nyu.med.rcr which features important information about the Alpha server and other news from the Research Computing Resource.

*** In order to read USENET Newsgroups, you need a newsreader program.

*** The VMS operating system on the RCR's Alpha server provides a bare bones newsreader called (imaginatively enough) "NEWS".

*** There is also a newsreader built into Netscape Navigator (and most other web browsers).

*** My favorite newsreader is a Macintosh program called NewsWatcher by Peter Lewis. It is available by FTP from mcrcr.med.nyu.edu and on the RCR's AppleShare server (in the Basic Sciences Zone) in the Apple Utilities volume in the News Things folder.

*** Molecular biologists should pay special attention to the set of bionet newsgroups, with discussion groups that include bionet.molbio.hiv, bionet.genome.chromosomes, bionet.molbio.methds-reagents, and my favorite: bionet.software.gcg.
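
Newsgroup names are dotted hierarchies, read from the most general category on the left to the most specific on the right, which is how a newsreader can show you "all of bionet" or just "bionet.molbio". The short Python sketch below illustrates that naming scheme using only the group names mentioned in this lecture. The helper functions in_hierarchy and by_top_level are hypothetical, and no news server is contacted.

```python
from collections import defaultdict

# Group names as they appear in the lecture text.
GROUPS = [
    "nyu.med.rcr",
    "bionet.molbio.hiv",
    "bionet.genome.chromosomes",
    "bionet.molbio.methds-reagents",
    "bionet.software.gcg",
]


def in_hierarchy(group, prefix):
    """True if `group` is `prefix` itself or lies beneath it in the name tree."""
    return group == prefix or group.startswith(prefix + ".")


def by_top_level(groups):
    """Map each top-level hierarchy (the text before the first dot) to its groups."""
    tree = defaultdict(list)
    for name in groups:
        tree[name.split(".", 1)[0]].append(name)
    return dict(tree)


molbio = [g for g in GROUPS if in_hierarchy(g, "bionet.molbio")]
tree = by_top_level(GROUPS)

print(molbio)
print(sorted(tree))
```

Matching on the prefix plus a dot, rather than on the bare prefix, avoids accidentally treating a differently named hierarchy that merely starts with the same letters as part of bionet.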



F. Gopher

*** Gopher is a form of distributed database that allows users to read text files on many computers around the world over the Internet.

*** Gopher was the predecessor of the World Wide Web, but it was limited to plain text documents (without any formatting) and it was not easy for ordinary people to make their own information publicly available via Gopher.

*** It was used primarily by universities and government organizations to make institutional documents available over the Internet.

*** The World Wide Web is actually a superset of Gopher since existing Gopher documents can be accessed with a Web browser.

*** There are no significant advantages to continued use of Gopher today. Most Gopher documents are no longer updated and many Gopher servers are being discontinued.






Using Computers for Molecular Biology
Stuart M. Brown, Ph.D., RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu