Research Computing Newsletter


Research Computing News

Volume 8, Number 1
August 2000





Many Changes at the RCR: Ranger, UNIX, SeqWeb, Oh my!

As you are probably aware, the RCR has undergone a rather strenuous series of transformations over the past several months. We have attempted to minimize the disruptions to your work, but some changes were unavoidable. Here is a very quick overview of changes that are noticeable to our users:

  1. The RCR is out of the e-mail business. We have transferred all e-mail accounts from the old @mcrcr0.med.nyu.edu addresses to the main medical school e-mail system @popmail.med.nyu.edu. All of our users have had to deal with the transfer of their old mail messages, changing e-mail settings on their personal computers, and creating a new password for the popmail account. One clear benefit of this transition is that everyone can now use the nice simple @med.nyu.edu as their return address.

  2. We have made the big transition from VMS to UNIX as our operating system for the main GCG server. This was absolutely necessary since GCG no longer supports VMS. We provided a number of training classes in UNIX and the course notes remain available online at:
    http://www.med.nyu.edu/rcr/rcr/nyu_vms
    and
    http://www.med.nyu.edu/rcr/rcr/course/unix-contents.html

  3. The RCR's website has moved onto the main webserver for the medical school:
    http://www.med.nyu.edu/rcr
    Please change your bookmarks!

  4. We have added a web interface to GCG known as SeqWeb™. While not as powerful as the command line control of GCG that is taught in the RCR's annual bioinformatics course or the SeqLab X-Windows interface, it is very easy to use. Everyone with an e-mail account on the popmail server should be able to use the same username and password to access SeqWeb; otherwise, contact Tirza Doniger at 3-7135.

    Check it out at: http://mcrcr0.med.nyu.edu/gcg-bin/seqweb.cgi

    The Gory Details

    From an administrative standpoint there has been a fundamental change in the management of computing at the NYU Med. School. Previously, the main computers of Academic Computing were managed by systems managers from the RCR and from Educational Computing. Now, with the advent of the CIO's Office, these computers are managed centrally by a dedicated "systems group". This group also manages all of the school's central IT functions, such as e-mail and web services. Therefore, the RCR can now focus on the applications needed to make your research run smoothly.

    Several months ago, we announced that the VMS operating system on the RCR's main server, mcrcr0 (fondly known as "The Alpha"), would be replaced with UNIX to allow us to keep up with the state-of-the-art in molecular biology software. The first step was to move nearly 800 e-mail accounts from mcrcr0 to popmail, while maintaining the users' stored mail files. Tirza Doniger and Jeff Berliner managed this feat over a 3-month period of extraordinarily hard work.

    In parallel, we obtained Dean John Deeley's support to purchase a new DS-20 computer as an e-mail server for the school to replace the ailing popmail machine. As soon as this machine arrived, it was set up as a temporary server for the RCR running a UNIX version of GCG. All RCR user accounts were then moved from mcrcr0 to the new machine, which was named "ranger".

    Mcrcr0 was then reconfigured with new faster hard drives and the UNIX operating system. Then "ranger" was shut down and all programs and data were transferred to the rejuvenated mcrcr0, which then took over the name "ranger".

    The new DS-20 machine was then relieved of its role as a temporary GCG server and set up as the new "popmail" e-mail server, the purpose for which it had been purchased. In addition, a new e-mail "gateway" machine was installed that provides e-mail routing (sending mail with @med.nyu.edu addresses to the correct mail server within the Med School) and anti-virus filtering for all incoming e-mail messages. This was all done in the nick of time, since the old popmail machine was on the verge of melt-down due to the increased loads that it had to manage. The new "popmail" machine now handles the load easily.

    We think we did a pretty reasonable job, but we know of some remaining problems and we're keen to get them fixed. Please make sure all of your stuff works. Let us know early about any problems so we can fix them promptly.



    Weekly BLAST Searches with DBWatcher

    The RCR used to run a service called DBWatcher that allowed each RCR member to set up a group of sequences to be automatically BLAST searched each week against all of the new sequences recently added to GenBank. This service was not widely used and was poorly maintained because it required a lot of effort both from the users and from the RCR staff.

    We have re-installed a UNIX version of the DBWatcher program on Ranger (the UNIX reincarnation of mcrcr0), but we have also built a web interface (thanks to Tirza Doniger) that enables users to very easily set up and modify their own sets of query sequences. Check it out at:

    http://mcrcr0.med.nyu.edu/dbwatcher.html

    You may submit as many sequences as you wish to be searched by DBWatcher, but each sequence must be set up as a separate job. Just paste your sequence into the sequence field on the web page, indicate whether it is a DNA or protein sequence, and whether you wish to search against the DNA or protein database at NCBI (protein searches are generally more sensitive, even with automatic translation for DNA sequences, provided that your sequence is in fact protein coding). You can also choose a sensitivity cutoff: the smaller the E-value, the less sensitive your search will be to distantly related sequences, but the fewer false positives it will return.
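
    The effect of the E-value cutoff can be illustrated with a minimal Python sketch. The hit names and E-values below are made up for illustration, and `filter_hits` is a hypothetical helper, not part of DBWatcher or BLAST; it simply keeps the hits whose E-value is at or below the chosen cutoff.

```python
# Illustrative only: hypothetical (name, E-value) pairs, not real BLAST output.
hits = [
    ("hit_A", 1e-50),
    ("hit_B", 1e-8),
    ("hit_C", 0.5),
    ("hit_D", 5.0),
]


def filter_hits(hits, cutoff):
    """Return the hits whose E-value is less than or equal to the cutoff."""
    return [(name, evalue) for name, evalue in hits if evalue <= cutoff]


# A strict (small) cutoff keeps only the strongest matches.
strict = filter_hits(hits, 1e-10)
# A loose (large) cutoff also keeps weak matches, some of which may be chance hits.
loose = filter_hits(hits, 1.0)

print("strict:", [name for name, _ in strict])
print("loose: ", [name for name, _ in loose])
```

    With the strict cutoff only the strongest hit survives; with the loose cutoff weaker hits are reported as well, which is where both distant relatives and false positives tend to appear.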

    DBWatcher runs searches of your sequences against the GenBank "month" database each week on Sunday night at midnight and sends the results to you by e-mail. This is a handy service if you want to keep track of all new sequences that are added to GenBank that are similar to genes of interest in your research.

    You will receive two e-mails from DBWatcher each week for each Job that you create (each sequence that you submit). The first e-mail contains a list of the sequences in the NCBI "month" database that match your query sequence and the scores for each match. The second e-mail contains the actual alignments of your query sequence and the matching database sequences. In order to make sense of these alignments, you must view them in a mono-spaced font such as Courier or Monaco. You can either change the font used by your e-mail program, or copy and paste the entire text of the e-mail into a text editor program (such as SimpleText on the Mac).





    Network 2000

    As everyone is certainly aware, the network infrastructure in the School of Medicine is outdated, overloaded, and has been in serious need of overhaul for several years. Every department has suffered through slow connections, inability to add network connections for new computers, and unscheduled downtime.

    To remedy this, the school has developed (and is currently implementing!) a new strategy for network infrastructure. The ancient shared 10BASE-T core network that was installed in 1985-9 will be replaced with a switched Gigabit (1000 Mb/s) core running over new fiber-optic cables. The benefit of switched vs. shared networking is that in the switched case each machine's packets move independently of those from other machines, hugely improving performance and greatly enhancing security. We are particularly fortunate at this time in that cutting-edge Gigabit Ethernet technology has, just in the last few months, become an affordable standard for core networking.
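
    A rough back-of-the-envelope comparison shows why switching helps. The Python sketch below is an idealized illustration, not a model of the school's actual network: the host count and link speeds are assumed numbers, collisions and protocol overhead are ignored, and the switch is assumed to be non-blocking.

```python
def per_host_shared(link_mbps, active_hosts):
    """Best-case average share when all active hosts contend for one shared segment."""
    return link_mbps / active_hosts


def per_host_switched(port_mbps):
    """Each switch port is a dedicated link, so a host does not share it with others."""
    return port_mbps


# Assumed example: 20 busy machines on a 10 Mb/s segment.
shared = per_host_shared(10.0, 20)
switched = per_host_switched(10.0)

print(f"shared segment:  about {shared} Mb/s per host")
print(f"switched ports:  up to {switched} Mb/s per host")
```

    Even at the same nominal link speed, each machine on a switched port avoids competing with its neighbors for the wire. Because traffic is delivered only to the destination port, other machines also do not see it.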

    The first place that will benefit from this new infrastructure will be MSB, where the network renovation is already underway. Before the year is over, however, we expect to fold the Library into this new infrastructure, completing the initial phase of the re-build of the school's information resources. The renovation of the Medical School's network is being integrated with plans for network renovations at the HSO, and with a new network being built now in the VA and soon, we hope, in Bellevue as well.




    RCR Fees & Services

    Due to the disruptions over the past 6 months as the RCR has transformed itself from VMS to UNIX, added professional systems management, and transferred e-mail accounts to other University services, we have deferred our annual fees from the usual January billing date until September of 2000. Prior to sending out bills, we wish to clarify what bioinformatics services are provided for this fee.

    The RCR annual fee is $500 per laboratory group. A laboratory group is defined as a group of students, fellows, technicians, and visiting scientists working under the supervision of an NYU faculty member as Principal Investigator (PI). There is no limit to the number of people who can belong to a group, but no full-time faculty can be a member of another PI's group. Each person in a group is a full RCR member with their own account and their own password. For this fee, the RCR provides the following services:

    1. Unlimited access to molecular biology data analysis software, such as GCG (including telnet, X-Windows, and the SeqWeb web interface), PHYLIP, CLUSTAL, and assorted other UNIX programs.
    2. Unlimited storage of sequence data on the RCR's mainframe machine where it will be protected by daily backups.
    3. Site licenses to molecular biology programs for desktop PCs including: MacVector, OMIGA (for Windows), and Sequencher (for both Mac and Windows).
    4. Training in the use of all of these programs:
      • The RCR staff teaches an annual course through the Sackler Institute called "Using Computers for Molecular Biology" which is open to all NYU faculty, staff, and students.
      • The RCR website offers extensive course notes and tutorials on the use of all of these programs plus web-based bioinformatics tools.

    5. Technical support (for molecular biology software, not desktop computers!) by e-mail, telephone, and drop-in at the RCR offices in 174/183 MSB.
    6. Bioinformatics consultation. Stuart Brown provides consultation to all RCR members on bioinformatics issues including strategic advice on research projects as well as bioinformatics review of grant proposals and papers in preparation.

    For additional fees (established on a case-by-case basis), the RCR can provide custom programming and participate in collaborative research projects, up to and including co-PI status on grant proposals.

    - Stuart Brown




    Genomics & Microarrays

    Microarray technology is becoming a common tool used across a broad range of disciplines, particularly in the basic and clinical biomedical sciences. The technology is fundamentally quite simple: a grid of DNA probes is created with sequences which represent various genes in each grid cell. An RNA sample extracted from a tissue of interest is labeled and applied to the grid under stringent hybridization conditions, then the amount of RNA bound to each cell in the grid is measured in a quantitative fashion. This measurement is equivalent to the level of expression of the gene represented by that cell. The primary advantage of microarray technology is that it can provide a profile of the expression levels of very large numbers of genes in a single experiment.

    There are several different types of microarray technologies currently in use. One type utilizes as probes cDNA sequences bound to either a nylon membrane or a coated glass slide (spotted cDNA arrays). Spotted cDNA arrays are capable of measuring the expression of several hundred to a few thousand genes at a time. Experiments with spotted cDNA arrays generally make use of two RNA samples that have been labeled with two differently colored fluorescent tags. These two labeled samples are mixed together and hybridized to the array, then expression values are measured as the relative levels of the two colors.
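
    The two-color measurement can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the intensities are invented, the red/green labeling convention is assumed, and expressing the relative level as a log2 ratio is a common convention rather than something prescribed by any particular array software.

```python
import math

# Hypothetical background-corrected intensities for three spots.
# Assumed convention: "red" = experimental sample, "green" = reference sample.
spots = {
    "gene_1": {"red": 4000.0, "green": 1000.0},
    "gene_2": {"red": 500.0, "green": 500.0},
    "gene_3": {"red": 250.0, "green": 1000.0},
}


def log2_ratio(red, green):
    """Relative expression as log2(red / green); 0 means no change."""
    return math.log2(red / green)


ratios = {gene: log2_ratio(v["red"], v["green"]) for gene, v in spots.items()}

for gene, value in ratios.items():
    print(f"{gene}: {value:+.1f}")
```

    On a log2 scale a four-fold increase and a four-fold decrease come out as +2 and -2, so induction and repression are treated symmetrically.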

    Investigators generally build their own spotted cDNA arrays from collections of genes that are of interest to them, and this is both the strength and the weakness of this approach. It is a significant laboratory management problem to maintain collections of thousands of cDNA clones, verify their sequence, amplify and quantitate the DNA, ensure consistent DNA purity, and apply spots to an array in a reproducible fashion. On the other hand, once the cDNAs are collected, the chips can be produced in quantity at a low price, allowing investigators to explore gene expression in many experimental variations. Within the NYU School of Medicine, Jin Po Li in the Dept. of Microbiology (263-7661) has purchased a machine for the creation and analysis of spotted cDNA microarrays. It is his intention to make this machine available to other NYU scientists as a shared resource.

    An alternate microarray technology, developed by the Affymetrix Corporation, utilizes as probes oligonucleotides that are synthesized directly onto silicon chips using photolithographic techniques. The Affymetrix GeneChip™ technology has the advantage of allowing the expression levels of many more genes to be measured on a single chip. Affymetrix GeneChips measure up to 8,000 genes on each chip. A set of 5 chips is available that measures approximately 40,000 unique human genes. GeneChips with even more genes are currently under development. Another advantage of the Affymetrix system is that the chips are manufactured under strict quality control, so signals from identical RNA samples are highly reproducible. In addition, the GeneChips contain internal controls that make it possible to correct for non-specific hybridization. The downside of GeneChip technology is that it is extremely expensive. The Affymetrix GeneChip reader costs approximately $150,000 and each chip costs $300 or more. NYU has recently entered into a research consortium (AMDEC) of NY area academic institutions to provide shared access to Affymetrix GeneChip machines and to purchase the chips at discounted prices. In NY City, the Albert Einstein College of Medicine has agreed to provide access to their GeneChip reader. Contact George Grills, Director of the AECoM DNA Sequencing Facility, (718) 430-2657 for more information.

    All types of microarray technologies require substantial computational analysis of the experimental results. Software is available that can be used to determine either absolute expression levels for each gene in an array (GeneChips), or relative expression levels between the genes in two samples (cDNA arrays). Software is under development to analyze the gene expression levels determined in multiple microarray experiments in aggregate (e.g. timecourse experiments) in order to identify clusters of co-regulated genes. The RCR is building a Computational Genomics Center to aid NYU researchers in the analysis of microarray data. At the present time, we have installed the Affymetrix GeneChip Analysis Suite™ on a computer in our public computing room in 174-MSB. Researchers who have performed GeneChip experiments are encouraged to bring their data to our workstation for analysis and collaboration with Stuart Brown. This software allows analysis of single GeneChip experiments - extracting normalized expression values for each gene in an array, or pairwise comparisons of experiments in a control vs. experimental scenario.

    For users of spotted cDNA arrays (and nylon filters) we have also installed Michael Eisen's ScanAlyze, Cluster, and TreeView programs. These provide basic analysis of microarray experiments, including extracting relative expression values for each spot (each gene) and clustering. As demand for this service increases and our users initiate more complex experiments, we anticipate expanding the RCR's microarray analysis facilities. The features of such a center will include secure central storage of data, a relational database for all microarray experiments, advanced data visualization tools, training in the use of the software, consulting with bioinformatics specialists, and custom software development.
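
    The idea behind clustering co-regulated genes can be sketched in plain Python. This is an illustration only, not the code used by the Cluster program: the expression profiles are invented, and `pearson` and `most_similar_pair` are hypothetical helper names. The sketch scores each pair of genes by the Pearson correlation of their profiles across experiments and reports the most similar pair, which is the kind of first merge step an agglomerative hierarchical clustering would take.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def most_similar_pair(profiles):
    """Return (gene_a, gene_b, r) for the pair with the highest correlation."""
    names = sorted(profiles)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if best is None or r > best[2]:
                best = (a, b, r)
    return best


# Hypothetical log-ratio profiles over four time points.
profiles = {
    "gene_1": [0.0, 1.0, 2.0, 3.0],
    "gene_2": [0.1, 1.1, 2.1, 3.1],  # rises together with gene_1
    "gene_3": [3.0, 2.0, 1.0, 0.0],  # opposite trend
}

best = most_similar_pair(profiles)
print(f"most similar pair: {best[0]} and {best[1]} (r = {best[2]:.2f})")
```

    Genes whose profiles rise and fall together score close to +1 and would be grouped first; repeating the merge step on the remaining profiles builds up a full tree of clusters.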