Research Computing Newsletter


Research Computing News

Volume 5, Number 2
December 1996





New Software Available from the RCR


The RCR has expanded its arsenal of molecular biology software that can be run on Macintosh computers. In addition to renewing our license for GeneWorks, we have recently purchased NYU-MC site licenses for MacVector and for Sequencher. All three of these applications are available on floppy disks to install on your own Mac as well as on the public Quadra in room 174 MSB. Our licenses currently allow for 3 simultaneous users of GeneWorks, 3 users of MacVector, and 1 user of Sequencher. We can obtain additional licenses if demand for these applications warrants.

MacVector

MacVector is similar to GeneWorks in many ways. It offers a comprehensive menu of tools for the manipulation of single DNA and protein sequences including restriction mapping, PCR primer design, translation in various reading frames and with alternate codon tables, amino acid charge and hydrophobicity plots, etc. It also has an array of tools for the comparison of two sequences (such as dot plots and alignment) with a wide variety of display and printing options. MacVector also incorporates the ability to access GenBank directly over the internet. Both BLAST similarity searches and ENTREZ text-based searches of GenBank are possible using the power of online computers at NCBI. This is a unique new feature for a program that runs on a desktop computer.

MacVector is weaker in its tools for working with multiple sequences. It can compare one sequence with a series of other sequences, but it cannot do true multiple alignments, it has no phylogenetic analysis features, and no multiple sequence editor. There is a separate software package included with MacVector called AssemblyLIGN for DNA sequencing projects that allows for the creation of contigs from multiple sequence reads. The value of this tool has not been established, but it is clearly no match for Sequencher.

Sequencher

Sequencher is a Macintosh application that is dedicated to DNA sequencing projects. It is particularly valuable for the analysis of output from Applied Biosystems automated DNA sequencing machines. Individual DNA sequence "reads" from gel lanes on autoradiograms or text files generated by automated sequencers can be imported, vector sequence automatically removed, low quality sequence automatically chopped off and then overlapping fragments quickly and easily assembled into contigs. The contigs can then be viewed as a schematic or as a multiple sequence alignment. The alignment editor provides very nice tools for identifying and correcting mismatched bases.
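To give a feel for what "assembling overlapping fragments" means, here is a toy sketch in Python. It is not Sequencher's algorithm: it only joins two reads when the end of one exactly matches the start of the other, whereas a real assembler must also tolerate mismatches and handle reverse-complemented reads. The function name, the minimum overlap, and the two short reads are made up for illustration.

```python
def merge_reads(left: str, right: str, min_overlap: int = 4):
    """Join two reads if a suffix of `left` exactly matches a prefix of `right`.

    Returns the merged sequence, or None if no overlap of at least
    `min_overlap` bases is found. The longest overlap is preferred.
    """
    smallest = max(min_overlap, 1)  # an overlap of zero bases is meaningless
    largest = min(len(left), len(right))
    for size in range(largest, smallest - 1, -1):
        if left[-size:] == right[:size]:
            # Keep all of `left`, then append the non-overlapping tail of `right`
            return left + right[size:]
    return None


# Two made-up reads that share the five bases "AGGCT"
contig = merge_reads("GATTACAGGCT", "AGGCTTTGACC")
print(contig)
```

Repeating this pairwise merge over many reads is the basic idea behind building a contig; the hard parts in practice are sequencing errors and ambiguous base calls, which is where the alignment editor comes in.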

If you use an ABI automated sequencer, then Sequencher has some amazing additional features. The actual "electropherogram" traces (that provide a graph of signal intensity for each base at each position in the sequence) can be imported directly from the ABI sequencer into the Sequencher program. Then once contigs are assembled and you view the multiple alignments, the traces are aligned along with the sequences. Wherever there is an ambiguous base, you can view the sequence traces that led to those base calls and determine the relative quality of each sequencing read at that position. All of this data can be stored in a compressed format that takes up about half of the disk space of the original files generated by the ABI software.

This is clearly a niche tool. If you do a lot of DNA sequencing and assemble short fragments into contigs, then it is very valuable. If you work with an ABI sequencer, then it is totally indispensable. Otherwise, you are probably better off sticking with GCG.


New RCR service: DBWatcher provides automatic database searches

The RCR is initiating a new service which will provide weekly automated similarity searches of users' query sequences against GenBank databases. This service, called DBWatcher, will be similar to the GBNews program. GBNews searches for keywords in the annotation of new sequences; DBWatcher tests for similarity between new database sequences and user-specified query sequences.

Users can subscribe to DBWatcher by submitting information to the RCR staff with a Web form: http://mcrcr0.med.nyu.edu/rcr/dbwat.htm

The searches will run automatically, and the results will be sent to you by e-mail.

DBWatcher uses the BLAST program at NCBI to perform sequence similarity searches against the "month" database, which contains all new sequences submitted to GenBank over the past month or so (i.e. since the last cumulative update). By running our searches weekly, we gain the benefit of prompt discovery of new sequences without re-searching the same database too often. All of the options associated with a BLAST search (run via GCG, over the NCBI BLAST Web server, or by e-mail to NCBI) are available, such as adjusting the number of hits to show, statistical cutoffs for significant matches, etc.
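To make the idea of a statistical cutoff and a hit limit concrete, here is a minimal sketch in Python. It is not DBWatcher's or BLAST's actual code: the Hit record, the select_hits function, the accession names, and the score values are all made up for illustration, and the score is assumed to be a probability-like value where smaller means more significant.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str   # hypothetical identifier of the matching database sequence
    p_value: float   # assumed chance probability of the match (smaller is better)


def select_hits(hits, max_p=0.001, max_hits=10):
    """Keep hits at or below the cutoff, most significant first, up to max_hits."""
    significant = [h for h in hits if h.p_value <= max_p]
    significant.sort(key=lambda h: h.p_value)
    return significant[:max_hits]


# Made-up search results for one query sequence
hits = [Hit("SEQ_A", 1e-30), Hit("SEQ_B", 0.2), Hit("SEQ_C", 1e-5)]
for hit in select_hits(hits, max_p=0.001, max_hits=2):
    print(hit.accession, hit.p_value)
```

A stricter cutoff gives shorter weekly reports with fewer chance matches; a looser one is more likely to catch distant relatives of your sequence.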

Send questions about DBWatcher to Stuart Brown at browns02@mcrcr.med.nyu.edu



Reorganization and Upgrades

Computing Resources move to the Office of the Dean

At the beginning of the new academic year the Dean's Office and Dr. David Scotch assumed direct responsibility for the shared computing resources comprising the Research Computing Resource (RCR) and the Hippocrates Project, now known as the Education Computing Resource. Previously, the RCR had been part of Cell Biology and the Hippocrates Project was part of the Medical Library. Moving these shared resources to the Dean's Office emphasizes the interdepartmental, Medical Center-wide commitment of both shared resources, with the purpose of broadening the services and projects that can be tackled in support of educational and research goals.

The RCR Upgrades to the Alpha Server 4100 5/400

On Tuesday, 26th November we wheeled the 2100 away and "officially" switched over to the new AS 4100: the "new" MCRCR0. The 4100 is the latest in the series of mid-range servers that DEC has shipped over the last two years. Our 2100 was the fastest machine in this category 18 months ago when we bought it, and the 4100 is the fastest today: of course it won't stay in that position for very long! The machine has a Gigabyte of main memory and four 400MHz Alpha-chip processors that are used in an SMP (symmetric multi-processing) environment. The new machine has 16Gb of disk attached on a fast wide SCSI bus. The 4100 is clustered with MCRCR6, the 3000/500 that runs the mail, news and ftp servers, and also has access to the 8Gb of disks on that machine. Both machines are connected with 200mb FDDI links to the Medical Center's switched FDDI core network and therefore have superb access to computers everywhere, both inside and outside the Medical Center. The new 4100 is a very powerful augmentation to the resources that the RCR manages for the Medical Center. We are looking forward to helping you get the most out of it!

- Ross Smith





RCR Web site update

The RCR's Web site has been growing very rapidly in the past few months. If you haven't been by lately, come check it out.

http://mcrcr0.med.nyu.edu/rcr/


  • The complete GCG Program Manual with the programs indexed both alphabetically and by topic. The topic indexes are also linked to short descriptions of all programs in that topic.

    http://mcrcr0.med.nyu.edu/rcr/gcg/


  • The program manual from the EGCG (Extended GCG) suite of programs is also available.

    http://mcrcr0.med.nyu.edu/rcr/egcg/


  • A constantly updated set of links to the best molecular biology web sites:

    http://mcrcr0.med.nyu.edu/rcr/molbiolink.html


  • The course notes for "Using Computers for Molecular Biology": These notes include both basics of the VMS/DCL operating system and some practical/theoretical essays about various aspects of Molecular Biology Computing. I need your feedback to make these notes more useful to both students and faculty.

    http://mcrcr0.med.nyu.edu/rcr/course-97.html

Better RCR e-mail security with APOP

We have configured the MCRCR POP mail server to work with APOP. This allows EUDORA or other mail client programs (e-mail programs that run on your desktop computer) to connect to MCRCR without transmitting your login password in the clear over the network - or over phone lines, if you ever access your MCRCR account with a modem. Instead of the password itself, APOP transmits an MD5 digest computed from your password and a one-time challenge string sent by our e-mail server. However, be aware that APOP protects only your password, not the content of your e-mail messages. If you are sending confidential data by e-mail, you should consider using PGP or some other encryption software.
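For the curious, here is a rough Python sketch of the calculation a mail client performs, assuming the APOP scheme as defined in the POP3 specification (RFC 1939): the server's greeting contains a unique timestamp string, and the client replies with the MD5 digest of that string followed by the shared password. The function name is our own, and the sample challenge and password are the illustrative values from the RFC, not anything used on MCRCR.

```python
import hashlib


def apop_digest(challenge: str, secret: str) -> str:
    """Return the hex MD5 digest an APOP client sends in place of the password.

    `challenge` is the timestamp string from the server greeting, including
    its angle brackets; `secret` is the APOP password shared with the server.
    """
    return hashlib.md5((challenge + secret).encode("ascii")).hexdigest()


# Sample values taken from the APOP example in RFC 1939
challenge = "<1896.697170952@dbc.mtview.ca.us>"
secret = "tanstaaf"
print("APOP mrose", apop_digest(challenge, secret))
```

Because the challenge changes with every connection, the digest sent over the wire is different each time, so capturing one login does not let an eavesdropper simply replay it later.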

It is strongly recommended that all RCR users switch to the APOP password scheme. With Eudora and most other POP mail clients, it is very easy to switch from standard POP to APOP. Note that APOP support is quite new. It may not be an option for some clients/servers (such as ccMail and older versions of Eudora).

To use APOP you need to have a special password for reading mail (it could be the same as your regular login password, but we advise against it at this point). On the RCR's Alphas, enter: $ PMDF PASSWORD then enter the APOP password, following the instructions.

In Eudora Light 1.5.4 (and later) and Eudora Pro 3.0, select Settings from the Special menu and click on the Checking Mail icon from the list of options on the left side of the window. Then at the bottom of the window, where it says Authentication: choose APOP and click OK. I also check the Save password box so that Eudora remembers my password and it is not necessary to type it in every time I check my mail. Only use this option from a secure computer (i.e. one on your desk in a locked office), not from a computer in a public space.

Eudora Settings screenshot




GCG V9: The Next Generation



In early January we will have the latest version of the GCG package installed for general use. We were involved in the beta test of the package and some of you may have seen it already.

Benefits of the package include:

  • A new X-Windows based graphical interface called SeqLab. See a preview of SeqLab on the RCR's web page: http://mcrcr0.med.nyu.edu/rcr/seqlab.html
  • GCG will now read files in FASTA format. This should simplify the movement of sequences between different software packages and between internet servers and local applications.
  • FASTA v2.0 which includes the BLOSUM50 scoring matrix that allows more sensitive searches than the previous default matrix, PAM 250. FASTA will calculate a significance statistic for each match. The number of scores reported in the output will be based on the statistics calculated for each match. This will replace the step of arbitrarily specifying the number of scores FASTA should report in its output.
  • MAP will now be able to display enzyme names horizontally. With the new format you can see all the enzymes that cut at the same position more easily. The new format also lets you show the exact cut positions on the bottom strand. MAP also has a new output format that simply writes a table of all the cut positions on both strands.
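For readers who have not met the FASTA file format mentioned in the list above: each record is a header line beginning with ">" followed by one or more lines of sequence. The short Python sketch below shows how simple the format is to read. It is only an illustration, not GCG's code, and the function name and the two sample records are made up.

```python
def parse_fasta(text: str):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Two made-up records: one DNA, one protein
example = """>seq1 hypothetical example record
ATGGCCATTG
TAATGGGCCG
>seq2 another made-up record
MKTAYIAKQR
"""
for name, seq in parse_fasta(example):
    print(name, len(seq))
```

Because the format carries so little structure, most sequence programs and Internet servers can read and write it, which is what makes it convenient for moving sequences between packages.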

GCG V9, particularly the X-Windows component of it, is a heavier user of the system. For this reason we expect to restrict its interactive use to MCRCR0, our new 4100, which has plenty of horse-power to operate it. People who have used GCG V8 on MCRCR6 will have already been disappointed with its performance on this smaller machine. You'll still be able to use MCRCR6 to start FASTA jobs (which all run on MCRCR0) and do small computations: interactive FASTA will be restricted to MCRCR0, however.



Join the RCR: Get more computing power for your money!

This year a $500 subscription to the RCR for a PI's group gets you: unlimited CPU usage 24 hours a day all year; all the disk space you need for your projects; software and databases including GCG and GenBank; the new DBWatcher service; plus free access to GeneWorks, Sequencher and MacVector for the Macs in your lab! (Getting your own licenses for these three products would cost your lab over $4,000: that's eight years' worth of RCR subscriptions!) Another big benefit of RCR membership is access to the Molecular Biology Computing Graduate Course, online course notes (and GCG manual pages) and the training, advice, and consultation services of Dr. Stuart Brown, the RCR's Bioinformatics expert.

Membership in the RCR is a flat fee for a PI's group: no additional charges for additional users (students, technicians, postdoctoral fellows, visiting scholars etc.).

To subscribe, read the instructions:

http://mcrcr0.med.nyu.edu/rcr/about-rcr.html#rcr-7