UNIX Assignment #2: Editing text with emacs

This assignment should take 1-2 hours of steady work. Use the lecture notes Web pages as a reference.

1. Log in to your account on Mendel.med.nyu.edu, move to your sub-directory for classwork, and get a directory listing. You should have some sequence files from last week.

2. First, get a sequence file:

 fetch hsp70.msf 

Now open this sequence file in emacs:
> emacs hsp70.msf

3. Try out the navigation commands. Use the arrow keys to move one character or one line at a time; [Ctrl-V] to page down and [Esc] V to page up; [Esc] > and [Esc] < to jump to the end and the beginning of the file.

4. Now move to a line somewhere in the annotation and start typing. Then go back and delete what you have typed.

5. Now try the cut and paste commands to move a block of text.

Move your cursor to the beginning of the text that you want to move and set a "Mark" with [Ctrl-spacebar].

Now move the cursor to the end of the block of text and "Cut" it to the clipboard with [Ctrl-W].

Move to the place where you want to put the text and "Paste" it in with [Ctrl-Y].

6. Save and exit from emacs: type [Ctrl-X] [Ctrl-C] and answer "yes" when it asks if you want to save.
We will get back to emacs at the end of this lesson.

Copy this sequence into a text file (use emacs):
GCTCGAATA CCCTCCTAAA GGGACTAGTC CTGCAGGGTT TAAACGAATT 
CGCCCTTAAG CAGCGGTATC AACGCAGAGn ACTTTATTTT TTTTTTTTTT 
TTTnATTGAG TTTCACTTnT CGnGGnGATA AATTTTGnAA AATATACGTT 
ATAAATTATA TTAGGGGGTT GCCTTTAATT GAATATCATT GCATTTTTAT 
TATTTATTTT ATGAAGTACT ATTTAAAATA AATAAAAATA nACCTTAGTT 
ATTTATATAA AGnTTAATTA AATACnAGTA AATCCnTACT TCATTTTGnG 
GGGATTAATT CTGnATTTTA TCTTACCTTA TTCTTTATAT ACTTTATTTT 
TATTCTATTG CATTTTAGCT TTTAAATCAA ATTTTATGAT TTTTTATTnG 
TATTTTTTnT ATATTGnTAT TGnCnTTATT ACCAATCATn GATAGCTTG 

Use the GCG reformat command to turn the text into a usable sequence file.

––––––––––––––––––––––––––

15. Now we try a very simple program. First copy this file of DNA sequences from the browser into a new file in your Classwork directory (use emacs again).

testseq.fas

These are some ESTs from a cDNA sequencing project, and we would like to check for redundancy: how many transcripts we might have from the same gene.

16. Unpack the sequences into individual GCG-formatted files using the GCG command:

fromfasta testseq.fas

17. Now we will create a simple loop program to check each sequence against the other ones. Use the shell command foreach like this:

foreach i (*.seq)

This tells the shell to run a loop in which the variable i is set, in turn, to each of the files in the current directory that match the wildcard expression *.seq.

18. The shell will now give you a new prompt that looks like this:

foreach?

You type: fasta $i -in2=*.seq -def

This means to run the GCG program "Fasta" (more about this next week) using a local dataset of all sequences in the current directory (*.seq), and to set all other parameters to the default values (and not to ask for any other input).

19. The shell will prompt again with foreach?

You type: end

This closes the loop. The shell then executes the loop body once for each file that matches the expression "*.seq".
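To see what the loop in steps 17-19 expands to, here is a minimal dry-run sketch. It is written in Python rather than the shell, and the function name fasta_commands is made up for illustration. It does not run GCG at all; it only lists the command line that the loop would issue for each file matching *.seq in the current directory.

```python
import glob


def fasta_commands(pattern="*.seq"):
    """Return the command line the loop would run for each matching file.

    Hypothetical helper for illustration: nothing is executed here.
    """
    # The shell expands the wildcard in sorted order; sorted() mirrors that.
    files = sorted(glob.glob(pattern))
    # Each pass of the loop substitutes one file name for $i.
    return [f"fasta {name} -in2={pattern} -def" for name in files]


if __name__ == "__main__":
    for command in fasta_commands():
        print(command)
```

Running it in your Classwork directory before the real loop is a quick way to check that the wildcard matches the files you expect.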

20. When the loop is done (about 30 seconds), you will have a set of new files in your directory that end in .fasta. These are the results of the Fasta searches. Look at some of the *.fasta files. If there is more than one match, what does that mean about the input sequences? Are there any matches that you do not consider correct or significant? We will study this more in a couple of weeks. If you had hundreds of input sequences, it would be nice to automatically sort them into groups based on the Fasta matches. A Perl program would be a good way to do that.

This is a trivial example, but you could easily imagine adding more steps to the script that process the results, sort them into groups, etc. You can also save the script as a text file and execute it whenever you want - or even as part of another script.
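As a rough illustration of the grouping step, the sketch below sorts sequences into groups once you know which pairs matched. The text suggests Perl; this sketch uses Python instead. It assumes the matches have already been pulled out of the *.fasta files as (query, hit) pairs, since reading the Fasta output itself is not covered here. The function name and the sequence names est1 to est5 are invented for the example.

```python
def group_by_matches(names, matches):
    """Group sequence names so that any two linked by a match share a group.

    Assumes every name that appears in matches is also listed in names.
    """
    # Each sequence starts out as the representative of its own group.
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group's representative,
        # shortening the chain along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for query, hit in matches:
        root_q, root_h = find(query), find(hit)
        if root_q != root_h:
            # Merge the two groups.
            parent[root_h] = root_q

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical names and match pairs, for illustration only.
    names = ["est1", "est2", "est3", "est4", "est5"]
    matches = [("est1", "est3"), ("est3", "est4")]
    for group in group_by_matches(names, matches):
        print(group)
```

Matches are treated as transitive: if est1 matches est3 and est3 matches est4, all three land in one group even if est1 and est4 were never matched directly. Whether that is appropriate depends on how significant you consider each match.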

––––––––––––––––––––––––––

21. OK, now back to emacs. Launch the emacs program, then type [Ctrl-h] t

Run through this tutorial on your own - you will not finish in one sitting.