UNIX Assignment #2: Editing text with emacs
This assignment should take 1-2 hours of steady work. Use the lecture notes Web pages as a reference.
1. Log in to your account on Mendel.med.nyu.edu, move to your sub-directory for classwork, and get a directory listing. You should have some sequence files from last week.
2. First, get a sequence file:
fetch hsp70.msf
Now open this sequence file in emacs:
> emacs hsp70.msf
3. Try out the navigation commands. Use the arrow keys to move one character or one line at a time;
Ctrl-V to page down and [Esc] V to page up; [Esc] > and [Esc] < to jump to the end and the beginning of the file.
4. Now move to a line somewhere in the annotation and start typing. Then go back and delete what you have typed.
5. Now try the cut and paste commands to move a block of text.
Move your cursor to the beginning of the text that you want to move and set a "Mark" with [Ctrl-spacebar]
Now move the cursor to the end of the block of text and "Cut" it to the clipboard with [Ctrl-W]
Move to the place where you want to put the text and "Paste" it in with [Ctrl-Y]
6. Save and exit from emacs: type [Ctrl-X] [Ctrl-C] and answer "yes" when it asks if you want to save.
We will get back to emacs at the end of this lesson.
GCTCGAATA CCCTCCTAAA GGGACTAGTC CTGCAGGGTT TAAACGAATT CGCCCTTAAG CAGCGGTATC AACGCAGAGn ACTTTATTTT TTTTTTTTTT TTTnATTGAG TTTCACTTnT CGnGGnGATA AATTTTGnAA AATATACGTT ATAAATTATA TTAGGGGGTT GCCTTTAATT GAATATCATT GCATTTTTAT TATTTATTTT ATGAAGTACT ATTTAAAATA AATAAAAATA nACCTTAGTT ATTTATATAA AGnTTAATTA AATACnAGTA AATCCnTACT TCATTTTGnG GGGATTAATT CTGnATTTTA TCTTACCTTA TTCTTTATAT ACTTTATTTT TATTCTATTG CATTTTAGCT TTTAAATCAA ATTTTATGAT TTTTTATTnG TATTTTTTnT ATATTGnTAT TGnCnTTATT ACCAATCATn GATAGCTTG
Use the GCG reformat command to turn the text into a usable sequence file.
15. Now we try a very simple program. First copy this file of DNA sequences from the browser into a new file in your Classwork directory (use emacs again).
testseq.fas
These are some ESTs from a cDNA sequencing project; and we would like to check for redundancy - how many transcripts we might have from the same gene.
16. Unpack the sequences into individual GCG formatted files using the GCG command
fromfasta testseq.fas
17. Now we will create a simple loop program to check each sequence against the other ones. Use the shell command foreach like this:
foreach i (*.seq)
This tells the shell to run a loop with the variable i set to contain each of the files in the current directory that match the wildcard expression *.seq
18. The shell will now give you a new prompt that looks like this:
foreach?
You type:
fasta $i -in2=*.seq -def
This means to run the GCG program "Fasta" (more about this next week) using a local dataset of all sequences in the current directory (*.seq), and to set all other parameters to the default values (and not to ask for any other input).
19. The shell will ask again:
foreach?
You type:
end
This closes the loop and tells the shell to execute it once for each file that matches the expression "*.seq".
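To see what the loop in steps 17 to 19 does before running any real searches, it can help to print the commands instead of executing them. The sketch below is written for the Bourne shell (sh or bash), not for csh, so it uses "for" in place of "foreach"; the function name dry_run_fasta is made up for this illustration. It assumes the GCG program expands the *.seq wildcard itself, which is why that argument is quoted.

```shell
#!/bin/sh
# Bourne-shell sketch of the csh foreach loop from steps 17-19.
# dry_run_fasta (hypothetical name) prints each command instead of
# running it, so the loop can be checked on a machine without GCG.
dry_run_fasta() {
    for i in *.seq
    do
        # If no file matches, the pattern stays literal; skip it.
        [ -e "$i" ] || continue
        # The wildcard is quoted so it is printed unchanged
        # (assumption: the GCG program expands it on its own).
        echo fasta "$i" '-in2=*.seq' -def
    done
}

dry_run_fasta
```

Run in a directory holding your .seq files, this prints one fasta command line per file, which is the same list of commands the foreach loop will execute.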
20. When the loop is done (about 30 seconds), you will have a set of new files in your directory that end in .fasta; these are the results of the Fasta searches. Look at some of the *.fasta files. If there is more than one match, what does that mean about the input sequences? Are there any matches that you do not consider correct or significant? We will study this more in a couple of weeks. If you had hundreds of input sequences, it would be nice to automatically sort them into groups based on the Fasta matches. A Perl program would be a good way to do that.
This is a trivial example, but you could easily imagine adding more steps to the script that process the results, sort them into groups, etc. You can also save the script as a text file and execute it whenever you want - or even as part of another script.
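As a rough illustration of saving the loop as a script and adding one more step, here is a minimal sketch for the Bourne shell (sh or bash) rather than csh. The file name checkseqs.sh, the function names, and the FASTA variable are all assumptions made for this example; the extra step only counts the result files and does not try to interpret the Fasta output.

```shell
#!/bin/sh
# checkseqs.sh (hypothetical name): the search loop saved as a script,
# followed by one extra reporting step. Written for sh/bash, not csh.

# FASTA is an assumed override hook so the script can be tried without
# GCG, for example: FASTA=echo sh checkseqs.sh
FASTA=${FASTA:-fasta}

run_searches() {
    for i in *.seq
    do
        [ -e "$i" ] || continue    # no *.seq files: nothing to do
        # Wildcard quoted on the assumption that GCG expands it itself.
        "$FASTA" "$i" '-in2=*.seq' -def
    done
}

report_results() {
    # Extra step: count the result files left in the directory.
    count=0
    for f in *.fasta
    do
        [ -e "$f" ] || continue
        count=$((count + 1))
    done
    echo "$count result files"
}

run_searches
report_results
```

Save the text in your Classwork directory with emacs, then run it with "sh checkseqs.sh". Further processing steps would go in additional functions called at the bottom of the file.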
21. OK, now back to emacs. Launch the emacs program, then type [Ctrl-h] t to start the built-in tutorial.
Run through this tutorial on your own - you will not finish in one sitting.