UNIX Assignment #2: Editing text with emacs
This assignment should take 1-2 hours of steady work. Use the lecture notes Web pages as a reference.
1. Log in to your account on genetraffic.med.nyu.edu, move to your sub-directory for classwork, and get a directory listing. You should have some sequence files from last week.
***Here is something new: set up your account on the "genetraffic" server to access databases over the internet.*** Copy this file to a text file in your home directory and name it .embossrc
2. Now get a sequence file from GenBank:
seqret GENBANK:CAG46711
When seqret asks for an output file name, accept the default (cag46711.fasta). Now open this sequence file in emacs:
> emacs cag46711.fasta
3. Try out the navigation commands: use the arrow keys to move one character or one line at a time, [Ctrl-V] to page down and [Esc] V to page up, and [Esc] > and [Esc] < to jump to the end and the beginning of the file.
4. Now move to a line somewhere in the annotation and start typing. Then go back and delete what you have typed.
5. Now try the cut and paste commands to move a block of text.
Move your cursor to the beginning of the text that you want to move and set a "Mark" with
[Ctrl-spacebar]
Now move the cursor to the end of the block of text and "Cut" it to the clipboard with [Ctrl-W]
Move to the place where you want to put the text and "Paste" it in with [Ctrl-Y]
6. Save and exit from emacs: press [Ctrl-X] [Ctrl-C] and answer "y" (yes) when it asks whether you want to save the file.
We will get back to emacs at the end of this lesson.
GCTCGAATA CCCTCCTAAA GGGACTAGTC CTGCAGGGTT TAAACGAATT CGCCCTTAAG CAGCGGTATC AACGCAGAGn ACTTTATTTT TTTTTTTTTT TTTnATTGAG TTTCACTTnT CGnGGnGATA AATTTTGnAA AATATACGTT ATAAATTATA TTAGGGGGTT GCCTTTAATT GAATATCATT GCATTTTTAT TATTTATTTT ATGAAGTACT ATTTAAAATA AATAAAAATA nACCTTAGTT ATTTATATAA AGnTTAATTA AATACnAGTA AATCCnTACT TCATTTTGnG GGGATTAATT CTGnATTTTA TCTTACCTTA TTCTTTATAT ACTTTATTTT TATTCTATTG CATTTTAGCT TTTAAATCAA ATTTTATGAT TTTTTATTnG TATTTTTTnT ATATTGnTAT TGnCnTTATT ACCAATCATn GATAGCTTG
Use the EMBOSS seqret command to turn the text into a usable sequence file.
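The assignment does this step with seqret. To show what the conversion amounts to, here is a minimal sketch of the same idea using only standard Unix tools (tr and fold), not EMBOSS: it keeps the letters of the pasted text (including the ambiguous n bases), drops spaces, digits and line breaks, and writes a single FASTA record. The helper name raw_to_fasta, the record name and the 60-letters-per-line width are assumptions chosen for illustration, not a description of what seqret does internally.

```bash
# raw_to_fasta NAME FILE  (hypothetical helper, not part of EMBOSS)
# Print the sequence text in FILE as one FASTA record called NAME.
raw_to_fasta() {
  local name=$1 file=$2
  printf '>%s\n' "$name"              # FASTA header line
  # Keep letters only (spaces, digits and newlines are deleted),
  # then re-wrap the sequence at 60 letters per line (an arbitrary choice).
  tr -cd '[:alpha:]' < "$file" | fold -w 60
  echo                                # end the last sequence line
}
```

For example, raw_to_fasta myseq sequence.txt > myseq.fasta, where myseq and sequence.txt are hypothetical names for the record and for the file you saved from emacs.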
15. Now we will try a very simple program. First copy this file of DNA sequences from the browser into a new file in your Classwork directory (use emacs again).
testseq.fas
These are some ESTs from a cDNA sequencing project, and we would like to check for redundancy - how many transcripts we might have from the same gene.
16. Unpack the sequences into individual FASTA-formatted files using the EMBOSS command
seqretsplit testseq.fas
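To see what unpacking means, here is a minimal sketch with awk (not seqretsplit itself) that writes each record of a multi-sequence FASTA file to its own file. The helper name split_fasta is hypothetical, and it names each output file after the first word of the header line in lower case plus .fasta; seqretsplit's own file names may differ, so check them with ls.

```bash
# split_fasta FILE  (hypothetical helper, not part of EMBOSS)
# Write every record of a multi-sequence FASTA file to its own file.
# Assumes sequence IDs are unique and contain no "/".
split_fasta() {
  awk '
    /^>/ {                        # a header line starts a new record
      if (out) close(out)         # finish the previous output file
      id = substr($1, 2)          # first word of the header, without ">"
      out = tolower(id) ".fasta"
    }
    out { print > out }           # copy header and sequence lines
  ' "$1"
}
```

Run it as split_fasta testseq.fas in a directory where overwriting files with the same names is safe.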
17. Now we will create a simple loop program to check each sequence against the other ones. In the bash shell, use a for loop like this:
for i in *.fasta; do water -asequence="$i" -bsequence=testseq.fas -auto; done
This runs the EMBOSS program "water" (more about this next week) once for each file in the current directory that matches *.fasta. Each file in turn is used as the first sequence (-asequence) and is aligned against every sequence in testseq.fas (-bsequence). The -auto option sets all other parameters to their default values, so water does not ask for any other input.
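19. If your login shell is tcsh (or csh) rather than bash, use the foreach command instead. Type:
foreach i (*.fasta)
The shell will prompt you with foreach? for the body of the loop. You type:
water -asequence=$i -bsequence=testseq.fas -auto
The shell will ask again foreach? You type:
end
This executes the loop once for every file that matches the expression "*.fasta".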
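Whichever shell you type commands in, the same loop can be kept in a text file and run as a bash script. Here is a minimal sketch; the file name run_water.sh, the function name run_water_all and the optional database argument are hypothetical additions, while the water command itself is the one from step 17.

```bash
#!/bin/bash
# run_water.sh (hypothetical name): align every *.fasta file in the current
# directory against all sequences in a multi-FASTA file.
# Usage: bash run_water.sh [database-file]   (default: testseq.fas)

run_water_all() {
  local db=${1:-testseq.fas}
  local i
  shopt -s nullglob            # if no *.fasta files exist, skip the loop
  for i in *.fasta; do
    # Same command as the interactive loop; -auto accepts all defaults
    water -asequence="$i" -bsequence="$db" -auto
  done
  shopt -u nullglob            # turn the option back off
}

# Start the loop only when the file is executed, not when it is sourced
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  run_water_all "$@"
fi
```

Save it in your Classwork directory and run bash run_water.sh testseq.fas. Because of the final if test, another bash script can also source the file and call run_water_all without the loop starting at load time.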
20. When the loop is done (about 30 seconds), you will have a set of new files in your directory; these are the results of the searches (water names each one after its query sequence, with the extension .water). Look at some of the result files. If there is more than one match, what does that mean about the input sequences? Are there any matches that you do not consider correct or significant? We will study this more in a couple of weeks. If you had hundreds of input sequences, it would be nice to sort them into groups automatically based on the matches. A Perl program would be a good way to do that.
This is a trivial example, but you could easily imagine adding more steps to the script that process the results, sort them into groups, etc. You can also save the script as a text file and execute it whenever you want - or even as part of another script.
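As one example of such an extra step, here is a minimal sketch that lists which pairs of ESTs matched each other. The text suggests Perl; this sketch stays in the shell and uses awk instead. It assumes water wrote its results in its default srspair layout, where each alignment header contains lines such as "# 1: EST1", "# 2: EST7" and "# Score: 512.5"; the helper name summarize_water and the score cutoff are hypothetical, and a sensible cutoff is something to decide after looking at your own result files.

```bash
# summarize_water MIN_SCORE FILE...  (hypothetical helper)
# Print "query<TAB>subject<TAB>score" for each alignment in water output,
# skipping self-matches and alignments scoring below MIN_SCORE.
# Assumes the srspair layout: "# 1: NAME", "# 2: NAME", "# Score: N".
summarize_water() {
  local min=$1
  shift
  awk -v min="$min" '
    /^# 1:/     { a = $3 }                 # first (query) sequence name
    /^# 2:/     { b = $3 }                 # second (database) sequence name
    /^# Score:/ {
      score = $3 + 0                       # force numeric comparison
      if (a != b && score >= min) printf "%s\t%s\t%s\n", a, b, $3
    }
  ' "$@"
}
```

For example, summarize_water 100 *.water | sort lists the candidate redundant pairs, one per line; collecting those pairs into groups would be the next step.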
21. OK, now back to emacs. Launch the emacs program, then type [Ctrl-H] t to start the built-in tutorial.
Run through this tutorial on your own - you will not finish in one sitting.