
UNIX Filenames explained


*** First, remember that UNIX is case sensitive! A file named Myjunk.txt is different from MyJunk.txt, which is in turn different from myjunk.txt. This is a big change for VMS users, so get used to it!
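If you have Python available, you can see case sensitivity for yourself. This minimal sketch creates the three example files in a scratch directory and lists them. It assumes a case-sensitive file system, which is usual on Linux and other UNIX systems. macOS volumes are often case-insensitive by default, and there you would end up with a single file.

```python
import os
import tempfile

# Create three files whose names differ only in letter case, then list them.
# Assumes a case-sensitive file system (the usual case on Linux/UNIX).
with tempfile.TemporaryDirectory() as scratch:
    for name in ("Myjunk.txt", "MyJunk.txt", "myjunk.txt"):
        with open(os.path.join(scratch, name), "w") as f:
            f.write(f"I am {name}\n")
    names = sorted(os.listdir(scratch))

# On a case-sensitive file system this reports three separate files.
print(len(names), names)
```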

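*** UNIX itself allows almost any character in a filename, but the safest names contain only letters, numbers, and the _ (underscore) and . (dot) characters. All other characters should be avoided. The / (slash) character cannot appear in a filename at all, because it separates directory names in a path; this is how UNIX designates subdirectories. Also, the ; (semicolon) character, which appears in all VMS filenames, can create difficulties in UNIX because the shell treats it as a command separator: avoid it like the plague!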
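The rule above is easy to check mechanically. Here is a sketch of a hypothetical helper, is_safe_filename, that accepts only letters, digits, underscores, and dots. Remember that this is a conservative style check: UNIX itself would accept most of the names it rejects.

```python
import re

# Hypothetical helper: allow only letters, digits, '_' and '.'.
# This is a style check, not a rule enforced by the operating system.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.]+")

def is_safe_filename(name: str) -> bool:
    """Return True if name is non-empty and uses only letters, digits, '_' and '.'."""
    return SAFE_NAME.fullmatch(name) is not None

for candidate in ["myjunk.txt", "gene_list.seq", "my file.txt", "data/old.txt", "LOGIN.COM;3"]:
    verdict = "ok" if is_safe_filename(candidate) else "avoid"
    print(f"{candidate!r}: {verdict}")
```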

*** Most UNIX filenames start with a lowercase letter and end with a dot followed by a short extension, usually one, two, or three letters (longer ones such as .html are also common). However, this is just a common convention and is not required.

*** It is also possible to have additional dots in the filename. (This was not allowed in VMS.)
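When a name has several dots, programs usually treat only the part after the last dot as the extension. Python's os.path.splitext works this way; the filename below is just an illustration.

```python
import os

name = "human.globin.seq"  # hypothetical filename with two dots

# splitext splits at the LAST dot only, so the stem keeps the earlier dot.
stem, ext = os.path.splitext(name)
print(stem, ext)          # human.globin .seq

# Splitting on every dot instead gives all the pieces.
print(name.split("."))    # ['human', 'globin', 'seq']
```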


*** The part of the name that follows the dot is often used to designate the type of file:
  • files that end in .txt are text files

  • files that end in .c are source code in the "C" language

  • files that end in .html are HTML files for the Web.
*** But this is just a convention and not a rule enforced by the operating system.

*** This is a good and sensible convention and one that you should follow.
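Because the extension is only a convention, any program that relies on it has to look at the name itself. A minimal sketch, using a hypothetical lookup table built from the three examples above, might look like this:

```python
import os

# Hypothetical lookup table built from the examples above. The operating
# system never consults anything like this; it is purely a naming convention.
FILE_TYPES = {
    ".txt": "text file",
    ".c": "C source code",
    ".html": "HTML file for the Web",
}

def guess_type(filename: str) -> str:
    """Guess a file's type from its extension (case-sensitive, like UNIX)."""
    _, ext = os.path.splitext(filename)
    return FILE_TYPES.get(ext, "unknown")

print(guess_type("hello.c"))     # C source code
print(guess_type("README.TXT"))  # unknown
```

Note that README.TXT comes back as "unknown": the lookup is case-sensitive, just like UNIX filenames.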

*** Extensions are also handy for grouping related files, either all the files for a single project or all the files of one type. I like to use .seq for DNA sequences and .pep for protein sequences. GCG programs tend to add their own extensions to their output files, which is very handy; for example, you will soon recognize files ending in .fasta as the output of FASTA searches.
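Consistent extensions pay off when you need to pick out one kind of file. This sketch fills a made-up scratch directory with .seq and .pep files and then selects only the DNA sequence files with the pattern *.seq, the same pattern you would type at the shell (for example, ls *.seq).

```python
import tempfile
from pathlib import Path

# Hypothetical project directory using the .seq / .pep convention.
with tempfile.TemporaryDirectory() as project:
    for name in ("globin.seq", "globin.pep", "actin.seq", "notes.txt"):
        (Path(project) / name).touch()

    # Select only the DNA sequence files by their extension.
    dna_files = sorted(p.name for p in Path(project).glob("*.seq"))

print(dna_files)  # ['actin.seq', 'globin.seq']
```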


*** UNIX does not allow two files with the same name to exist in the same directory. If a file is created or copied into a directory that already holds a file with exactly that name, the new file will overwrite (and so delete) the old one. Your system may be set up to warn you when this is about to happen (for example, cp -i asks before overwriting), but it is easy to ignore the warning, or to use a program that overwrites without asking.
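At the shell, cp -i asks for confirmation before overwriting an existing file. In your own scripts you can get similar protection by refusing to create a file that already exists. This sketch uses Python's exclusive-create file mode ("x"); copy_no_clobber and the file names are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path

def copy_no_clobber(src: str, dst: str) -> None:
    """Copy src to dst, raising FileExistsError instead of overwriting dst.

    Mode "xb" creates dst only if no file of that name exists, so the
    existence check and the creation happen in a single step.
    """
    with open(src, "rb") as fin, open(dst, "xb") as fout:
        shutil.copyfileobj(fin, fout)

# Demonstration in a scratch directory with hypothetical file names.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "results.txt")
    dst = os.path.join(d, "backup.txt")
    Path(src).write_text("new results\n")
    Path(dst).write_text("older results, keep me\n")
    refused = False
    try:
        copy_no_clobber(src, dst)
    except FileExistsError:
        refused = True  # backup.txt already existed, so nothing was copied
    kept = Path(dst).read_text()

print(refused, repr(kept))
```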

*** Most GCG programs have a default name for the output file. If you accept this default, you will often overwrite an earlier file of the same name, even though the two files may have very different contents. Always choose a new name for the output file and make it as informative and specific as possible; you will be thankful later.
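If you write your own scripts around programs like these, a small guard can make sure an output name is not already taken. The sketch below uses a hypothetical naming scheme (_1, _2, and so on before the extension). It only guarantees that the name is new at the moment you ask, so picking an informative name is still up to you.

```python
import tempfile
from pathlib import Path

def unused_name(path: str) -> str:
    """Return path if no such file exists yet; otherwise return the first
    free variant with _1, _2, ... inserted before the extension."""
    p = Path(path)
    if not p.exists():
        return str(p)
    n = 1
    while True:
        candidate = p.with_name(f"{p.stem}_{n}{p.suffix}")
        if not candidate.exists():
            return str(candidate)
        n += 1

with tempfile.TemporaryDirectory() as d:
    target = str(Path(d) / "globin.fasta")
    first = unused_name(target)    # nothing there yet
    Path(first).touch()
    second = unused_name(target)   # globin.fasta is now taken

print(Path(first).name, Path(second).name)  # globin.fasta globin_1.fasta
```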



Using Computers for Molecular Biology
Stuart M. Brown, Ph.D, RCR, NYU Medical Center
Comments to: browns02@mcrcr.med.nyu.edu