I want to write a few lines about file headers. A file header is the block of comments that should appear at the start of every source file. And by 'file' I mean not only program source code in whatever programming language, but also command files (shell scripts to Unix experts) and data files. And there is no reason why similar comments should not appear in files fed into a word processor to produce documents like user manuals.
File headers usually say little about the detailed operation of the code in the file. But they contain a wealth of administrative information about the code as a whole. In some situations, the presence of a good file header can save (or cost) many times the value of the software.
The purpose of a file header is to aid identification of the file, both inside and outside the group of people who wrote it. It mentions programming and non-programming issues related to the use of the file. It describes the history of the file, and the changes made to it. It is a glorified comment and, like all comments, it tries to give assistance when something has gone wrong.
There are no hard and fast rules as to what should be in a file header. The following are merely my personal views. If you think I have omitted something, write about it and send it to the editor. In the situation that caused you to come up with your suggestion, you are probably right. Telling everyone about the problem will enable us all to incorporate your ideas and so avoid the same pitfall in future. Similarly, if you have tried some of my suggestions and found them pointless, again, write in. You could well be managing your code better than I do.
I am going to list a number of items that I believe should appear in the file header. I give them in the order I think they should appear, though this is very much a matter of personal taste. The list is rather long. But perhaps only the first few items are relevant if you are writing on your own for your own personal, home, use.
My later suggestions are somewhat officious and are only really applicable if you are writing in a large, commercial, environment. They attempt to prevent situations which can cause severe grief to the management of the larger companies and the bankruptcy of the smaller ones. If you find you need them, you should find that your employer has already written a standard telling you exactly what you must say. If he hasn't, more fool your employer.
Let me describe what I think should be in a file header.
This first group of items I call the programmer's entries because they contain information of direct importance to him. They help the original programmer, and his colleagues, sort out what the file is and how to get it to work.
The first line of a file should contain the file name and a description of the file's purpose. If you have to move the file to another operating system with a different file name convention, this will let you sort out what has happened when files go missing, or are ported with mangled file names. It's also useful for providing an annotated listing of a directory's contents, or deciding what belongs where if you have to unravel a corrupt file directory.
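As a rough illustration of the annotated listing mentioned above, here is a minimal Python sketch. It assumes every file is text and that its first line carries the file name and a one-line description. The function names `annotated_listing` and `name_matches_header` are invented for this example, and the check for a mangled name is deliberately crude.

```python
import os


def first_line(path):
    """Return the first line of a text file, stripped of surrounding whitespace."""
    # errors="replace" keeps the sketch from failing on stray non-UTF-8 bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.readline().strip()


def annotated_listing(directory):
    """Map each regular file in `directory` to the first line of its header."""
    listing = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            listing[name] = first_line(path)
    return listing


def name_matches_header(name, header_line):
    """True if the file's current name appears (ignoring case) in its first line.

    A False result suggests the file was renamed or its name was mangled
    when it was ported.
    """
    return name.lower() in header_line.lower()
```

Calling `annotated_listing(".")` and printing each name beside its first line gives the annotated directory listing. Any entry for which `name_matches_header` returns False is worth a second look.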
As an aside, choose your file names with care. Not everyone is as generous as Unix in allowing names of (typically) up to 255 characters using any character other than '/' and '\0'. Upper case alphanumeric names should be all right, as should lengths of 8 or less. (I do know of one obsolete system that restricted you to 6 character file names!)
Underscore, '_', characters seem generally allowed in file names these days, but may not be significant (that is, they are ignored when file names are matched by the file open function). And the full stop, '.', may be a valid character in the file name, or a syntax mark separating the file name from a 3 character file extension.
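The cautious naming rules above can be checked mechanically. The following Python sketch assumes a deliberately conservative rule: 1 to 8 upper case letters or digits, optionally followed by a full stop and an extension of 1 to 3 characters. That rule, and the names `is_conservative_name` and `collisions`, are assumptions made for illustration, not the rules of any particular operating system.

```python
import re

# Assumed conservative rule: 1-8 upper case letters or digits, optionally
# followed by '.' and a 1-3 character extension. Underscores are rejected.
CONSERVATIVE_NAME = re.compile(r"[A-Z0-9]{1,8}(\.[A-Z0-9]{1,3})?")


def is_conservative_name(name):
    """True if `name` fits the assumed lowest-common-denominator rule."""
    return CONSERVATIVE_NAME.fullmatch(name) is not None


def collisions(names):
    """Group names that become identical once case and underscores are ignored.

    This models a system on which underscores are accepted but are not
    significant when file names are matched.
    """
    groups = {}
    for name in names:
        key = name.upper().replace("_", "")
        groups.setdefault(key, []).append(name)
    return [group for group in groups.values() if len(group) > 1]
```

Running `collisions` over a directory listing before a port shows which files would overwrite one another on a system that ignores underscores or case.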
An expanded description of the file's purpose should follow in the file header. This allows an expansion of the terse description present on the first line, but need only be a few lines long. (The full description is, of course, in the documentation... )
There should be an indication of the code's target operating system (and the host system if it is not the same). Similarly the compiler used to compile the code should be named. The version numbers of all of these should be given. It is a rare program that is not subtly, or inadvertently, dependent on its compiler or operating system. And system entry points are not only added to new releases of operating systems, but old ones are occasionally removed as well.
Does the program require any special, or scarce, system resources? Does it need a lot of memory, take a long time to run, need access to a special type of peripheral, or require sole access to a disk which is usually made available to all the system's users? These sorts of requirements would normally be listed in the program's User Manual, but a brief note of these restrictions in the file header is worthwhile.
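Version details are easy to get wrong when typed by hand, so it can help to let the machine report them. The sketch below uses Python's standard `platform` module to produce comment lines for a header. It records only the machine and interpreter it is run on: a separate target system, or a compiler for another language, would still have to be written in by hand. The function name `environment_lines` and the line layout are assumptions.

```python
import platform


def environment_lines(comment_prefix="# "):
    """Return header comment lines describing the system this code runs on."""
    facts = [
        # Operating system name, release and hardware type of the host.
        ("Host system",
         f"{platform.system()} {platform.release()} ({platform.machine()})"),
        # For a Python file the 'compiler' is the interpreter itself.
        ("Interpreter",
         f"{platform.python_implementation()} {platform.python_version()}"),
    ]
    return [f"{comment_prefix}{label:12}: {value}" for label, value in facts]
```

Printing the result of `environment_lines()` gives two lines that can be pasted into the header and updated whenever the build environment changes.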
A reference to the script that builds the program is also useful. If the script is short, I have been known to include a copy in the file header. It is less likely to be lost that way.
If you are using an integrated development system (programmer's toolbench) where you click on a menu item to generate an executable image, such a script may be hidden from you within the compiler. Just make sure you back up that configuration file when you archive the project. And just the same, write down the compiler options and library selections that you used when starting the project. Certain suppliers are noted more for their inability to maintain a consistent user interface than they are for their ability to release bug-free software. So you may need that information when you next try to build your program.
The presence of the change history of the file always indicates a better quality of workmanship. This will probably include a date, the new version number, the name of the person making the change and either a reference to the document authorising the change (you don't make unreviewed, unauthorised, changes to delivered software, do you?) or a one or two line description of what was changed and why. This can help narrow down the search for newly reported bugs in a large coding environment: first look at what has just been changed!
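If the change history is kept in a regular layout, finding what has just been changed can be automated. The Python sketch below assumes a purely hypothetical layout in which each entry is one comment line containing the marker `CHANGE:` followed by date, version, author and note, separated by `|`. Neither the marker nor the field order comes from any standard; your own house style will differ.

```python
from datetime import date
from typing import NamedTuple


class Change(NamedTuple):
    when: date
    version: str
    author: str
    note: str


def parse_history(header_text, marker="CHANGE:"):
    """Return the change entries found in `header_text`, newest first.

    Lines without the marker, or without exactly four '|'-separated fields,
    are skipped. A date that is not in ISO form (YYYY-MM-DD) raises ValueError.
    """
    changes = []
    for line in header_text.splitlines():
        _, found, rest = line.partition(marker)
        if not found:
            continue
        fields = [field.strip() for field in rest.split("|", 3)]
        if len(fields) != 4:
            continue
        when, version, author, note = fields
        changes.append(Change(date.fromisoformat(when), version, author, note))
    return sorted(changes, key=lambda change: change.when, reverse=True)
```

The first element of the returned list is the most recent change, which is the natural place to start when a new bug is reported.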
I come now to a group of file header entries that only really apply if you work for a company (though that company may be just yourself working out of your back bedroom).
So I call this section the management's entries, because it is they who will want this information in your file, and who will have laid down a precise format for the entire file header if they realise its importance.
If you are typing away on your own for your own private interest (much in the way that I am doing this) then much of the following may not apply. But there is no reason why you should not read what follows to see what is going on. Knowing what a good manager has to be (or should be) aware of makes for easier job changes and higher salaries.
There should be a drawing number for the file. This says where the correct (that doesn't necessarily mean bug-free!) copy of the file is held. That is, the file that should be exactly what went to the customer. Exactly what that drawing number looks like depends on how your employer and his Software Configuration Control System (SCCS) work.
The point is that if you have to change the file, your first step is to get a copy from the SCCS, and not use the copy someone found lying around in a departed colleague's file directory. This is the only way you know that you are working on the file used to build the delivered product, rather than a version which may be incomplete and potentially contains unknown, and untested, changes.
There should be complete contact information for the person (or company) that wrote the file. This might include address, telephone and fax numbers, and an E-mail address. A contact name is useful, though often out of date, and is probably best left as someone in marketing, or customer support, rather than the programmer who actually wrote the code.
This gives your customer the impression that you care for him and want to solve his problems. More to the point, it opens the door to selling him some more products. It also occasionally flushes out a software pirate, which is also valuable information.
If the file was written for a specific customer, similar details of that customer should follow. For one thing, the contract may say that he owns the software, and that you only wrote it. (Ask your manager about that one). But you can include your (and possibly his) contract reference numbers.
The inclusion of a security classification does not indicate paranoia, or a military project, but merely a commercial awareness of the project's worth. It is a subtle way of reminding everyone to look after your property.
Security means asking the following. Is the source code to be viewed only by those working on the project? Or can anyone, including those on competing projects for other customers, see it? Can the code be released to the client? Does it contain company proprietary information that must not leave the site? Is the file intended for free distribution to all and sundry via the internet? Or is it your CV, which must never be seen by anyone, least of all That Team Leader? (And if it is your CV, wouldn't it be safer writing it on your own home computer?)
Security also means protecting your ownership of the program, and that means (perhaps) copyright. Now I've avoided saying 'copyright' up to now because this has legal connotations. I'm not a lawyer, so I must tell you to seek out your manager and ask him what he thinks of the following. And ask him to give you a memorandum with a precise form of words to be placed in the file header. And make sure you use them.
You need a statement of who owns the file. This is what a copyright statement does, but at the expense of saying that the document has been published for all to read, and that copyright law is to be used to sort out unauthorised use. Your manager might prefer to release the file only under a non-disclosure agreement that has different consequences. Or he may choose some other route. It is his decision.
A statement of the extent of the intended applications of the file might be appropriate. This impresses your customer that you have considered what your program might be used for (and I hope you checked that it will do it!). And it lets you tell your other customers that the program was not designed to do what they are about to do with it, so they must suffer the consequences of what it does do. If you think this is back covering, I would agree, and I despair that it is necessary at times.
Finally, I come to something that used to be de rigueur, but, thankfully, is now usually omitted. This is to list a complete example of the character set used. Its necessity goes back to the days when not only could the computer manufacturers not agree what the character set was, but also they couldn't agree how many bits there were in a byte. And just to confuse matters, most manufacturers supported at least two contradictory character sets, neither of which included everything in what is now known as the ASCII 128 character set.
Whilst the characters used by the FORTRAN-66 standard would usually be translated between character sets unscathed, what happened to anything else was pot luck, and possibly dependent on the precise order in which the character set conversion utilities were run.
The only protection the hapless programmer had was to list the character set he used at the beginning of his file, as a column containing an example of the characters together with a column giving the character's name. Starting with a listing of the file header from the host computer, he would then try to sort out what had happened during the file transfer to the target machine. Then he had to fix it before attempting to build his code.
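For the curious, such a two-column listing is easy to generate today. The Python sketch below builds it for the printable ASCII range using the standard `unicodedata` module. The names it prints are the modern Unicode names (for example LOW LINE for the underscore), which are not necessarily the names a programmer of the period would have written, and the function names and column layout are assumptions.

```python
import unicodedata


def character_table(first=0x20, last=0x7E):
    """Return (character, name) pairs for the code points first..last inclusive."""
    return [
        # The second argument is a fallback for code points that have no name.
        (chr(code), unicodedata.name(chr(code), "UNNAMED"))
        for code in range(first, last + 1)
    ]


def format_table(rows, comment_prefix="# "):
    """Lay the pairs out as comment lines: the character, then its name."""
    return "\n".join(f"{comment_prefix}{char}    {name}" for char, name in rows)
```

Printing `format_table(character_table())` gives 95 lines, one per printable ASCII character, ready to paste into a header. Comparing that block before and after a file transfer shows at a glance which characters were altered.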
Fortunately, such happy days departed us with the acceptance of the 8 bit byte and the ASCII character set, now codified by ISO-646 and ISO-8859-1. But we are now entering the era of wide character sets and, despite the guidance of ISO-10646, I wonder if such halcyon days will return.
Perhaps they already have, as I recently encountered the problem that the hard type faces and point sizes present on the development machine had been assumed to be present on the target machine. They weren't.
If you've stayed with me to here, I think you've gone through some fairly heavy material at times. I hope I've justified why the various file header entries should be present. But it is attention to this sort of detail that separates the jobbing coder from the master programmer, or the unemployable hacker from the competent, sought after, highly paid, professional.
I haven't given an example of a file header because there is too much tailoring required: there is too much that might (or might not) be wanted. If you like what I have said, and are starting from scratch, just type in everything you think applicable with a couple of lines space between each item. As time goes by you may want to change the layout, but that is all I do.
I can't claim that this list is complete, only that each item is based on somebody's bitter experience, somewhere.
So what have I omitted? What caused you to have to include your suggestion in your work? Where have I gone over the top? What of the above would you not include in a file header? Over to you.