In this first chapter we'll tackle a problem that is so simple that a program to solve it is distributed with CP/M as an example of how to use the system. That program, DUMP, isn't very attractive, but it fills a need. Here and in the next chapter we'll try to improve on it. We'll also spend some time studying the programming conventions used throughout the book.
Every program begins as a specification, a statement of what it will do for its user. Here is the specification for our command. The command syntax will be:
FDUMP fileref
where fileref is the explicit designation of a file. Throughout the book, the word "fileref" will mean a complete reference to a file: the drivecode (a letter and a colon), the filename, and the filetype. A fileref that contains a question mark or an asterisk is "ambiguous," while one that does not is "explicit."
If the file named by fileref exists, FDUMP will display its contents at the terminal in physical record units. Physical records are 128 bytes long. We'll use those units because we programmers are often interested in physical records, or in the relationship between data records and physical records.
If the file doesn't exist, FDUMP will display an error message, and terminate. The error message will be "fileref?". That's what the TYPE command says in the same circumstances.
Most terminals are only 80 characters wide, so the display of a physical record will have to be folded somehow. Let's specify that each record will be displayed in two screen lines of 64 bytes each. There will be a blank line between each pair of screen lines to separate the records.
The record number, counting from zero, will be displayed at the end of the first line of each pair. We don't want to tackle decimal conversion yet, so we'll display the number in hexadecimal.
Where the file contains unprintable characters FDUMP will usually display a dot. That will ensure that every byte will appear on the screen, yet the terminal won't be confused by random control characters. The two ASCII characters CR and LF (carriage return and linefeed) will be displayed as back-slashes (\). The pair CR, LF marks the end of a line in a normal ASCII file, and we want them to stand out.
With that specification in mind, I drew up a high-level sketch of FDUMP in the form of pseudo-code. It took three tries to arrive at a sketch that satisfied me (Figure 1-1 [1] ). The point of this exercise was to reveal the shape of the program, and to expose the main decisions that had to be made. Each line of Figure 1-1 represents a small cluster of decisions that have to be made before the program is complete.
Figure 1-1. High-level sketch of the FDUMP command.
program FDUMP (fileref)
    open file "fileref"
    if it doesn't exist, abort with "fileref?"
    initialize reading mechanism
    initialize record number
    while not end of file...
        display a blank line.
        read and display the first 64 bytes
        display the record number in hex
        end the first line
        read and display the second 64 bytes
        end the second line
        step the record number
    end while.
end FDUMP.
The sketch of FDUMP calls for four initialization steps. Let's examine those lines and see whether or not they require more elaboration before they can be coded.
Before a CP/M command is called, its first operand is set up in a File Control Block (FCB) in low storage. That makes it easy to open the file named by the operand, so the first line of Figure 1-1 needn't be expanded further. The Open File service request returns a signal if the file doesn't exist. The first operand is also available in low storage as the user typed it. I decided it would be simple to make that into a message, so the second line could be let stand as well. As we'll see, it wasn't that easy after all.
Initializing the record count really is easy: the starting number is stored in a variable. But what of the "reading mechanism"? There will be one, but it hasn't been designed. Its initialization will have to be deferred until more decisions have been made.
The main part of Figure 1-1 is a while-loop. The condition that will end the loop is the arrival of the end of the file. The program will deal with one physical record during each pass through the body of the loop. This loop was based on the assumption that the program would use the simplest of sequential I/O techniques. I assumed that there would be a subroutine that would either deliver the next byte from the input file, or return a signal that there were no more bytes to be had.
A CP/M file can be completely empty. Thus the end of the file might be found on the very first call of the subroutine. For that reason the main program loop has to be stated as "while not end of file" rather than as "until end of file."
The contents of the loop are just a restatement of the specifications. We needn't elaborate the line "step the record number," but the other lines deserved more detailed design. I elaborated Figure 1-1 into the more detailed plan shown in Figure 1-2.
Figure 1-2. Elaboration of the main loop of FDUMP.
program FDUMP (fileref)
    C is a byte
    RecNum is a word
    EOF is a flag of some sort
main:
    open file "fileref"
    if it doesn't exist, abort with "fileref?"
    initialize reading mechanism
    initialize record number
        RecNum := 0
    C := Getchar    { EOF true if end of file}
    while( not EOF ):
        TypeCRLF    {type a blank line: CR, LF}
        read and display the first 64 bytes
            do 64 times
                Output(C)
                C := Getchar
            end do.
        display the record number in hex
            ShowRecord
            TypeCRLF    {end the first line}
        read and display the second 64 bytes
            do 64 more times:
                Output(C)
                C := Getchar    { EOF on 64th (128th) }
            end do.
            TypeCRLF    {end second line}
        step record number
            RecNum := RecNum+1
    end while.
end main.
Pseudo-code is simply a style of making notes on one's plan for a program. It is meant for use by people, not by machines, so it can use any notational conventions whatever. Figure 1-2 is written in my personal pseudo-code style. It is an amalgam of English sentences and forms taken from programming languages, primarily Pascal. You are free to develop your own pseudo-code style, one that suits your taste and the way you think. However, in this book you will have to make sense of my style. Figure 1-2 is the first real example of that, so let's look at the conventions I use.
The Figure opens with a heading, "Program FDUMP(fileref)." Every program and procedure has such a heading to indicate its name and what sort of arguments it will need.
The first lines after the heading define the variables that are used in the unit. At this stage of a design we don't know, and don't care, how those variables will be implemented. The line "RecNum is a word" is a note that says "there will have to be a 16-bit counter to hold the record number, and let's call it RecNum." RecNum might turn out to be a word in storage, or it might become a pair of machine registers. That decision is deferred until we begin coding the assembly language. The name "C" is also the name of a machine register; its use in the line "C is a byte" suggests that we are anticipating the code and thinking that the byte might well be carried in the C register.
The line "initialize record number" from Figure 1-1 has been elaborated with the note "RecNum:=0." I use the Pascal form ":=" to mean assignment to a variable. The next line, "C:=GetChar," might be confusing. Unless you are a practiced programmer, you probably won't guess that GetChar is meant to be a subroutine, and that it will return a byte when it is called. Sneak a look ahead at Figure 1-3 where GetChar is defined. Its heading says "returns a byte and a flag." A subroutine that returns a value can be used on the right hand side of an assignment.
The while statement of Figure 1-1 has been refined into "while (not EOF)." That is shorthand for "while the EOF flag—set by GetChar—does not contain the signal TRUE." Conditional tests of this sort appear in while statements, if statements, and others. I always enclose these conditional expressions in parentheses.
There are some comments in the lines that form the body of the while-loop. They are enclosed in braces. The whole body of the while-loop has been indented. Groups of statements that are under the control of some statement at a higher level are always indented. Notice how the body of the "do 64 times" loop has been indented under it. Notice, too, that every such compound statement (while and do, in Figure 1-2) is framed with an end-marker ("end while" and "end do" in Figure 1-2). There is a compound if-statement in Figure 1-3; it follows the same pattern.
As the logic of the main loop grew, the specification of the "reading mechanism" became clearer. It would consist of a subroutine (called "Getchar" in Figure 1-2) that would deliver the next byte, and would set a flag when the end of the file appeared.
I had trouble convincing myself that the flag was really necessary. It is unaesthetic to have a subroutine return two different objects (a byte and also a flag). When a program reads only ASCII files, the flag is not necessary, as we will see in later chapters. FDUMP, however, is not limited to ASCII files. It must display every byte of a file of any kind, and the bytes might have any value. Therefore, there is no byte value available to stand for end of file. If we were aiming at a high-level language program we could specify that Getchar should return an integer with greater precision than a byte. Then we could specify some integer value to stand for end of file without preempting one of the 256 byte values. Here we are stuck with requiring Getchar to return both a byte and a flag. When the flag is true, the byte will of course be meaningless.
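For comparison, here is a minimal sketch of the high-level alternative just described: a reader that returns an integer wider than a byte, so that one value outside 0..255 can stand for end of file. It is written in Python, which is not the language of this book's programs; the names getchar and EOF, and the choice of -1 as the signal, are assumptions made only for this illustration.

```python
import io

EOF = -1  # assumed sentinel: any value outside the byte range 0..255 would do


def getchar(stream):
    """Return the next byte as an int in 0..255, or EOF when none remain."""
    data = stream.read(1)
    if not data:
        return EOF
    return data[0]


# Every byte value comes back as data; none is preempted to mean end of file.
source = io.BytesIO(bytes([0x00, 0x1A, 0xFF]))
values = []
c = getchar(source)
while c != EOF:
    values.append(c)
    c = getchar(source)
```

An 8-bit register has no room for that extra value, which is why the assembly-language Getchar needs its separate flag.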
Other decisions went into the making of Figure 1-2. There would be a subroutine (Output) to display a byte, and a subroutine (ShowRecord) to display the record number. (In truth, the need for ShowRecord didn't become clear until work had begun on CDUMP and VDUMP, the next two programs.)
As Figure 1-2 took shape it became clear that EOF could only be true at one of two points. If the file was empty, EOF would be true on the first call to Getchar. The loop would never be executed. Otherwise, since CP/M files consist of 128-byte records, EOF could only become true on the very last call to Getchar in the second inner loop. I made a mental note that, if EOF were to be held in a machine register, it must be one that would not be modified during the actions "end second line" and "step record number." Otherwise, the flag wouldn't be available at the top of the loop where it is tested.
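The claim that EOF can turn true at only two points can be checked with a small model of Figure 1-2. The sketch below is Python, used purely as an executable notation. The names make_getchar, show, and dump are hypothetical, show is a simplified stand-in for Output, and the uppercase hex digits are my assumption.

```python
RECORD = 128  # CP/M physical record length in bytes


def make_getchar(data):
    """Build a getchar() that returns (byte, eof) over whole records."""
    if len(data) % RECORD != 0:
        raise ValueError("this model expects whole 128-byte records")
    pos = 0

    def getchar():
        nonlocal pos
        if pos == len(data):
            return 0, True          # the byte is meaningless at end of file
        c = data[pos]
        pos += 1
        return c, False

    return getchar


def show(c):
    """Simplified stand-in for Output: a dot for anything unprintable."""
    return chr(c) if 0x20 <= c < 0x7F else "."


def dump(data):
    """Follow Figure 1-2, collecting screen lines instead of typing them."""
    getchar = make_getchar(data)
    lines = []
    rec_num = 0
    c, eof = getchar()              # EOF is true here only for an empty file
    while not eof:
        lines.append("")            # blank line ahead of each record
        first = ""
        for _ in range(64):
            first += show(c)
            c, eof = getchar()
            assert not eof          # never true in the middle of a record
        lines.append(first + "  " + format(rec_num, "04X"))
        second = ""
        for i in range(64):
            second += show(c)
            c, eof = getchar()
            assert not eof or i == 63   # only the 128th call can see EOF
        lines.append(second)
        rec_num += 1
    return lines
```

Each first line comes to 64 + 2 + 4 = 70 columns, which fits the 80-column terminal assumed by the specification.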
The reading mechanism could now be worked out and added to Figure 1-2. The changes are shown in Figure 1-3. There will be a 128-byte buffer and an index over it. At the start of the program, CP/M must be told that our buffer is the one to use in file operations. The index must be set so as to force GetChar to read a record the first time it is called.
Getchar will return a byte from the buffer where the index points, and advance the index. If the index is already past the end of the buffer when GetChar is called, it will try to read the next record.
Figure 1-3. Additions to Fig. 1-2 to implement the reading mechanism in FDUMP.
program FDUMP (fileref)
    C is a byte
    RecNum is a word
    EOF is a flag of some sort
    Index is a count 0..128
    Buffer is an array[0..127] of bytes {a CP/M record}
main:
    ...
    initialize reading mechanism
        make Buffer the file buffer to CP/M
        Index := 128, i.e. no data in buffer.
    ....
end main.

Getchar : returns a byte and a flag
    T is a temporary byte.
    if (Index=128) then do
        Index := 0
        read a record to Buffer
    endif.
    if (end of file) then return nonzero flag, any byte.
    T := Buffer[Index]
    Index := Index+1
    return zero flag and byte T.
end Getchar.
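To see the reading mechanism run, here is a rough Python model of Figure 1-3. The class name Reader is hypothetical, and an ordinary byte stream stands in for the CP/M read service; a short read plays the part of the end-of-file signal. One small departure from the figure: this sketch resets the index only after a successful read, so that repeated calls at end of file keep returning the flag.

```python
import io

RECORD = 128  # CP/M physical record length in bytes


class Reader:
    """Model of the buffer, the index over it, and Getchar."""

    def __init__(self, stream):
        self.stream = stream
        self.buffer = bytearray(RECORD)
        self.index = RECORD          # 128 means "no data in buffer"

    def getchar(self):
        """Return (byte, eof_flag); the byte is meaningless when the flag is set."""
        if self.index == RECORD:     # buffer used up: fetch the next record
            record = self.stream.read(RECORD)
            if len(record) < RECORD: # stand-in for the end-of-file signal
                return 0, True
            self.buffer[:] = record
            self.index = 0
        t = self.buffer[self.index]
        self.index += 1
        return t, False


reader = Reader(io.BytesIO(bytes(range(128))))
first_record = [reader.getchar() for _ in range(RECORD)]
after_last = reader.getchar()
```

Setting the index to 128 at the start is what forces the first call to read a record, exactly as the design requires.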
It only remains to specify the subroutines Output and ShowRecord. The pseudo-code I arrived at is shown in Figure 1-4. Output begins by turning off the most significant bit of the byte to be displayed. That is done because some word processor programs set the high bit of ordinary ASCII characters to signal a print formatting function, and some assembly programs use an ordinary character with its high bit set on to indicate the end of a message string. If the byte is an ASCII control character, Output calls on a function Control to translate it.
It is in Control that we implement our decisions on how to display control characters. The pseudo-code of Control makes use of an if-elif-else structure. "Elif" is just shorthand for "else if". A stack of "elif" statements is a handy way of expressing the process of making one choice from several alternatives.
Figure 1-4. The console output design of FDUMP. TypeChar and TypeXXXX are included subroutines.
{--- Display a byte at the console ---}
Output( X: a byte )
    turn off high bit of X
    if (X is not printable) then X := Control(X)
    Typechar(X)
end Output.

Control( X: an unprintable byte ) returns a byte.
    if (X is CR) then return "\"
    elif (X is LF) then return "\"
    else return "."
    endif.
end Control.

{--- Display the record number ---}
ShowRecord:
    TypeBlank; TypeBlank    {type two spaces}
    TypeXXXX(RecNum)    {type four hex digits}
end ShowRecord
Output and ShowRecord call on subroutines that aren't shown here. Typechar, TypeXXXX, and other routines are taken from libraries that will be merged with the final program using the INCLUDE command that we will develop in Chapter 6. Typechar takes a byte and sends it to the CP/M console device. TypeXXXX takes a 16-bit value and sends a 4-digit hexadecimal display of it to the console.
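The display rules of Figure 1-4 are easy to try out in a high-level notation. The following Python sketch returns the text that would be typed rather than sending it to a console. Treating DEL (7Fh) as unprintable and showing the hex digits in uppercase are my assumptions; the specification does not settle either point.

```python
CR, LF = 0x0D, 0x0A


def control(x):
    """Translate an unprintable byte into the character that stands for it."""
    if x == CR:
        return "\\"
    elif x == LF:
        return "\\"
    else:
        return "."


def output(x):
    """Return the character displayed for byte x."""
    x &= 0x7F                        # turn off the high bit
    if x < 0x20 or x == 0x7F:        # assumed meaning of "not printable"
        return control(x)
    return chr(x)


def show_record(rec_num):
    """Two spaces, then the record number as four hex digits."""
    return "  " + format(rec_num & 0xFFFF, "04X")


sample = bytes([ord("H"), ord("i") | 0x80, 0x00, CR, LF])
displayed = "".join(output(b) for b in sample)
```

Here the letter "i" with its high bit set displays as an ordinary "i", the zero byte as a dot, and the CR, LF pair as two back-slashes.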
The design of the program is now complete; all the important decisions have been made and noted in pseudo-code. With Figures 1-2, 1-3, and 1-4 in hand, I set out to write the assembly language code that would implement the program.
Coding the program was not a mechanical process; I couldn't just "compile" the pseudo-code as a Pascal compiler would compile its input. There were many little decisions still to be made: labels, register use, the choice of the shortest instruction sequence for each task, the best wording for comments. These, however, were small, local decisions. At each particular moment I could concentrate on a span of instructions no greater than the size of a sheet of notepaper. All the global, algorithmic, decisions had been made and recorded in the pseudo-code. The local coding decisions that remained were easy to make in isolation.
Most of the difficulty associated with using assembly language comes from trying to handle local decisions and global ones at the same time. If you try to work out the shape of a main loop and the best use of the stack at the same time, you will probably do a poor job of both. By doing a careful algorithmic design first, and recording it in some design notation like pseudo-code, you separate the two levels and can do both well in a reasonable amount of time.
The assembly code of FDUMP appears in Listing 1-1. Its style and its general outline are typical of all the programs in the book. Let's examine the conventions it uses. We'll return to its content later.
The source lines are all 64 characters or less in length. That is because some terminals only display 64 characters on a line. The lines in Listing 1-1 are 72 characters long because they have been processed by the MACREF program developed in Chapter 7. It added a sequence number and a tab to the head of each line. It also appended a cross-reference of symbol names at the end of the Listing.
The program opens with a descriptive prologue. That, and the main divisions of the program, are set off with lines of equals signs. Each subroutine has its own descriptive header, set off by lines of dashes (see GetChar at line 66, for example).
Most of the code is written in lowercase letters. That's my personal stylistic choice; it makes no difference to the meaning of the program. The assembler treats all letters as uppercase. I prefer the look of lowercase. Labels and variable names are the proper nouns of a program; I capitalize those. Names are often quite long. The MAC assembler allows names to be as long as sixteen characters; why not take advantage of it? Often these symbols are formed from two or more words, and I use capital letters to make that obvious, as in "MainWhile." This, too, is pure decoration; the assembler would treat "Main," "MAIN," and "maiN" as identical words.
Lines 24, 28, 46, and others contain macro calls. Names of macros are always capitalized so that they are distinct from ordinary machine opcodes.
Now let's look at the content of the FDUMP program. The first two statements after the prologue (lines 14 and 15) cause the assembler to read the contents of two macro libraries, CPMEQU.LIB and PROG.LIB. These files are listed in Appendix B. The first one contains equate statements that give names to common CP/M storage locations and useful constants. These names are used over and over, so you might glance at CPMEQU.LIB in Appendix B now.
The PROG macro library contains macros related to program organization. PROLOG is the most important one. That macro is called on line 16; it prepares the execution environment for the program. It sets up a stack and calls Main, the starting point of the program. It generates an error-termination routine that is usually called through the ABORT macro. ABORT provides a way of terminating a program with a message to the console.
The main code (at label "Main," line 23) opens with a use of the SERVICE macro. That macro performs a call of the CP/M BDOS. It preserves all registers except A and, when the BDOS will return an address, HL. SERVICE allows us to insert BDOS service requests anywhere in the code without upsetting our careful assignment of values to registers. The first operand of SERVICE is the number of the request. Here it is given as "BdosOpen", a name defined in CPMEQU.LIB to stand for the number of the Open File service. The second operand of SERVICE is the argument that is to be passed to the BDOS in registers DE. Here it is the address of the default FCB, where the Console Command Processor (CCP) will have placed the first operand of the command. If that file does not exist, the BDOS will return FFh in register A. Otherwise, A will contain 0, 1, 2 or 3.
The rest of the main routine is a direct translation of the pseudo-code. After some false starts, I decided to assign EOF to register A. That made it easy to write Getchar, but there remained the problem of preserving EOF from the last call of Getchar to the top of the loop. If the library routine TypeCRLF changed register A, the loop might end too soon or not at all.
Register preservation is a constant problem in assembly language. One great boon of a high-level language is that the compiler manages all the niggling problems of register and stack management. If these problems are not firmly controlled, the resulting worries and bugs can drain the programmer's mental energies quickly.
Therefore, I established an inflexible rule that applies to every subroutine in this book: A subroutine will preserve all registers except those that are specifically defined to contain the subroutine's result. The library routine TypeCRLF, for example, is meant to send CR and LF to the console. Since it returns no result, it will change no registers. The rigid application of this rule may have cost a few pushes and pops that were unnecessary in certain circumstances, but the saving in programmer time was immense.
Getchar in assembly language (lines 66-95) is logically the same as Getchar in pseudo-code, but that isn't immediately apparent. I used some mildly tricky coding to make Getchar smaller. The tricks are simple, and interesting as examples, but they demonstrate how easily assembly language can become dangerous.
Compare the two forms of Getchar. Notice that the pseudo-code line "Index:=0" isn't represented in the assembly code. Index is loaded into register A on entry. If the buffer is not empty, Index will be less than 128, the high bit of A will be 0, and control will pass to Getchar2. Eventually A will be stored back to Index. But what if the buffer is empty? Then, as a result of a successful BdosRead service request, A will contain zero. That is the desired value of Index after a read. The code takes advantage of this fortuitous event to save one 1-byte instruction ("XRA A" to clear the register). The assignment "Index:=0" vanishes and the code becomes more obscure. It's very satisfying to discover these little optimizations (but does that satisfaction compensate for the obscurity they introduce?).
Getchar demonstrates another coding trick. The code assumes that Buffer is located on a page (256-byte) boundary. Therefore a 16-bit address can be formed by concatenating the most significant eight bits of Buffer's address to the eight bits of Index. The danger here is that the assumption might not be justified. Buffer is defined at the end of the program. We might forget, by that point, to put it on a page boundary. Then Getchar would return garbage. This is an example of a general class of assembly language dangers. The code depends for its correct operation on a characteristic of another, unrelated, part of the program. The only link between the routine and the supporting characteristic lies in the programmer's fallible memory. It is all very well to document the assumption in a comment, as was done in the heading of Getchar, but such notes are easily overlooked.
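The arithmetic behind this hazard can be shown in a few lines. The sketch below is Python, used only as a calculator; the addresses 0300h and 0310h and the function names are hypothetical, chosen for the illustration and not taken from Listing 1-1.

```python
def address_by_concatenation(buffer_addr, index):
    """The trick: high byte of Buffer's address joined to the 8-bit Index."""
    return (buffer_addr & 0xFF00) | (index & 0xFF)


def address_by_addition(buffer_addr, index):
    """The general method: a full 16-bit add of base and index."""
    return (buffer_addr + index) & 0xFFFF


ALIGNED = 0x0300      # hypothetical Buffer on a page boundary
MISALIGNED = 0x0310   # hypothetical Buffer that slipped off the boundary

# On a page boundary the two methods agree for every index in the buffer.
agree_when_aligned = all(
    address_by_concatenation(ALIGNED, i) == address_by_addition(ALIGNED, i)
    for i in range(128)
)

# Off the boundary the trick silently points at the wrong byte.
wrong = address_by_concatenation(MISALIGNED, 5)   # 0x0305
right = address_by_addition(MISALIGNED, 5)        # 0x0315
```

Nothing in the faster form fails visibly when the assumption is broken; it simply fetches from the wrong place, which is what makes this class of dependency dangerous.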
The Output subroutine (lines 96-113) takes advantage of the 8080's design in a nicer way. It uses the conditional call instructions to call the Control routine only when translation is needed. These instructions are especially satisfying ones to use, and they rarely cause a problem, provided that the call is adjacent to the instruction that sets the flags it tests.
FDUMP terminates with a message "fileref?" when the input file cannot be found. The design of the code to do this was deferred, since it appeared to be very simple. The code that I came up with follows the label NoFile in the assembly program. It works, but in fact it contains two bugs! They follow from the unpredictable nature of user input.
The Console Command Processor (CCP) leaves behind in storage a character string that the CP/M documentation calls the "command tail". This consists of all the characters of the command that followed the command verb. The characters are just as the user typed them, except that lowercase letters have been translated to uppercase. The NoFile code converts those characters into a message by appending a question mark and a dollar sign to the end of the string. The question mark is part of the message. The dollar sign delimits the message for the Console Output String service request.
The message consists of whatever the user typed as command operands. If you give the command "fdump what's your name," and assuming there is no file named "what's," then FDUMP will reply "WHAT'S YOUR NAME?" That is not really a bug. If the user types a lot of spaces or tabs ahead of the filename, FDUMP's message will repeat them. That will make the command look simple-minded, so it might be classed as a mild sort of bug. But what about the command "fdump $$$.sub," when there is no file $$$.sub? The dollar sign in the filename will terminate the message; FDUMP's message will consist of a single blank. That's really a bug.
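These cases can be reproduced with a toy model of the message logic. The Python sketch below assumes the command tail begins with the blank that followed the command verb, and it models the Console Output String request as "display everything up to the first dollar sign." The function names are hypothetical.

```python
def console_output_string(text):
    """Model of a '$'-delimited console message: return what gets displayed."""
    return text.split("$", 1)[0]


def no_file_message(command_tail):
    """Append '?' and the '$' delimiter to the tail, as the NoFile code does."""
    return console_output_string(command_tail + "?$")


ordinary = no_file_message(" WHAT'S YOUR NAME")   # the tail, then "?"
padded = no_file_message("     B:NOSUCH.TXT")     # leading blanks are repeated
broken = no_file_message(" $$$.SUB")              # cut off at the first "$"
```

In the last case the first dollar sign of the filename ends the message before the question mark is ever reached, leaving only the single blank.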
I left this unsuccessful piece of code in FDUMP as an illustration of how an apparently simple idea can go badly awry in practice. You might enjoy trying to correct the problems of the NoFile code. In the next chapter we'll just display "No File" when the input file doesn't exist.
Several subroutines named in FDUMP don't appear in the listing. The labels TypeCRLF, TypeChar, TypeBlank, and TypeXXXX are called at various points, but they aren't defined in the program. Where are they?
These subroutines are defined in a different file, TYPESUBS.INC. They were inserted into the program before it was assembled by running the code through the INCLUDE command developed in Chapter 6. Line 170 of Listing 1-1 says "#include TypeSubs.inc,TypeCommon." When the INCLUDE command read that line, it located the file TYPESUBS.INC, extracted from it a unit of text named TYPECOMMON, and inserted that text into FDUMP. You can read the text of TYPECOMMON in Appendix C; it defines the subroutines named TypeCRLF, TypeBlank, TypeMessage, and TypeChar.
The include files shown in Appendix C contain over fifty useful subroutines. By putting them in separate files and including them where they are needed, the programs themselves were made shorter and simpler.