Previous Contents Next

CHAPTER 5: READING THE DIRECTORY

One of the most interesting files on any disk isn't a file at all. It is the directory of files. In this chapter we will see how to read the directory in a safe, hardware-independent way, and we will experiment with ways to process the directory entries after reading them. Before doing so, we will examine the contents of the directory, and how they are used in file space allocation.

The Disk Directory

The disk directory is at the heart of the file system. In it, the BDOS records the names of all files and the location of each file's data on the disk. In CP/M 2, each file's "attributes"—its read-only or read-write status, and other flags—are recorded along with its name and type. In MP/M 2 the directory contains password and timestamp information about some files, and may hold a label record to describe the disk as a whole. That will presumably be true in CP/M 3, whenever that long-awaited system arrives.

Almost everything there is to be known about a disk's contents can be determined from the directory. The displays produced by the STAT and DIR commands only summarize this information. An endless number of other commands could be written to display and analyze the directory's contents, if a command could only read it. And, in fact, it is quite easy to read the directory. Before we demonstrate how, let's go over the contents of a directory entry and its use.

THE DIRECTORY ENTRY

A single directory entry is a 32-byte data structure, laid out as shown in figure 5-1. Notice the similarity to the layout of the FCBs that we have used in the preceding programs. There are three parts to the structure: the search key, the reserved bytes, and the data map.

Figure 5-1. The format of a CP/M or MP/M disk directory entry.

a directory entry is a structure composed of: a search key composed of: the user number / activity code is a byte of: a user number from 0 to 15 (00h to 0Fh), or 249 (E5h) to signal an erased entry, or 16 plus a user number in an XFCB (MP/M 2), or 32 (20h) to signal a disk label (MP/M 2). the filename is 8 bytes of the filename in ASCII, the filetype is 3 bytes of the filetype in ASCII, the extent number is a byte from 0 to 31 (00h to 1Fh), reserved bytes composed of: s1 and s2 are bytes reserved by the system, the record count, a count of 128-byte records allocated to this extent, the data map, 16 bytes of: 16, 1-byte allocation block numbers, or 8, 2-byte allocation block numbers.

The Search Key

The BDOS treats the directory as a keyed, direct-access file. When a program needs access to a file, it calls the BDOS and passes an FCB which contains the characters of the filename and filetype. These characters comprise the bulk of a search key. The BDOS searches the directory until it finds an entry whose filename and filetype match the FCB; it uses that entry to satisfy the program's request.

The full search key contains two additional bytes, one at the most significant end of the key and one at the least significant end.

The User Number

The most significant byte of the search key in a directory entry is either an activity code or a user number, an 8-bit integer. The BDOS maintains a current user number, which can be set with the USER command. The current user number is an implicit parameter of all directory searches. The BDOS appends the current user number to the head of the fileref used to search the directory. Only entries that start with that number can be found during the search. If the current user number is, say, 5, then only entries that begin with a user number of 05h can be found when a file is to be opened. The current user number is written whenever a new directory entry is created. The effect is to divide the directory into as many as sixteen isolated sub-directories.

The Activity Code

The same byte serves as an activity code, to distinguish inactive directory entries from active ones. The directory contains a fixed number of slots for entries. Usually, there will be a number of unused slots. Some of these will never have been used, while some will once have recorded files that have since been erased. Never-used entries are not returned by the access method we'll be using, but erased entries may appear. They are signalled by the presence of an activity code of 249 (E5h).

The MP/M 2 operating system supports file passwords and file-access timestamps. If a file has a password or timestamps, they are recorded in a directory entry called an "XFCB." The key of an XFCB contains the same fileref as in the other directory entries for that file, but its user number is increased by 16 to signal the fact that its contents are different from those of a normal directory entry.

MP/M 2 also supports a disk label, a single directory entry that describes the disk as a whole. It, too, has a key. Its fileref is a name for the disk, and its user number contains 32 (20h) to make it unique from all other entries.

The Extent Number

A single directory entry can describe the locations of only a limited amount of data. If a file takes up more space than a single entry can describe, the BDOS creates additional entries for the file, each one describing an extent of the file's data. It must be possible to find each entry separately, and to tell the correct order of the extents. The BDOS makes these things possible by appending another byte, the extent number, to the search key. A file's first extent is given extent number 0, the second is given the number 1, and so on. The BDOS searches the directory using the full key: user number, fileref, and extent number.

SPACE ALLOCATION

Disk Formats and Hardware Independence

Data on a disk is not recorded as a continuous stream; the drive hardware divides the disk surface into concentric tracks and divides the tracks into arc-shaped sectors. In CP/M's early days, it supported only 8-inch diskettes. It required these to be formatted into 77 tracks with 26 sectors on each track, each sector holding 128 bytes of data.

There is now a great variety of disk formats. As personal computers proliferated, hardware manufacturers produced disk drives of all kinds. Five-inch diskette drives appeared; these had 35 or 40 tracks. Double-density drives appeared; they allow sector sizes of 256, 512, or 1024 bytes. Double-sided drives (of both diameters) record on both sides of the disk, doubling the number of available tracks. Hard disks are becoming common; they usually have several recording surfaces with a hundred or more tracks on each.

It is usual to skew the location of sectors around the track. That is, the sectors are actually written in a different physical order from their logical sequence. When the sector that is logically "next" comes several sector positions after the one that is logically "current," the system is more likely to be ready for it when it rotates under the head. Every disk configuration calls for a different skew pattern.

It would be impossible to account for this variety in the logic of our programs. Every program would be device-dependent! The great blessing of an operating system is that it can hide all the quirks of the hardware and present our programs with a single, consistent interface to data. That is what the CP/M 2 file system does.

But the BDOS is itself independent of the hardware. It gives the responsibility for dealing with the disk format to the BIOS, which is supplied by the hardware vendor. The BIOS is expected to provide the BDOS with a view of the disk in terms of some number of allocation blocks, each of which contains a fixed number of 128-byte records. The BDOS numbers these allocation blocks from zero up to the capacity of the disk. It reserves the first few allocation blocks to contain the disk directory. The rest of the blocks it portions out to files as they are created, and it records the blocks' numbers in the directory entries of the files that own them. In order to process the directory entries, we have to understand how this allocation is done.

Figure 5-2. The layout of the Disk Parameter Block, the area in which the BIOS tells the BDOS about a disk.

a Disk Parameter Block (DPB) is a structure composed of: SPT, a word, the number of 128-byte records per track, BSH, a byte, the base-2 log of the number of 128-byte records in an allocation block, BLM, a byte, an and-mask used to convert a record number to a record position within an allocation block, EXM, a byte, an and-mask used to relate logical (16KB) extents to physical extents, DSM, a word, the highest application block number on the disk, DRM, a word, the highest directory entry number on the disk, AL0 and AL1, bytes used to prime the allocation vector so that the directory blocks are reserved, CKS, a word, the number of directory entries to be validated by a check-sum when a disk is accessed, OFF, a word, the number of reserved tracks that precede the directory track on the disk.

The Disk Parameter Block

The BIOS is aware of the hardware characteristics of each disk. It is responsible for giving the BDOS the information the BDOS needs if it is to do file allocation. That information is passed in a structure called the Disk Parameter Block (DPB), whose layout is shown in Figure 5-2. Programs that will read the directory must also look at the DPB. There is a BDOS service request that returns its address.

The author of the BIOS makes two crucial design choices when deciding how to support a given disk format. They are the size of an allocation block, and the size of the directory. Those two choices are crucial because they affect the way the BDOS allocates space and the contents of the directory entries. An allocation block may have any of five sizes: 1,024 bytes, 2,048 bytes, 4,096 bytes, 8,192 bytes, or 16,384 bytes. The directory may occupy from one to sixteen allocation blocks. These choices have nothing to do with the physical format of the disk. The disk's sector size, and its number of tracks, are entirely the concern of the BIOS.

The choices are made for each disk format that the BIOS supports. The BIOS may support more than one format. In it does, it determines the format of a particular disk at the time the disk is first used after a warm start.

The two sizes are represented indirectly in the Disk Parameter Block. The BSH field expresses the allocation block size; if the number 128 is doubled BSH times, the size of an allocation block results. The DRM field contains the highest possible directory entry number, counting from zero. DRM+1 is the number of directory entries; (DRM+1)/4 is the number of 128-byte records allocated to the directory.

A given disk will have a certain capacity; that capacity is given in the DSM field. It contains the highest possible allocation block number, counting from zero. DSM+1 is the number of allocation blocks on the disk; DSM+1 doubled BSH times is the number of 128-byte records that the disk will hold.

The BDOS asks the BIOS for disk operations in terms of a track and a 128-byte record on that track. The SPT field gives the number of 128-byte records that each track holds. The CP/M documentation refers to this as "sectors per track," a term that is a relic of the days when a sector was always 128 bytes long. The BDOS neither knows nor cares how many physical sectors there are on a track, only how many 128-byte records there are.

The Directory Entry Data Map

Now let's return to the directory entry. The second half of an entry is a 16-byte field in which the BDOS lists the allocation blocks that belong to an extent of a file. This data map is simply an array of binary integers. When the disk can hold no more than 256 allocation blocks (DSM is 255 or less), any block number can be stored in a byte. On such a disk, the data map is an array of sixteen, 1-byte, allocation block numbers.

Nowadays it is common to have disks that hold many more than 256 allocation blocks. On these disks a 16-bit integer is required to hold a block number, so the data map consists of eight, 2-byte, block numbers. Before a program can interpret a directory entry, it must look at the DSM field of the DPB in order to find out which data map format is in use.

The Allocation Vector

The directory contains the numbers of all the allocation blocks that are in use. In order to allocate space to new files, the BDOS has to know which block numbers are not part of any file. It does this by building an allocation vector when the drive is first selected for use.

The allocation vector is a bit-map, a series of DSM+1 bits each of which stands for an allocation block. When it "logs in" a disk, the BDOS clears the allocation vector to zero. Then it copies the AL0 and AL1 bytes from the DPB into the head of the allocation vector. These bytes contain 1-bits for each block allocated to the directory. They prime the allocation vector to show that the directory blocks are in use.

Then the BDOS reads the entire directory. It looks at the data map in each entry, and for every non-zero block number, it sets the corresponding bit in the allocation vector to 1. The allocation vector then shows the state of all allocation blocks on the disk. Every 0-bit stands for an unused allocation block. There is a service request that will return the address of the allocation vector so that a program may inspect it.

Allocating Space

Let's think about what happens as a program, our utility PACK for example, creates a new file. The program first requests the BDOS to "make" the file. This is really a request for the BDOS to create an initial directory entry for that file. The BDOS reads the directory, looking for the first unused entry. Into that entry it copies the current user number and the filename, filetype, and extent number from the FCB. (If the extent number isn't zero at this time, the program will have made a directory entry for some later extent of the file, not for the first extent. This is not usually what it wants to do.)

The new directory entry has a data map that is filled with zeros. There is a block number zero but it is always used for the directory, so the number zero in a data map means "no block." The record count bytes in both the directory entry and in the program's FCB are set to zero. The new directory entry is then written to disk. If the program were to stop at that point, the file would exist but STAT would show that it contained no records.

Then the program begins calling on the BDOS to write records to the file. On each call, the BDOS first examines the record count byte in the FCB (actually, it examines the "current record" byte following the FCB, but the effect is the same for sequential output). The record count, modulo the number of records in an allocation block, gives the position of the next record within the current allocation block.

If the record position is zero, then it is time to allocate a new block to the file. The BDOS chooses one by scanning the allocation vector for the first 0-bit. Its position gives the number of the outermost free allocation block on the disk. That number is recorded in the data map in the program's FCB; it is thus allocated to the file. The allocation is not yet permanent because it has only been noted in the FCB in storage, not in the directory entry on disk.

Once it knows the allocation block number and the record position within it, the BDOS is in position to write the data. It uses the SPT field of the DPB to compute the track number and a record number on that track. Then it calls on the BIOS to write that record.

Multiple Extents

If the program keeps writing records, the data map of the FCB will eventually fill up with block numbers. When it is full, the BDOS retrieves the directory entry it made. It copies the data map from the FCB to the directory entry and writes the entry back to disk. The blocks allocated in this first extent are now permanently recorded. Then the BDOS increments the extent number in the FCB and performs another "make" operation for that extent, just as if the program had requested one. Writing can continue.

Because the BDOS always uses the first free directory entry, the entries for a file will almost always appear in extent number order in the directory. They could only be out of sequence if the program erased a second file in the midst of writing the first one, or if it created extents out of sequence using the direct-access services. We will make use of this fact later.

The extent number byte is restricted to a maximum value of 31, allowing 32 distinct extent entries. Once upon a time, 32 extents was the largest possible CP/M file. That is no longer true. How are higher-numbered extents recorded? We can't tell for sure, because the CP/M documentation doesn't say. It seems likely that the overflow is carried in the "s2" byte, because that is cleared to zero on a call to the Open or Make services. However, this is only an academic question. Even if we found out, it wouldn't be smart to base a program on an undocumented feature.

The Record Count

There's one more field of the directory entry: the record count. As just described, it seems to be a simple count of the number of 128-byte records controlled by the entry. That is the case, provided that the entry can account for no more than 128 records (16K bytes). However, the record count is never allowed to run beyond 128; a record count of 80h is the highest you'll ever see. That was the most records that could be recorded in an extent before CP/M 2, and the limit was kept for compatibility. However, suppose that we have a disk that uses a block size of 4,096 bytes and a data map size of eight? Then an entry can control 32K bytes, or 256 records. How is the record count handled in this case?

The relationship between the record count and the last-allocated block is retained. The count, modulo the number of records in a block, gives the record position within the block. When the extent grows past 16K, the record count starts over from zero. So the record count is really a count of records in the last (or only) 16K bytes controlled by the entry. This quantity of 16K bytes is sometimes called a "logical extent," in contrast to the directory entry which controls a "physical extent," or some multiple of 16K bytes.

THE DDUMP PROGRAM

We now have enough background to make sense of directory entries, if we could only see them. Program DDUMP will display them. Here are its specifications.

The form of the DDUMP command is

DDUMP drivecode

where drivecode is a drive letter followed by a colon. If drivecode is omitted, DDUMP will use the default drive.

DDUMP reads the directory entries on the disk in the given drive. It displays them at the console in hexadecimal. Each entry is displayed in two lines. Each line shows sixteen bytes of the entry. The filename and filetype are shown in ASCII at the right of the first line. The second line is the data map.

Designing DDUMP

DDUMP is not a complicated program. The initial plan for it appears in Figure 5-3. The only thing unspecified is how the directory entries are to be gotten.

Figure 5-3. Initial plan for program DDUMP.

Program DDUMP( d: ) if a drivecode d: was given, make it the default drive set CpmFcb for an exhaustive directory search request the first directory entry while more entries remain display the current directory entry at the console request the next directory entry end while. end DDUMP.

Before CP/M 2, a program that wanted to see the directory had to call on the BIOS directly to read the sectors. This is complicated, and it is possible to hang the system if things are not done in the right order. This could be tolerated because, before CP/M 2, there were always exactly 64 directory entries in exactly 16 sectors, and those sectors were always skewed in the same pattern.

With CP/M 2, the directory size could range from 32 to 8,192 entries, and the skew pattern became unpredictable. The designers of CP/M 2 and MP/M added BDOS service requests for reading the directory, so as to remove the need for calling the BIOS directly for disk input.

There are two service requests, but they operate in almost identical ways. The Search First service takes an FCB and returns the first directory entry that matches it. The Search Next service takes an FCB and returns the next matching entry after the last one retrieved. These requests accept ambiguous filerefs (question marks are allowed) and also allow the extent number to be specified as a question mark. Since the extent number in the FCB is part of the search key, a program may search for a particular extent entry, or for all extents one after another.

Normally, the Search services return only active directory entries for the current user number. In other words, they call for a search of the directory in the usual way the BDOS does it, using the full search key. However, they have another mode of operation. If the FCB drivecode byte (which is in the same position as the user number in a directory entry) contains a question mark, the search key is ignored and the requests return all directory entries: those with any user number, any fileref, any extent number. This is the mode in which DDUMP will use the requests. It will dump all entries to the console for inspection.

The search requests return a 32-byte directory entry at some offset into the current file buffer. The offset is either 0, 32, 64, or 96 bytes. The reason for this is that the BDOS usually reads an entire 128-byte record from the directory into the file buffer. The wanted entry is one of four in that record. However, this is not a fact to be depended upon. The BDOS does not have to return an entire record, just a single entry. When the CP/NET network code is active, the entry may be read from a disk located on a different machine some distance away. In that case, the system may return only the wanted entry in order to save transmission time.

A complete pseudo-code statement of the logic of DDUMP appears in Figure 5-4; it shows how the Search requests can be handled. The source code of the program appears as Listing 5-1. When you run it, watch the entries for files that have been made read-only. Can you see the difference?

Figure 5-4. A more formal statement of the plan for program DDUMP.

Program DDUMP( d: ) P is a pointer to a directory entry if d: was given, BdosSetDrive(d:) CpmFcb drivecode := "?" for exhaustive search P := FirstSearch while (P<>nil) Display(P) P := NextSearch end while. end DDUMP. FirstSearch : returns a pointer to a directory entry A is a byte A := BdosSearch1(CpmFcb) return MakeAddr(A) end FirstSearch. NextSearch : returns a pointer to a directory entry A is a byte A := BdosSearchN(CpmFcb) return MakeAddr(A) end NextSearch. MakeAddr( X : a byte ) : returns a pointer to a directory entry if (X = 255) then return nil else return (address of CpmBuffer)+X*32 endif. end MakeAddr.

THE DLIST PROGRAM

DDUMP does no processing of the directory entries. It just dumps them in their physical order. It is up the user to interpret them. Program DLIST will do better. First, it will sort the entries into order by their search keys, rather than showing them in their physical order in the directory.

DLIST will display the essential information about an entry in a single line, without the relatively unimportant hex dump of the data map. What is essential about a directory entry? The search key, first of all. DLIST must show the user number, the filename, the filetype, and the extent number. Then it might as well show the "s1" and "s2" reserved bytes, and it should show the record count.

The actual block numbers in the data map aren't much use, but the number of blocks allocated is important. DLIST will show the count of non-zero entries in the data map. It should also show the map size of eight or sixteen, so that we can tell when a map is full.

One other item is important. If a file is built with direct access rather than sequentially, it is possible to create allocation "holes." With direct access, a program may write any record it pleases. If it writes the first record and the 156th record, the BDOS will dutifully allocate the two blocks that contain those records. All other entries in the data map will remain zero. Zero map numbers will appear between non-zero numbers. These holes will cause a problem when the file is read sequentially, as by PIP. The BDOS will report end of file when it reaches the first hole. DLIST will inspect each entry for holes and put a flag in the display line.

The File Attributes

The fileref characters are ASCII, and as such their most significant bits are zero. These unused bits have been given use as file "attributes," binary states the file can be in. Two are in common use as the R/O and SYS attributes. The STAT command makes a file read-only by setting bit 0 of the first character of the filetype to 1 in each extent of the file. The file is given the SYS attribute—made invisible to DIR—by setting bit 0 of the second filetype character.

Bit 0 of the third filetype character is called the Archive attribute. It could serve a very useful purpose, but it lacks software support in CP/M 2 and MP/M 1. This attribute bit is set to 0 whenever a directory entry is modified. The intent is that an archival backup system could be written which would make backup copies of only those file extents that had changed since the last run. After copying the files, the program would set the Archive attribute to 1 (there is a service request that will set attribute bits in the directory). Since most files change little once they've been created, such a scheme would economize on backup storage. That's important when backing up a large, hard disk.

The first four filename characters have attribute bits which are not defined or tested by the system. The attribute bits in the second four filename characters are reserved.

The DLIST program should display all eleven attributes in some meaningful way, separate from the fileref. Finally, it should be able to perform a search for a specific fileref, as well as making an exhaustive search as DDUMP does.

Specifying DLIST

The DLIST command displays selected directory entries for inspection. Its form is:

DLIST search-ref /option

The search-ref is a fileref, which may be ambiguous. If it is omitted, DLIST assumes "*.*" and searches for all active directory entries under the current user number. The /option is a slash followed by an "x" for "exhaustive search." If it is present, only the drivecode of search-ref has any effect; DLIST displays all directory entries including erased ones.

If DLIST finds no matching entries, it reports "No file." Otherwise it types one line at the console for each entry found. The line has this form:

ua filename typ ex s1 s2 rc as ms h -attributes-

The "ua" field is the user number in hexadecimal. The ex, s1, s2, and rc fields are those bytes of the entry, in hexadecimal. The "as" field is a count of the data map entries, and "ms" is the size of the data map. The letter "h" will appear if the file has a hole in its allocation; the letter "c" appears if it is compact.

The file attributes are shown as a string of eleven characters. The letter "r" stands for R/O, "w" for R/W, "s" for SYS, and "d" for DIR. The archive attribute is indicated by a "u" for "unarchived" (i.e. modified and not backed up) or an "a" for "archived." The filename attribute bits are shown by the digits "1" through "8" where they are set on, or by blanks where they are not.

Designing DLIST

The first difference between DDUMP and DLIST is that the latter has to display the entries in sorted order. That implies that, instead of displaying each entry as it is found, DLIST has to store all the entries before it can display the first one. I decided to put the entries in order as they turned up, rather than sorting them all at once. This would allow me to store them in a simple linked list.

The basic logic of DLIST followed fairly easily (see Figure 5-5). Most of the initialization is concerned with getting the FCB set up properly. The exhaustive search works only on the default drive, so if a drivecode is requested it has to be made the default.

Figure 5-5. Initial plan for program DLIST.

Program DLIST( search-ref, /x ) if search-ref was omitted, supply "????????.???" if a drivecode was given, make that the default drive if the /x option was given set CpmFcb for an exhaustive search else set CpmFcb to search the default drive endif. find out the size of a data map (8 or 16) on this disk request the first matching directory entry if none exist, display "No file" and terminate repeat save the current entry in sequence by search key request the next matching entry until there are no more entries. for each entry found display information about the entry at the console end for. end DLIST.

In order to "find out the size of the data map," the program will have to locate the Disk Parameter Block and see if DSM is greater than 255.

There will be two main loops. One, similar to DDUMP's logic, will search for entries until there are no more, stowing each one in its proper sequence. Then, in another loop, the stored entries can be read out and displayed.

The count of blocks, and the hole-flag, are additional to the normal information in a directory entry. Then, too, a pointer-word will have to be added to each entry in order to link it to the next one in sequence. I specified a structure, the EntRec, to hold the entry and these other data (Figure 5-6).

Figure 5-6. Layout of the EntRec, a directory entry augmented with more information.

an EntRec is a structure composed of: EntNext is a pointer to the next EntRec in sequence EntMs is the size of the data map, 8 or 16 EntAs is the count of blocks allocated to the entry EntHc is a boolean, true if there are allocation holes EntEntry is a directory entry in full

Now DLIST can be specified more formally, in order to decide on the shape of its subroutines. Figure 5-7 shows the result. The First- and Next-search routines can come from DDUMP.

Figure 5-7. A more formal statement of DLIST, used to plan the organization of subroutines.

Program DLIST( search-ref, /x ) P is a pointer to a directory entry E is a pointer to an EntRec ... initialize ... P := FirstSearch if (P=nil) then "No file", terminate repeat StowEntry(P) P := NextSearch(P) until (P=nil). E := FirstEntry repeat Display(E) NextEntry(E) until (E=nil). end DLIST.

The next problem was how to store the entries in order. Linked lists are fun to work with (almost as much fun as binary trees—see Chapter 7). As long as one is careful to distinguish between a thing and the address of the thing, the algorithms come out simple and neat. The main function of StowEntry (Figure 5-8 and Figure 5-9) is to run down a chain of EntRecs until it finds the right place for the new one. Then it must break the chain and splice the new EntRec into it at that point.

Figure 5-8. The logic used to insert a new entry into the chain of EntRecs in reverse order, from high to low.

StowEntry( X : a pointer to a directory entry ) : returns a pointer to an EntRec. global Anchor is the anchor of a chain of EntRecs N, P, Q are pointers to EntRecs Stop is a boolean N := CopyUp(X) { build an EntRec from dir. entry } PrepEntRec(N) { augment it with more info. } P := address of Anchor Stop := false repeat Q := P->word { Q->next EntRec in the chain } if (Q <> nil) then { ..there is a "next" in the chain } if (N->EntRec < Q->EntRec) then { move on in chain } P := Q else { new one goes between P-> and Q-> } Stop := true endif. else { new one goes at end of chain } Stop := true endif. until Stop. P->word := N { predecessor points to new EntRec } N->word := Q { new one points to next, or to nil } return N end StowEntry.

Figure 5-9. Subroutines of StowEntry, called from Figure 5-8.

CopyUp( X : a pointer to a directory entry ) : returns a pointer to an EntRec globals NextFree is the address of the next free byte, StoreLimit is the address of the end of storage N is a pointer to an EntRec N := NextFree NextFree := NextFree + length(EntRec) if (NextFree > StoreLimit) then Abort "Too many entries" copy X->entry into N->EntRec.Entry return N end CopyUp. PrepEntRec( P : a pointer to an EntRec ) P->EntRec.Ms := data map size for the disk, 8 or 16 P->EntRec.As := count of non-zero map slots in the entry P->EntRec.Hc := true if there are allocation holes end PrepEntRec.

When you read StowEntry carefully (you're reading all these programs carefully, aren't you?) you will realize that it is building the chain backwards. It is putting the highest search keys at the head and the lowest ones at the tail! This is not a mistake.

We can assume that filerefs and user numbers are scattered pretty much at random in the directory, so either sorting up or sorting down would take the same amount of time. However, we can expect that when a file has more than one entry, the entries will be found in ascending order by extent number.

The job of comparing two search keys is a lengthy process. The bytes can't be compared until their attribute bits have been masked off. Otherwise the attribute bits would interfere with alphabetic order. When two extents of the same file are compared, the comparison will run all the way to the last, extent number, byte before a difference is found.

Suppose the chain were to be kept in ascending order. Then when stowing the third extent entry for a file, the program would have to compare it to the first entry for that file, then to the second entry, then to the next entry in the chain, before finding where to put it. But if the chain is kept in descending order, the higher-numbered extents of a file precede the lower-numbered ones. The third entry for a file goes into place with only one comparison to a sibling entry rather than three. By using descending rather than ascending order, we reduce the number of comparisons when stowing the entries of multi-extent files.

This optimization has only one flaw: it leaves the chain of EntRecs in the reverse of the order that the main routine needs. The solution is to reverse the chain after it is built. Can that be done quickly? It took an hour of doodling to find out that it can be done. It takes only one pass over the chain, and nothing moves but pointer values—see Figure 5-10. I repeat, linked-list algorithms are pure fun. Discovering this chain-reversing algorithm was an absolute delight. (Figure 5-10 contains the first use of the loop...while...end loop construct far. The meaning of it is as follows. The lines between "loop" and "while" are to be done. Then, if the while-condition is true, the lines from "while" to "end loop" are to be done, and control is to return to the "loop" line. When the while-condition is false, the loop ends.)

Figure 5-10. The logic used to reverse the chain of EntRecs so that the lowest fileref appears first.

FirstEntry : returns a pointer to an EntRec { reverses the order of the EntRec chain, then returns the first.} global Anchor is the chain anchor P, Q, R are pointers to EntRecs P := nil Q := Anchor if (Q <> nil) then { chain not empty, reverse it } loop R := Q->word Q->word := P while (R <> nil) P := Q Q := R end loop. end if. Anchor := Q return Q end FirstEntry.

The Display Machine

The DLIST program source appears as Listing 5-2. The source contains one thing that isn't suggested by the pseudo-code: a different approach to building a formatted display line.

The MACREF program (see Chapter 7) was written before DLIST. MACREF generates a lot of printed output, and the code that produces it was very tedious to write. When I first thought about DLIST and DFILE, I realized that they would both have to produce complicated output. I decided that something would have to be done to simplify the production of formatted output, something that would make the code more interesting both to write and to read.

I observed that the job of displaying the information from an EntRec could be reduced to a set of simple cases, each having the form: (a) pick up a byte from somewhere, (b) format it in one of a few ways, (c) type the result. Producing the display could be reduced to a few mechanical formulas. Why not design a "display programming language" and implement a simulator for it?

That is just what is going on in the Display routine of DLIST. The "display program" is a series of two-byte instructions to be executed by a "display CPU." If you want to rearrange the output of DLIST, all you need to do is to rearrange the "display program." The executable code of the routine is not dependent on the form of the display line, and it doesn't have to be changed when the display changes.

THE DFILE PROGRAM

DDUMP and DLIST are not really as useful as the other utilities in this book. They serve as examples, and as probes for experimenting with the directory. DLIST could serve as a basis for a more complicated directory display program, should you want to write one (few would, since there are several extended DIR commands in the public domain). But one more step will give us something of lasting use.

It's easy to read the directory. It's a lot more difficult to analyze and reformat the entries after reading them. You can probably imagine a lot of reports that could be produced—percentage of space used by filetype, maps of file data locations, and so on—if the data were only available to your favorite high-level language with its decimal arithmetic and easy output.

Well, let's use assembly language to capture the data, and then use a high-level language to analyze them. Data capture is the function of the DFILE program. It will read the directory and write a file with the same contents, a file suitable for input to a program in BASIC, Pascal, or PL/I.

Specifying DFILE

The DFILE command captures the data in directory entries and puts it in an ASCII file that can be read in BASIC with INPUT#, or in Pascal with READLN. The command form is:

DFILE output-ref search-ref /option

DFILE differs from other utilities in that the fileref for the output file comes first in the command, and it may not be omitted. That is because search-ref, the second operand, may be omitted. If their order were reversed, the program couldn't always tell which was which.

The search-ref operand controls the directory search. When the /option is omitted, DFILE will process only the directory entries under the current user number that match search-ref. If search-ref is omitted, DFILE returns all active entries under the current user number.

The /option consists of a slash and an "x" for "exhaustive search." When it is given, only the drivecode of search-ref matters, and DFILE returns all directory entries for all user numbers, including erased entries.

DFILE Output

DFILE creates output-ref, a file that contains two kinds of records. The first record contains information about the searched disk as a whole. This record is always present, even if no entries were found. It contains a count of the number of records that follow it, so that the reading program can tell how many records it will have to process. Table 5-1 specifies the contents of this record.

ItemFormatDescription
1."        "Dummy filename—eight blanks
2.,"   "Dummy filetype—three blanks
3.,255Constant 255 signals the disk information record — it never occurs as a user number.
4.,zzzz9Number of 128-byte records on a track, from the SPT field of the DPB.
5.,zzzz9Size of an allocation block in bytes: 1024, 2048, 4096, 8192, or 16384.
6.,zzzz9Total capacity of the disk in allocation blocks, including directory space.
7.,zzzz9Count of allocation blocks in use for files and directory.
8.,zzzz9Number of reserved tracks and the track number of the directory (usually 2 or 3).
9.,zzzz9Size of the directory in entries (there are four entries in a 128-byte record).
10.,zzzz9Count of records that follow in this file, from zero up the the size of the directory.
Format zzzz9 indicates a signed decimal number of 1 to 5 digits with leading zeroes suppressed.
The quotes and commas are omitted when the program is assembled with the BASIC option false.
Table 5-1. The contents of the disk information record written as the first line of output by DFILE.

Each record after the first one describes a directory entry. Table 5-2 shows the format of these records. Note that the number of fields varies depending on the size of a data map on the searched disk. Note also that, when reporting on a very large disk, some of the numbers produced by DFILE may be negative. A negative decimal number really represents a positive 16-bit integer greater than 32,767. The program that reads the file can handle such a number by converting it to a "real" variable, taking its absolute value, and adding 32,768 to it.

These record formats have been carefully designed for easy processing. The first three fields have the same format in both types of records. Since the format only diverges after the third field, the first three fields of either record type can be read by the same subroutine without causing an error. Since the first, disk-information, record contains a special user number it can be recognized as such. Therefore the contents of several DFILE output files could be concatenated into a single file and processed in a single run.

ItemFormatDescription
1."FILENAME"Filename: eight characters, padded on the right with blanks.
2.,"TYP"Filetype: three characters, padded on the right with blanks.
3.,zz9User number (0...15) or an activity code. 249 signals an erased file.
4.,z9Extent number, from 0 to 31. Higher extent numbers overflow to the S2 byte.
5.,zz9Reserved byte S1, usually zero.
6.,zz9Reserved byte S2, usually zero.
7.,zz9Record count, the count of 128-byte records in the last (or only) logical extent.
8.,0/1Read-only attribute: 0=R/W, 1=R/O
9.,0/1System attribute: 0=DIR, 1=SYS
10.,0/1Archive attribute: 0 if the file was built or altered since the last archive backup.
11-14,0/1File attributes f1 to f4, defined by the application that sets them.
15-18,0/1File attributes f5 to f8, reserved to the system.
19-26,zzzz9When the map size is 8, eight allocation block numbers from -32768 to 32767.
19-34,zz9When the map size is 16, sixteen allocation block numbers from 0 to 255.
Format zzzz9 indicates a signed decimal number of 1 to 5 digits with leading zeroes suppressed. Format zz9 indicates a number from 1 to 255 with leading zeroes suppressed. Format 0/1 indicates a one-digit decimal, 0 or 1.
The quotes and commas are omitted when the program is assembled with the BASIC option false.
Table 5-2. The contents of a directory entry record written by DFILE.

Using DFILE'S Output

There are many uses for DFILE's output. One is the creation and maintenance of a catalog of all the disk files owned by your installation. Another is to analyze the kinds of files kept on disk. What are the common file sizes? What is the ratio between the sizes of a program source file and its object file? The filerefs alone can be useful input to some applications. You could write a program that would scan for a particular character string in each file named in its input. The DFILE search-ref operand would determine the files to be scanned.

Tables 5-1 and 5-2 show the format of the records as prepared when the DFILE program is assembled with the label BASIC equated to TRUE. If that label is set FALSE and DFILE is reassembled, the output format changes. The quotes around the filename and filetype disappear, as does the comma between them. The commas preceding numbers are replaced with blanks. The resulting records are not suitable for BASIC. They are intended for reading in a Pascal program with the standard functions READ and READLN. The character fields appear first in the record because it is hard to switch from reading numbers to reading characters when using Pascal's READ procedures.

The Display Machine Revisited

The design of DFILE appears in Figure 5-12. Its source code appears in Listing 5-3. It uses a display machine similar in concept to the one in DLIST. This one, however, is more complicated. There are several different display "programs," and the correct one is passed to the display routine as a parameter. The display programs themselves are built with conditional assembly and macros to produce the two different output formats.

An important difference between DLIST and DFILE is that the latter has to produce numeric output in decimal. It is the first program in the book to do so. Decimal conversion and output is done via the routines in the PUTLIB include file (Appendix C).

Figure 5-12. The plan for DFILE, a program to capture directory data for use in other programs.

Program DFILE( output-ref, search-ref, /x ) save the output-ref operand move search-ref to the head of CpmFcb if search-ref was omitted, supply "????????.???" if search-ref includes a drivecode, make that the default drive if the /x option was given, set CpmFcb for exhaustive search gather information about the disk to be searched request the first matching directory entry while there is another entry save the entry in sequence by fileref and extent request the next matching entry end while. set up the output mechanism (in the style of a utility) format and write a record describing the searched disk for each saved entry format and write a record describing the entry end for. finish the output file end DFILE.

WRITING THE DIRECTORY

As you can see, it is not difficult to read the disk directory. Writing it is another matter. A program that modifies the directory is taking on a heavy responsibility. Reading the directory is harmless, but writing it is hazardous to data integrity. A mistake can damage the directory, causing files to be lost, or scrambling their data. And all methods of modifying the directory involve subverting the normal operations of the BDOS in some way. There is a fascination in writing a program that operates on the directory, despite these drawbacks. What follows is a discussion of the methods, so you may explore it on your own.

The Make, Close, Rename, and Erase services write directory entries; we've been using all of them in the included routines for utility file output. BDOS service request number 30, Set File Attributes, will alter the attribute bits in all extents of a file. The requests for sequential and direct-access writing also update directory entries. In all these cases, it is the BDOS that does the writing.

These are the only ways in which a program can modify the directory and still remain compatible with all the variants of CP/M. They work whether or not CP/NET is active; they work under MP/M 1, under CP/M 86, and under MP/M 2. They will presumably work under CP/M 3.

There are two other ways to write to the directory. One is to fool the Close request, and the other is to call on the BIOS directly. Both work in CP/M 2, but neither is allowed by MP/M 2.

The Close service request is really a request to update a directory entry with its completed data map. The directory entry to be written must have been created with a Make request. That request creates an entry with the given fileref and extent number, and a data map filled with zeros. Normally, the BDOS puts allocation block numbers in the FCB, and the Close request causes it to copy the altered data map to the directory entry.

Under CP/M, your program could figure out what allocation blocks it wanted to have as part of the file, and put those block numbers in the FCB's data map. Then a Close request would put them in the directory. In this way, a program can do its own allocation. A command to recover an erased file could be based on this. It would read the directory with the Search requests to get copies of the erased directory entries. Then the program would Make an entry for each extent of the erased program. Finally, it could Close its copy of each extent entry. The result would be to recreate the directory entries for the erased file. Such a program should first compare each allocation block number from the erased file against the allocation vector. If a block is presently in use, then that part of the erased file's space has been used for another file's data.

The MP/M 2 system prevents this tricky use of the Close request. It keeps a check-sum of each program FCB. It updates the check-sum each time the FCB is altered during a service request. When an FCB is presented as input to a service request, MP/M 2 tests its check-sum to see if the FCB has been altered since the BDOS last saw it; if so, it rejects the service request. This feature of MP/M 2 is intended to prevent the kind of disaster that could happen if a program bug caused an FCB to be corrupted in some way before it was closed. It also prevents the use of Close to record unofficial space allocation. The check-sum of the Closed FCB wouldn't match the one computed when the BDOS Made it.

Another way of writing the disk directory (or any other part of the disk) is to call directly on the BIOS. The BIOS functions are not difficult to use, although there are subleties to consider. For example, although the BIOS has a select-drive function, the program should call on the BDOS to select the drive to be used. Otherwise at the end of the program, the BDOS might be left thinking one drive was active, and the BIOS, another.

There are differences between the CP/M 1 BIOS functions and those of CP/M 2 and MP/M. In CP/M 1, a program had to handle sector skew in its own code; the BIOS operated in terms of physical sector numbers, not logical ones. In the later systems, the BIOS provides a skew-translation function. This function must be used. You simply can't predict the skew pattern used on every disk.

In CP/M 2 and MP/M, the BIOS write-a-sector function requires a signal as to the type of write that is to be done. This signal is not documented in the CP/M 2 manuals, but it is needed regardless. The signal is a byte in register C. If the program is writing to the directory, the byte should be 01h. That tells the BIOS that this write is not to be deferred; the data are to be sent to disk at once. Without the signal, the BIOS might hold the modified sector in a buffer; if the program crashed, the write would never happen. If the program is writing ordinary data, the signal should be 00h. That tells the BIOS that the physical sector containing this 128-byte record may have been used before, and so should be read from disk before the record is moved into it. In addition, it tells the BIOS that the write may be deferred, on the chance that the next record to be written might fall in the same sector.

Under MP/M 2, it is sometimes impossible for a command program to call on the BIOS for disk reads and writes. That part of the BIOS may be held in a different storage bank; it is then completely inaccessible from the program's storage.

There remains one more method: your program can drive the disk hardware directly. If you know your hardware very well, you can write code that causes the drive to seek, the head to load, and a sector to be read or written. Of course, the resulting program will work only in your own system or in one with identical hardware. There are sometimes advantages to this; you can write a disk-duplicator that runs as fast the hardware will let it, for example. But the loss of portability is a serious drawback.