Early UNIX file system formats

Here is a summary of the on-disk file system formats used in early versions of UNIX. Formats are presented in chronological order, as far as I can determine it. The focus is on how the bits were laid out on the disk, rather than on the system calls used to access them, though some notes on the latter are inevitable.

Some principles applying to all the formats described here (and to many of their successors):

PDP-7 source code printout, unknown date

The information in this section was puzzled out of an old, nearly uncommented source code listing. The date of the printout is unknown; it is new enough to have fork.

The PDP-7 is a word-addressed machine, so all data structures are composed of words rather than bytes. A word is 18 bits. Character strings conventionally reside in successive 9-bit half-words (I don't know in which order). Disk blocks are 64 words.

Block 0 contains basic information, called sysdata within the operating system; it is not known whether the term `super-block' was used yet. The i-list begins in block 2.

The sysdata area contains:

  1. The address of the next block containing a chunk of the free list (one word).
  2. A counter (one word) and an array (ten words) listing up to ten blocks known to be free.
  3. The file system's current unique number (one word).
  4. Current system time in sixtieths of a second since an unrecorded base date (two words; most significant word first).

The disk copy of sysdata is updated regularly, perhaps at every process switch.
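
The layout listed above can be sketched in Python as follows. This is only an illustration: it assumes the fields sit in the order of the list, takes the words as already-extracted integers, and uses made-up names (parse_sysdata and the dictionary keys) that do not come from the PDP-7 source.

```python
WORD_BITS = 18
WORD_MASK = (1 << WORD_BITS) - 1

def parse_sysdata(words):
    """Decode 15 PDP-7 words (ints) into named fields.

    Assumption: the fields appear in the order of the list above.
    """
    if len(words) != 15:
        raise ValueError("expected 15 words")
    if any(w < 0 or w > WORD_MASK for w in words):
        raise ValueError("each word must fit in 18 bits")
    return {
        "next_free_chunk": words[0],          # next block holding free-list entries
        "free_count": words[1],               # how many array slots are in use
        "free_array": list(words[2:12]),      # all ten slots, used or not
        "unique": words[12],                  # current unique number
        # two words, most significant word first
        "time_60ths": (words[13] << WORD_BITS) | words[14],
    }
```

Combining the two time words gives a 36-bit count of sixtieths of a second.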

The array of free blocks works as follows:

Notice that this algorithm has a bug: once a block becomes part of the chain containing chunks of the free list, it cannot be allocated to a file even after that part of the list has been used up. Perhaps the intent was for the allocator to return the address of the block whence the chunk was just read instead of starting over.

I-nodes are 12 words long:

  1. Flags: allocated, large file, special file, directory, read and write permissions for owner and others.
  2. Seven block addresses (each a single word).
  3. Owner's user ID.
  4. Link count.
  5. File size, in words.
  6. Unique number for this i-node.

The first i-node is numbered 0, but is never used. I-node 3 is the root directory. Other i-nodes numbered 1-16 are reserved for special files; the i-number selects a device driver. The remaining i-numbers have no special meaning.

There are two schemes for locating data blocks, depending on the `large file' flag:

A directory entry is eight words long, but only six words are defined: an i-number, four words (eight characters) of filename, and the unique number for that i-node. The filename is filled to the right with space characters (octal 040).

When an i-node is allocated, the unique number in sysdata is incremented and copied into the new i-node. Hence when an i-node is re-used for a different file, it is likely to have a different unique number. When a directory entry is made, the unique number is copied from the i-node. The system thus could check for directory entries pointing to the `wrong' i-node. (This version didn't.)
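
The check that the system could have made might look like the sketch below. It is hypothetical: the PDP-7 code did nothing of the kind, and the function name and the shape of the inodes table are invented for illustration.

```python
def entry_is_stale(entry_inum, entry_unique, inodes):
    """Return True if a directory entry no longer matches its i-node.

    `inodes` maps i-number -> (allocated, unique); this table is an
    illustrative stand-in for the on-disk i-list.
    """
    allocated, unique = inodes[entry_inum]
    # A freed i-node, or one re-used for another file, fails the check.
    return (not allocated) or unique != entry_unique
```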

Limits and notes:

First Edition manual, file system and directory(V), 3 November 1971

The file system is much changed from the PDP-7 version.

Blocks 0 and 1 are the super-block, containing bitmaps listing the free blocks and free i-nodes in the file system. The i-list begins at block 2; blocks after the i-list are available for file data.

The contents of the super-block, in more detail:

  1. Number of bytes in the free-storage map (one 16-bit word). The count is always even.
  2. The free-storage map. Each bit represents a block; if the bit is set, the block is free. Blocks used by the super-block and the i-list are included: i.e. the first bit is block 0.
  3. Number of bytes in the free i-node map (one 16-bit word); always even.
  4. The free-i-node map. Each bit represents an i-node; if the bit is set, the i-node is allocated (unlike the free-storage map). The first bit is i-node 41; the first 40 i-nodes are always allocated.
  5. On the root file system only, several words of status information follow the free-i-node map. Times are in sixtieths of a second.
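
A rough sketch of reading the two maps, in Python. It assumes little-endian 16-bit counts (the PDP-11 byte order) and, as a pure guess, that the least significant bit of each byte stands for the lowest-numbered item; the manual text summarized above does not settle the bit order. The name parse_maps is illustrative.

```python
import struct

FIRST_MAPPED_INODE = 41   # i-nodes 1-40 are always allocated

def parse_maps(superblock):
    """Return (free block numbers, allocated i-numbers) as two sets."""
    (nblk,) = struct.unpack_from("<H", superblock, 0)
    blk_map = superblock[2:2 + nblk]
    (nino,) = struct.unpack_from("<H", superblock, 2 + nblk)
    ino_map = superblock[4 + nblk:4 + nblk + nino]

    def set_bits(bitmap):
        # Assumed bit order: bit 0 of byte 0 is item 0.
        return {8 * i + b
                for i, byte in enumerate(bitmap)
                for b in range(8)
                if byte & (1 << b)}

    free_blocks = set_bits(blk_map)                          # set bit = free
    allocated = {FIRST_MAPPED_INODE + n for n in set_bits(ino_map)}  # set bit = allocated
    return free_blocks, allocated
```

Note the opposite sense of the two maps: a set bit means `free' for blocks but `allocated' for i-nodes.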

I-nodes are 32 bytes long, and contain:

  1. Flags (16 bits): i-node allocated; directory; large file; set user ID on execution; a single `executable' flag; read and write permissions for owner and others.
  2. Link count (8 bits).
  3. Owner's user ID (8 bits).
  4. File size in bytes (16 bits).
  5. Eight 16-bit block addresses.
  6. A file creation time and a last-modified time (32 bits apiece).

I-nodes 1-40 are special files; the i-number selects a device. I-node 41 is the root directory of the file system. Other i-numbers have no special meaning.

If an i-node is marked free in the free map but its `allocated' flag is set, it is taken to be allocated, so a corrupt map will not re-use i-nodes.

Block addressing (small and large files) works as on the PDP-7, except that there are eight addresses in the i-node now, and an indirect block now contains 256 16-bit block addresses. The address for file block n is found in entry n mod 256 in the indirect block named by address n/256 in the i-node array.
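
As a small sketch of that arithmetic (the function name and return shape are invented for illustration):

```python
ADDRS_PER_INDIRECT = 256   # 16-bit addresses in one indirect block
INODE_ADDRS = 8            # block addresses held in the i-node

def locate_block(n, large):
    """Say where the address of file block n is stored.

    Small file: ("direct", slot in the i-node array).
    Large file: ("indirect", slot in the i-node array, entry in that block).
    """
    if not large:
        if n >= INODE_ADDRS:
            raise ValueError("a small file has at most 8 blocks")
        return ("direct", n)
    slot, entry = divmod(n, ADDRS_PER_INDIRECT)
    if slot >= INODE_ADDRS:
        raise ValueError("block number beyond the last indirect block")
    return ("indirect", slot, entry)
```

For example, block 300 of a large file is entry 44 of the indirect block named by the second address in the i-node.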

A directory entry is 10 bytes long: a 16-bit i-number followed by an eight-byte filename, filled to the right with NUL bytes. A zero i-number means the entry is unused; the filename bytes of an empty entry may contain garbage. By convention, the first two entries in each directory are for . (with this directory's i-number) and .. (its parent's i-number, or its own if this is the root directory), but this convention is not enforced by the operating system.
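
A 16-bit i-number plus an eight-byte name makes each entry 10 bytes, so a directory can be walked with a few lines of Python. This is a sketch that assumes little-endian i-numbers (PDP-11 byte order); read_directory is an illustrative name.

```python
import struct

ENTRY = struct.Struct("<H8s")   # 16-bit i-number, 8-byte name: 10 bytes

def read_directory(data):
    """Yield (i-number, name) for each used entry in a directory's bytes."""
    for off in range(0, len(data) - ENTRY.size + 1, ENTRY.size):
        inum, raw = ENTRY.unpack_from(data, off)
        if inum == 0:
            continue   # unused slot; its name bytes may be garbage
        yield inum, raw.rstrip(b"\0").decode("ascii", "replace")
```

Nothing here treats . and .. specially, matching the fact that they are only a convention.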

Limits and notes:

Third Edition manual, file system and directory(V), 15 March 1972

The file system format has not changed, but times are now sixtieths of a second since 00:00 1 January 1972, extending the life of the time format until April 1974.
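
The lifetime can be checked with a short calculation, assuming the 32-bit count is treated as unsigned. The names below are illustrative.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1972, 1, 1)
TICKS_PER_SECOND = 60

def tick_to_datetime(ticks):
    """Convert a count of sixtieths of a second to a calendar time."""
    return EPOCH + timedelta(seconds=ticks / TICKS_PER_SECOND)

# First count that no longer fits in 32 bits.
rollover = tick_to_datetime(2 ** 32)
```

The counter spans a little over 828 days, which puts the rollover in April 1974.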

Fourth Edition manual, file system and directory(V), 7 and 10 September 1973

The file system structure has changed quite a bit, probably at the same time the operating system was rewritten in C. The new format is almost that of the more familiar Sixth Edition system.

Block 0 is reserved for a bootstrap program. Block 1 is the super-block. The i-list begins in block 2; data blocks follow.

The super-block is structured as follows:

  1. 16-bit count of blocks in the i-list.
  2. 16-bit count of blocks in the entire file system; said to be used by check(I) to validate block numbers, but not by the operating system.
  3. Free list header: a count and a 100-element array of block numbers.
  4. Free i-node cache: a count and a 100-element array of i-node numbers.
  5. The 32-bit time when the file system was last modified.

The free list is reminiscent of that in the PDP-7 system, but without the bug:

The free-i-node cache is similar, but is just a cache:

I-nodes are 32 bytes long, and contain:

  1. Flags (16 bits): i-node allocated; 2-bit file type (plain file, directory, character special, block special); large file; set user ID on execution; set group ID on execution; read, write, and execute permissions for owner, group, and others.
  2. Link count (8 bits).
  3. Owner's user ID (8 bits).
  4. Owner's group ID (8 bits).
  5. File size in bytes (24 bits; high eight bits stored in a separate byte).
  6. Eight 16-bit block addresses.
  7. A last-access time and a last-modified time (32 bits apiece).
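
The fields add up to exactly 32 bytes, which a Python struct format can confirm. The sketch below assumes the fields are stored in the order listed, with the high byte of the size immediately before the low 16 bits, and little-endian 16-bit words. The times are left as pairs of raw 16-bit words, since the word order within a 32-bit value is not described here. Field names are illustrative.

```python
import struct

# flags, links, uid, gid, size high byte, size low word,
# eight block addresses, four 16-bit words of times
INODE = struct.Struct("<HBBBBH8H4H")

def parse_inode(raw):
    """Decode one 32-byte i-node into a dict."""
    f = INODE.unpack(raw)
    return {
        "flags": f[0],
        "nlinks": f[1],
        "uid": f[2],
        "gid": f[3],
        "size": (f[4] << 16) | f[5],    # 24-bit size from byte + word
        "addr": list(f[6:14]),
        "atime_words": f[14:16],        # raw words, order not interpreted
        "mtime_words": f[16:18],
    }
```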

I-node 1 is the root directory of the file system; other i-numbers have no special meaning. Special files are now distinguished by the file type in the flags, rather than by i-number.

The block addresses are used in the same way as in the previous format, with direct addresses for small files and indirect blocks for large ones.

Special files have no data blocks; instead, the first `block address' is a number naming a device. The high byte is an index into the system's table of devices; the low byte identifies a particular device unit, and is interpreted in different ways by different drivers.
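
In Python terms, the split is just a shift and a mask (split_device is an illustrative name):

```python
def split_device(addr0):
    """Split a special file's first block address into (table index, unit)."""
    return (addr0 >> 8) & 0xFF, addr0 & 0xFF
```

So a first address of 0x0302 would select entry 3 in the device table and unit 2 of that device.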

A directory entry is now 16 bytes long: 16 bits of i-number, 14 bytes of filename, the latter filled to the right with NUL bytes. The . and .. entries are unchanged.

Limits and notes:

Sixth Edition manual, file system and directory(V), 9 February 1975

Evidently files larger than one megabyte are needed now, so the block-addressing scheme has changed. For large files, the eighth address in the i-node now points to a doubly-indirect block, containing the addresses of 256 ordinary indirect blocks, each containing 256 data block addresses. Hence the addressing scheme can now represent files with (256 * 7) + (256 * 256) = 67328 blocks, or just under 33 megabytes; since the file size is still stored in 24 bits, the biggest possible file is now 16 megabytes.
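
The arithmetic, and the lookup path for a given block, can be sketched as below. The sketch assumes 512-byte blocks (implied by 256 16-bit addresses per indirect block) and that the first seven i-node addresses still name ordinary indirect blocks, as the 256 * 7 term suggests. The names are illustrative.

```python
BLOCK_SIZE = 512
PER_INDIRECT = 256
SINGLE_SLOTS = 7    # i-node addresses that name ordinary indirect blocks
MAX_BLOCKS = SINGLE_SLOTS * PER_INDIRECT + PER_INDIRECT * PER_INDIRECT

def locate_large(n):
    """Return the indices leading to file block n of a large file.

    ("single", i-node slot, entry) or
    ("double", entry in the doubly-indirect block, entry in the indirect block).
    """
    if n < SINGLE_SLOTS * PER_INDIRECT:
        return ("single", n // PER_INDIRECT, n % PER_INDIRECT)
    n -= SINGLE_SLOTS * PER_INDIRECT
    if n >= PER_INDIRECT * PER_INDIRECT:
        raise ValueError("block number out of range")
    return ("double", n // PER_INDIRECT, n % PER_INDIRECT)

addressable_bytes = MAX_BLOCKS * BLOCK_SIZE   # limit set by the address tree
size_field_bytes = 1 << 24                    # limit set by the 24-bit size
```

The smaller of the two limits, the 24-bit size field, is the one that actually bounds a file.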

Seventh Edition manual, filsys and dir(5), January 1979

The file system code has been rewritten again. Types and constants are more carefully parameterized, as part of the general cleanup of the system that happened at this time.

As before, block 0 is reserved for a bootstrap program; block 1 is the super-block; the i-list begins at block 2; data blocks follow.

The super-block is structured as follows:

  1. Number of first block not in i-list (16 bits).
  2. Number of blocks in entire file system (32 bits).
  3. Free-block list header: 16-bit count, array of NICFREE 32-bit block numbers.
  4. Free-i-node cache: 16-bit count, array of NICINOD 16-bit i-numbers.
  5. Four bytes of flags meaningful only inside the running system.
  6. Time when super-block last updated (32 bits).
  7. Several numbers that are `not maintained by this version of the system:'

The free-block list and free-i-node cache work as they have since the Fourth Edition format.

I-nodes have grown to 64 bytes:

  1. The former `flags' are now called `type and mode': file type (regular, directory, block or character special, block or character multiplexed special); set user ID, set group ID on execution; save swapped text after use; read, write, and execute permissions for owner, group, and others.
  2. Number of links (now 16 bits).
  3. Owner's user ID (now 16 bits).
  4. Owner's group ID (now 16 bits).
  5. File size in bytes (now 32 bits).
  6. 40 bytes of block addresses: 13 addresses of 3 bytes each (hence 39 bytes) followed by one unused byte.
  7. Times when the file was last accessed and last modified, and when either the file or the i-node was last changed; 32 bits each. The `last-changed' time is used to decide whether a file should be saved on an incremental backup. Comments in the data structure shown in the manual (and hence probably in the header file sys/ino.h) call it `time created,' which will spawn years of confusion.
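
A quick Python check that these fields fill 64 bytes, with a helper that cuts the address area into its 13 three-byte groups. The format string is used only to add up sizes. The byte order inside each three-byte address is not described here, so the sketch leaves the groups as raw bytes. All names are illustrative.

```python
import struct

# mode, links, uid, gid (16 bits each); 32-bit size;
# 40 address bytes; three 32-bit times
DINODE_FORMAT = "<4HI40s3I"
DINODE_SIZE = struct.calcsize(DINODE_FORMAT)

ADDR_COUNT, ADDR_BYTES = 13, 3

def split_addresses(addr_area):
    """Cut the 40-byte address area into 13 three-byte groups."""
    if len(addr_area) != 40:
        raise ValueError("address area must be 40 bytes")
    # 13 * 3 = 39 bytes are used; the final byte is ignored.
    return [addr_area[i * ADDR_BYTES:(i + 1) * ADDR_BYTES]
            for i in range(ADDR_COUNT)]
```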

I-node 2 is now the root directory of the file system; other i-numbers have no special meaning.

Instead of `small' and `large' files, different addresses are used in different ways:

Directory entries are the same as in the Fourth Edition format.

Limits and notes: