Why did single-level stores die off?
It's an interesting question, and I'm not sure I know the answer. A single-level store is also how Multics worked, but I think what happened is that Unix turned out to be better.
It's neither wholly coincidental nor wholly intentional that Unix didn't have mmap. The PDP-7 and the PDP-11 didn't have paging hardware, so early Unix couldn't implement mmap at all. And it was common to access files bigger than the virtual address space; doing that with mmap requires you to sequentially map, then unmap, different parts of the file, which is basically what you have to do with read() and write() anyway. So, early Unix couldn't implement mmap because it was designed to run on cheap hardware.
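Here's a minimal sketch of that map/unmap loop, counting newlines in a file that might be bigger than the address space by sliding a fixed-size mapping window along it; structurally it's the same loop you'd write with read(), just spelled in page mappings (the 1 MiB window size is an arbitrary choice):

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    enum { WINDOW = 1 << 20 };   /* 1 MiB; a multiple of the page size */

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 2; }
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) { perror(argv[1]); return 1; }

        long long newlines = 0;
        for (off_t off = 0; off < st.st_size; off += WINDOW) {
            size_t len = (size_t)(st.st_size - off);
            if (len > WINDOW) len = WINDOW;
            char *p = mmap(0, len, PROT_READ, MAP_PRIVATE, fd, off);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }
            for (size_t i = 0; i < len; i++) newlines += p[i] == '\n';
            munmap(p, len);      /* slide the window forward */
        }
        printf("%lld newlines\n", newlines);
        return 0;
    }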
Also, though, if a program is reading from a file by memory-mapping it, you can't replace the file with a pipe unless you change the program. (If you lseek() on a pipe, it croaks with ESPIPE, whose error message these days is "Illegal seek".) Unix got enormously better composability and scriptability than other contemporary OSes by virtue of pipes, to the point where the Unix group ported their pipe-based toolkit of "software tools" to other operating systems in the mid-01970s in order to have a more comfortable working environment. Then, of course, the world started to revolve around TCP, which gives you a byte stream between two machines, like a magtape, not a random-access collection of pages like a disk. (There are lots of networked applications that really prefer a remote-disk model; Acrobat Reader and Microsoft Access come to mind. But that wasn't where TCP/IP was in the 01980s and early 01990s, Sun's WebNFS aside.)
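To make the ESPIPE point concrete, here's a tiny demonstration; a program that insists on seeking can't transparently consume a pipe:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) < 0) { perror("pipe"); return 1; }
        if (lseek(fds[0], 0, SEEK_SET) < 0)    /* pipes can't seek */
            printf("lseek on a pipe: %s\n", strerror(errno));
        return 0;                              /* prints "Illegal seek" */
    }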
Another problem is that, when a program is mutating a shared mutable resource like a disk sector, there are times when the resource is in an inconsistent state. Usually, we think of this as a problem for concurrent access, and the solution is to keep any other thread from observing the inconsistent state, for example with a mutex. But it's also a problem for failure recovery: if your program crashes before restoring consistency, now you have data corruption to recover from.
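For a concrete (and entirely hypothetical) example of the failure-recovery version of the problem, consider a program updating two on-disk records that must stay consistent; a mutex can hide the intermediate state from other threads, but a crash between the two writes leaves it on disk:

    #include <stdint.h>
    #include <unistd.h>

    /* Hypothetical: move `amount` between two balances stored at byte
       offsets `from` and `to` in an open file.  Error checking is
       omitted to keep the hazard visible. */
    void transfer(int fd, off_t from, off_t to,
                  uint32_t bal_from, uint32_t bal_to, uint32_t amount)
    {
        bal_from -= amount;
        bal_to += amount;
        pwrite(fd, &bal_from, sizeof bal_from, from);
        /* A crash here leaves the money debited but never credited. */
        pwrite(fd, &bal_to, sizeof bal_to, to);
    }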
In Unix, the mutable shared resource was usually the filesystem, so this was mostly a problem only if the kernel crashed, perhaps due to a power failure; ordinary user programs mostly created new files, so if they crashed during execution, the worst that could happen was that their output file would be incomplete. Then the user could delete it and try again. So, even though Unix wasn't a fault-tolerant OS like Tandem's Guardian, it did tend to limit the impact of faults. (The occasional exceptions to this rule, such as Berkeley mbox files, were a continuous source of new bugs.)
The easiest way to handle this kind of problem is with atomic transactions, so that if a program crashes halfway through an update, the old state remains the current state, and there is no data corruption problem to worry about. As I understand the situation, this is how IMS and DB2 have handled the problem since the 01960s and 01980s, respectively, and of course today we build lots of applications on top of transaction systems like Postgres, Kafka, ZODB, Git, MariaDB, and especially SQLite.
But none of those systems existed in the 01980s, except for IMS, DB2, and Postgres, and none of those ran on Domain/OS. I don't have any experience with Domain/OS, but I imagine that this was a source of bugs for Domain/OS applications as well.
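In ordinary Unix use, meanwhile, you can get a poor man's transaction with no transaction system at all: write the complete new version to a temporary file, fsync() it, and rename() it over the old one, since rename() replaces the destination atomically. A sketch, with a hypothetical filename and contents:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Atomically replace `path` with `contents`: after a crash, you see
       either the old version or the new one, never a mixture. */
    int update_file_atomically(const char *path, const char *contents)
    {
        char tmp[4096];
        snprintf(tmp, sizeof tmp, "%s.tmp", path);

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd < 0) return -1;

        size_t n = strlen(contents);
        int ok = write(fd, contents, n) == (ssize_t)n
              && fsync(fd) == 0;               /* force data to disk */
        ok = close(fd) == 0 && ok;             /* always close */
        if (ok) ok = rename(tmp, path) == 0;   /* the atomic commit */
        if (!ok) unlink(tmp);
        return ok ? 0 : -1;
    }

    int main(void)
    {
        return update_file_atomically("demo.txt", "hello\n") ? 1 : 0;
    }

(A fully careful version would also fsync() the containing directory so the rename itself survives a crash.)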
There's another, arguably distinct, fault-related problem that pops up in current use with mmap(). If you try to read() from a file, copying data into your address space, the call may succeed, fail, or succeed partially, for example if you hit the end of the file. All of these conditions arise at a readily identifiable point in your program, the call to read(), so you can look at the code to see whether you forgot to handle one of them at that point. Moreover, you can be sure that neither failure nor a short read will arise later, while you're using the data you've read, possibly while you have some other shared mutable resource in a temporarily inconsistent state.
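That's why the standard defensive read() loop can exist at all: every outcome, including the short read, surfaces at one call site, before the data is used. For example:

    #include <errno.h>
    #include <unistd.h>

    /* Read up to `len` bytes, retrying short reads; returns the number
       of bytes read (less than `len` only at end-of-file) or -1 on
       error.  All three outcomes are visible right here. */
    ssize_t read_fully(int fd, void *buf, size_t len)
    {
        size_t done = 0;
        while (done < len) {
            ssize_t n = read(fd, (char *)buf + done, len - done);
            if (n < 0) {
                if (errno == EINTR) continue;  /* interrupted: retry */
                return -1;                     /* hard failure */
            }
            if (n == 0) break;                 /* end of file */
            done += n;
        }
        return (ssize_t)done;
    }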
By contrast, with mmap(), such a failure can in most cases arise at any point where you access the mapped memory. For example, someone else may have truncated the file since you mapped it, as in http://canonical.org/~kragen/sw/dev3/mmapcrash.c, where as soon as the array index strays onto the now-nonexistent page, the program dies with a bus error. This makes it more difficult to write programs that handle failures correctly.
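I haven't reproduced the linked program here, but the failure mode it demonstrates looks something like this sketch (assuming 4096-byte pages):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("victim.tmp", O_RDWR | O_CREAT | O_TRUNC, 0666);
        if (fd < 0 || ftruncate(fd, 8192) < 0) { perror("setup"); return 1; }
        volatile char *p = mmap(0, 8192, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        (void)p[0];                /* fine: the page exists */
        (void)ftruncate(fd, 0);    /* "someone else" truncates the file */
        (void)p[4096];             /* SIGBUS: that page is gone now */
        puts("not reached");
        return 0;
    }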
Relatedly, there's a performance issue. Although memory-mapping a page and then reading it means the kernel doesn't have to copy its contents into your address space, which often improves performance, the kernel does still have to read the page from disk, and it has much less information about your access patterns than it does when you're using read() and lseek(). This sometimes reduces performance, because prefetching pages before userland requests them makes a big performance difference: in the 01980s, we're talking about 30000 microseconds to wait for the disk, versus 1 microsecond to handle a page fault or 2 microseconds to handle a small read() if the data is already prefetched. It doesn't take a whole lot of extra prefetch misses to make mmap() slower, potentially by orders of magnitude.
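Modern systems do let you hand some of that access-pattern information back to the kernel, though nothing like this was available then. A sketch using POSIX advice calls:

    #include <stddef.h>
    #include <sys/mman.h>

    /* After mapping `len` bytes of a file at `p`, restore the hints a
       sequential read() loop would have given the kernel implicitly. */
    void hint_sequential(void *p, size_t len)
    {
        posix_madvise(p, len, POSIX_MADV_SEQUENTIAL); /* read ahead aggressively */
        posix_madvise(p, len, POSIX_MADV_WILLNEED);   /* start faulting in now */
    }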
With modern NVDIMMs and NVMe Flash, and especially new memory architectures like 3D XPoint, the performance advantages of memory-mapping might become much more important again. If it takes 300 ns to call and return from read() or write(), plus 700 ns to copy 4096 bytes into or out of userspace, then spending 100 ns to read a random cache line from 3D XPoint memory (is that about how long it takes?) might be greatly preferable to spending 1000 ns to read a page of data from it through the syscall interface. But this was not a possibility in the Apollo years.
One final, minor issue with the Multics segment-mapping approach, at least when it's realized with paging hardware instead of segmentation hardware, is slack space at the ends of files. If the fundamental fixed-size unit a file consists of, such as a byte of text, is smaller than a page, then there will be times when the file's natural size is not a whole number of pages, and your application program needs some kind of application-specific logic to tell whether the last page of the file has unused space in it and, if so, how much. CP/M had the same problem at a smaller granularity: its files consist of 128-byte "sectors", which saves 7 precious bits per directory entry.
So in CP/M, for example, some applications would place a ^Z after the last legitimate byte of a text file, and others would fill the rest of the sector with up to 127 ^Z characters. As you can imagine, this kind of thing is fertile ground not only for application bugs (you can't reliably store ^Z in a text file, and never as the last byte) but also for subtle application incompatibilities. If you want to write a Unix "cat" program for CP/M, it needs to have an opinion about which of these conventions to use, and also about what to do if it finds a ^Z that isn't in the last sector.
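Here's a sketch of the opinion such a "cat" has to have. This one implements the "first ^Z ends the file" convention, which happens to agree with the pad-with-^Z convention except on files that legitimately contain a ^Z:

    #include <stddef.h>

    #define SECTOR 128
    #define CTRL_Z 0x1a

    /* How many bytes of a text file's final 128-byte sector are real? */
    size_t last_sector_length(const char sector[SECTOR])
    {
        for (size_t i = 0; i < SECTOR; i++)
            if (sector[i] == CTRL_Z)
                return i;        /* the ^Z and everything after it is slack */
        return SECTOR;           /* no ^Z: the whole sector is data */
    }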
Again, I never used Domain/OS, so I don't know how it handled text files or other files that commonly had a non-page-aligned EOF. The Apollo engineers were brilliant and produced a stunning system that was much better than Unix in many ways, so maybe they had a good solution to this problem, like a universally used text-file-handling library that didn't use a brain-dead encoding like CP/M's. I'm just saying it's a problem that crops up in userspace with the single-level-store approach (on paging hardware), while the Unix approach relegates it to the filesystem driver.