So, uh. I did this! Crudely. In 2010.
http://cowlark.com/lbw/
It's a Linux syscall translator for Windows. It works well enough to run a Debian userland, although it's got so many holes and rough edges that I would never, ever, ever suggest using it for anything other than a stunt.
It uses Interix to do most of the heavy lifting, so all LBW does is translate Linux syscalls into Interix syscalls; that way we get a Unix filesystem, user permissions, sockets, fork and so on for free. (Interix was great. I'm glad they're bringing it back from the dead.) Unfortunately the system calls don't all map directly onto each other: Interix has a native fork(), for example, but Linux implements fork() on top of clone(). I couldn't make threads work.
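To give a feel for what "translate" means here, this is a minimal Python sketch of a syscall dispatcher. It is not LBW's code (LBW is written in C and talks to Interix); the names HANDLERS, dispatch, sys_getpid and sys_clone are made up for illustration, and the host's os module stands in for Interix. The syscall numbers, clone() flags and errno value are the Linux i386 ones.

```python
import os

# Linux i386 syscall numbers and constants (from the kernel headers).
SYS_GETPID = 20
SYS_CLONE = 120
CLONE_VM = 0x00000100      # share the address space, i.e. a thread
CLONE_THREAD = 0x00010000  # same thread group as the caller
SIGCHLD = 17
LINUX_ENOSYS = 38          # Linux errno for "function not implemented"


def sys_getpid():
    # Maps one-to-one onto the host call.
    return os.getpid()


def sys_clone(flags, host_fork=None):
    # No direct host equivalent: only the fork-like case is handled here.
    if flags & (CLONE_VM | CLONE_THREAD):
        return -LINUX_ENOSYS  # thread-style clone() is not supported
    return (host_fork or os.fork)()


# Hypothetical dispatch table: Linux syscall number -> host-side handler.
HANDLERS = {SYS_GETPID: sys_getpid, SYS_CLONE: sys_clone}


def dispatch(number, *args):
    """Run the handler for a Linux syscall number, or fail with -ENOSYS."""
    handler = HANDLERS.get(number)
    if handler is None:
        return -LINUX_ENOSYS
    return handler(*args)
```

With this, dispatch(SYS_CLONE, CLONE_VM | SIGCHLD) returns -38, as does any syscall number that has no handler, while a fork-like dispatch(SYS_CLONE, SIGCHLD) falls through to the host's fork(). A real translator would also have to convert host error codes back into Linux errno values, which this sketch skips.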
A few of the biggest problems were:
- Windows allocates address space in 64kB chunks, while Linux maps memory in 4kB pages. The ld.so loader will try to map two bits of executable within the same 64kB boundary, and, of course, this doesn't work on Windows. I crudely hack around it by allocating pages of RAM and copying things. Write-back mapping only works at all if the application lets mmap() pick the address.
- very very very different register usage. glibc on Linux uses gs as a 'pointer' to the current thread's private data area, via a special syscall. Windows resets gs to 0 on every interrupt! I crudely hack around this by intercepting null pointer dereferences, looking at the instruction to see if it was gs, and then reloading it with the right value.
- even then, that syscall sets gs to point at a GDT segment with a size of 2^32; this wraps round the entire address space, which allows very large offsets in gs to be treated as negative numbers. Windows doesn't let you create GDT segments. It only allows LDT segments, and it caps the segment limit to the end of the user address space, so this trick won't work. I crudely hack around this by intercepting segmentation violations, looking at the instruction to see if it's a [gs+negative number] dereference, and then binary patching the executable to use a different instruction.
- glibc is horrible and undocumented. There's a big pile of key-value strings pushed onto the stack above the environment when the process is initialised, containing various magic numbers. ld.so will just crash if you get this wrong. I spent a lot of time reverse engineering the ld.so source code to figure out what these were and how to set them up.
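The arithmetic behind those four problems fits in a few lines each. What follows is a rough Python sketch, not anything from LBW itself; all the function names are invented for illustration. It assumes 32-bit x86, and it assumes that the key-value data in the last bullet is the ELF auxiliary vector, which is packed as (type, value) pairs of machine words. The prefix bytes and AT_* numbers are the standard x86 and <elf.h> values.

```python
import struct

WIN_ALLOC_UNIT = 64 * 1024   # the 64kB unit Windows uses for address space


def same_alloc_unit(addr_a, addr_b):
    """True if two addresses fall inside the same 64kB Windows unit."""
    return addr_a // WIN_ALLOC_UNIT == addr_b // WIN_ALLOC_UNIT


GS_OVERRIDE = 0x65           # x86 segment-override prefix byte for gs
OTHER_PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x66, 0x67, 0xF0, 0xF2, 0xF3}


def uses_gs(insn):
    """True if a gs override appears among the instruction's prefix bytes."""
    for byte in insn:
        if byte == GS_OVERRIDE:
            return True
        if byte not in OTHER_PREFIXES:
            return False     # reached the opcode without seeing a gs prefix
    return False


def gs_effective_address(base, offset):
    """Address reached by [gs:offset] when the segment spans all 2^32 bytes."""
    return (base + offset) & 0xFFFFFFFF


# Auxiliary vector entry types from <elf.h>.
AT_NULL, AT_PHDR, AT_PAGESZ, AT_ENTRY = 0, 3, 6, 9


def pack_auxv(entries):
    """Pack (type, value) pairs as 32-bit words, terminated by AT_NULL."""
    out = b"".join(struct.pack("<II", t, v) for t, v in entries)
    return out + struct.pack("<II", AT_NULL, 0)
```

For example, same_alloc_unit(0x08048000, 0x08049000) is True: two adjacent 4kB pages that Linux would happily map separately land in one Windows unit. uses_gs(bytes([0x65, 0xA1, 0, 0, 0, 0])) is True, since that is mov eax, gs:[0]. And gs_effective_address(0xB7000000, 0xFFFFFFFC) gives 0xB6FFFFFC, four bytes below the base, which is the "negative offset" trick that stops working once the segment limit is capped.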
It was all vile and horrible, but it worked surprisingly well (i.e., it worked, which was surprising).
Using the NT kernel's personality system to implement Linux syscalls natively is totally the right thing to do; that's obviously what they're doing here.
I would love to know about the internal Microsoft politics which made releasing this possible. I wonder how long it's been brewing? I did LBW in about a month of evenings; the core logic wasn't hard. I wouldn't be at all surprised if this has been floating about inside Microsoft for years.