95 points ingve | 7 comments
1. nephanth ◴[] No.44568300[source]
From my noobish standpoint, it feels like most code shouldn't care what the page size is? Why does it need to be recompiled?

What typically tends to break when changing it?

replies(6): >>44568475 #>>44568482 #>>44568492 #>>44568560 #>>44568984 #>>44569166 #
2. kevingadd ◴[] No.44568475[source]
Off the top of my head:

If you rely on being able to do things like mark a range of memory as read-only or executable, you now have to care about page sizes. If your code is still assuming 4KB pages you may try to change the protection of a subset of a page and it will either fail to do what you want or change way too much. In both cases weird failures will result.
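To make the "change way too much" case concrete, here is a minimal sketch of the arithmetic, not anyone's actual implementation. `struct page_span` and `covering_pages` are hypothetical names; the page size is passed in so the 4KB and 16KB cases can be compared side by side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the page-aligned span that a protection
 * change on [addr, addr + len) would really cover. */
struct page_span {
    uintptr_t start; /* addr rounded down to a page boundary */
    size_t length;   /* bytes from start to the end of the last touched page */
};

/* page_size must be a power of two. In real code it would come from
 * sysconf(_SC_PAGESIZE) rather than a hard-coded 4096. */
struct page_span covering_pages(uintptr_t addr, size_t len, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uintptr_t mask = (uintptr_t)page_size - 1;
    uintptr_t start = addr & ~mask;              /* round down */
    uintptr_t end = (addr + len + mask) & ~mask; /* round up */
    struct page_span span = { start, (size_t)(end - start) };
    return span;
}
```

For a 4KB range starting at 0x5000, the span is exactly that range with 4KB pages, but with 16KB pages it grows to 0x4000..0x8000, so three neighboring 4KB chunks change protection too. On Linux, passing the unrounded 0x5000 straight to mprotect() on a 16KB system would instead fail with EINVAL, which is the "fail to do what you want" case.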

It also can have performance consequences. For example, if before you were making a lot of 3.5KB allocations using mmap, the wastage involved in allocating a 4KB page for each one might not have been too bad. But now those 3.5KB allocations will eat a whole 16KB page, making your app waste a lot of memory. Ideally most applications aren't using mmap directly for this sort of thing though. I could imagine it making life harder for the authors of JIT compilers.
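A rough sketch of the waste arithmetic, assuming each allocation gets its own mapping and the kernel rounds it up to whole pages. `mapped_size` and `wasted_bytes` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes actually reserved when `request` bytes are mapped on their own:
 * the request rounded up to a whole number of pages. */
size_t mapped_size(size_t request, size_t page_size)
{
    assert(page_size != 0);
    return (request + page_size - 1) / page_size * page_size;
}

/* Bytes lost to that rounding for one mapping. */
size_t wasted_bytes(size_t request, size_t page_size)
{
    return mapped_size(request, page_size) - request;
}
```

With a 3.5KB (3584-byte) request, `wasted_bytes(3584, 4096)` is 512 bytes, while `wasted_bytes(3584, 16384)` is 12800 bytes, so the overhead per allocation goes from 12.5% of the page to about 78%.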

Some algorithms also take advantage of the page size to do addressing tricks. For example, if you know your page size is 4KB, the addresses '3' and '4091' are both known to have the same protection flags applied (R/W/X) and to be the same kind of memory (mmap'd file on disk, shared memory segment, mapped memory from a GPU, etc.). This allows any tables tracking that kind of information to have only 4KB granularity, which makes the tables much smaller. So that sort of trick needs to know the page size too.
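The addressing trick can be sketched as follows, assuming a table with one entry per page. `page_index` and `same_page` are hypothetical names, and `page_shift` is log2 of the page size (12 for 4KB, 14 for 16KB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the table entry that describes `addr`, for a table with one
 * entry per page of size (1 << page_shift). */
size_t page_index(uintptr_t addr, unsigned page_shift)
{
    assert(page_shift < sizeof(uintptr_t) * 8);
    return (size_t)(addr >> page_shift);
}

/* Two addresses share protection flags and backing only if they fall
 * in the same page, i.e. map to the same table entry. */
bool same_page(uintptr_t a, uintptr_t b, unsigned page_shift)
{
    return page_index(a, page_shift) == page_index(b, page_shift);
}
```

Addresses 3 and 4091 share an entry with either shift. Addresses 4091 and 4100 do not share one when the shift is 12, but they do when it is 14, so a table built with a hard-coded shift of 12 draws boundaries that the 16KB hardware no longer has.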

3. leidenfrost ◴[] No.44568482[source]
Typically low-level code that fiddles with memory manually while assuming a page size.

Everything's OK until some obscure library suddenly segfaults without any error.

4. okanat ◴[] No.44568492[source]
Because the final ELF binary is linked to contain page-aligned segments. Segments define how the binary should be loaded into memory and what permissions each part requires.

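If you have a 4KB segment that is marked Read-Write followed immediately by a Read-Execute segment, naively loading it on a 16KB-page system puts both in the same page, which would then have to be writable and executable at once. That opens a can of security issues.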
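A minimal sketch of the check a loader would need here, under the assumption that segments are described by plain start and end addresses. `segments_share_page` is a hypothetical name, not part of any real loader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a segment ending at `first_end` (exclusive) and the next
 * segment starting at `second_start` land in the same page, so they
 * cannot be given different permissions. */
bool segments_share_page(uint64_t first_end, uint64_t second_start,
                         uint64_t page_size)
{
    assert(page_size != 0 && first_end > 0 && second_start >= first_end);
    uint64_t last_page_of_first = (first_end - 1) / page_size;
    uint64_t first_page_of_second = second_start / page_size;
    return last_page_of_first == first_page_of_second;
}
```

For a Read-Write segment covering 0x1000..0x2000 followed by a Read-Execute segment at 0x2000, the result is false with 4KB pages and true with 16KB pages. Relinking with a larger segment alignment is what moves the second segment onto its own page.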

Moreover, many platform data structures, like the Global Offset Table of a dynamically linked executable, use addresses. You cannot simply move things around.

On top of that, libraries like the C++ standard library (or Abseil from Google) rely on the page size to optimize data structures like hash maps (e.g. unordered_map).

5. magnat ◴[] No.44568560[source]
Mostly for I/O, e.g. mmap requires the file offset to be a multiple of the page size.

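One common way to cope with that requirement, sketched here under the assumption that the caller wants `length` bytes starting at an arbitrary file `offset`: round the offset down to a page boundary and remember how far into the mapping the wanted bytes begin. `struct map_window` and `window_for` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct map_window {
    uint64_t map_offset; /* page-aligned offset to hand to mmap() */
    size_t delta;        /* where the wanted bytes start inside the mapping */
    size_t map_length;   /* length to map so the wanted bytes are covered */
};

/* page_size must be a power of two, normally sysconf(_SC_PAGESIZE). */
struct map_window window_for(uint64_t offset, size_t length, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    struct map_window w;
    w.map_offset = offset & ~((uint64_t)page_size - 1); /* round down */
    w.delta = (size_t)(offset - w.map_offset);
    w.map_length = w.delta + length;
    return w;
}
```

The caller would then map `w.map_length` bytes at `w.map_offset` and read from the returned base plus `w.delta`. For offset 5000 the aligned offset is 4096 with 4KB pages but 0 with 16KB pages, so code that bakes in the first answer gets EINVAL from mmap() on the second system.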
6. junon ◴[] No.44568984[source]
Performance, safety and IO critical code must care, because the page size affects TLB caching and is the finest granularity for security flags such as read-only, no execute, etc. which are critical for e.g. guard pages.

Say your code creates two guard pages sandwiching a security-critical page, so that under/overruns cause a page fault and crash. If that code assumed the page boundary was at 4KiB but it is really at 16KiB now, buffer overruns will no longer get caught.
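A minimal sketch of guard pages that follow the real page size instead of assuming 4KiB. `guarded_alloc` and `guarded_free` are hypothetical helpers, and this assumes a POSIX system with anonymous mappings.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `size` usable bytes (rounded up to whole
 * pages) with one inaccessible guard page before and one after.
 * Returns NULL on failure; *usable_out receives the rounded size. */
void *guarded_alloc(size_t size, size_t *usable_out)
{
    long ps = sysconf(_SC_PAGESIZE); /* queried at runtime, not assumed */
    if (ps <= 0 || size == 0)
        return NULL;
    size_t page = (size_t)ps;
    size_t usable = (size + page - 1) / page * page;
    size_t total = usable + 2 * page;

    /* Reserve everything as PROT_NONE, then open up only the middle. */
    unsigned char *base = mmap(NULL, total, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + page, usable, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return NULL;
    }
    if (usable_out)
        *usable_out = usable;
    return base + page;
}

/* Release a region obtained from guarded_alloc, guards included. */
void guarded_free(void *p, size_t usable)
{
    assert(p != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    munmap((unsigned char *)p - page, usable + 2 * page);
}
```

Because the guard offsets are computed from `sysconf(_SC_PAGESIZE)`, the inaccessible regions line up with real page boundaries on both 4KiB and 16KiB systems. Note that an overrun only faults once it leaves the rounded-up usable region, so the slack inside the last page is still unprotected.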

Further, code that assumed it was on a page boundary, typically for performance reasons, will now have only a 25% chance of being so: only one in four 4KiB-aligned addresses is also 16KiB-aligned.
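The 25% figure can be checked with a short count, assuming addresses are spread evenly over the 4KiB-aligned positions. `aligned_offsets` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Of the `small`-aligned offsets inside one `large`-sized page, count how
 * many also sit on a `large` boundary. *total_out receives the number
 * of `small`-aligned offsets examined. */
size_t aligned_offsets(size_t small, size_t large, size_t *total_out)
{
    assert(small != 0 && large % small == 0);
    size_t total = 0, hits = 0;
    for (size_t off = 0; off < large; off += small) {
        total++;
        if (off % large == 0)
            hits++;
    }
    *total_out = total;
    return hits;
}
```

For `small = 4096` and `large = 16384` this examines four offsets (0, 4096, 8192, 12288) and only the first is on a 16KiB boundary, which is the one-in-four chance.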

It also affects MMIO. A control block that was expected to fill its own 4KiB page, so that mapping it into a sensitive user-space driver context exposed no neighboring MMIO control blocks, now shares a 16KiB page with up to 3 neighboring blocks in either direction. This probably doesn't happen so often, and I don't know Android internals much, but it's still something to consider.

This is in large part because PAGE_SIZE in a lot of C code is a macro or constant, rather than something populated at runtime depending on the system the code is running on - something I've always felt is a bit problematic.
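For contrast with a compile-time PAGE_SIZE, here is a minimal sketch of asking the system at runtime through POSIX `sysconf`. `runtime_page_size` is a hypothetical wrapper name, and the caching is only a convenience.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Page size of the system the process is actually running on.
 * Cached after the first call; the unsynchronized cache is harmless
 * here because every thread would store the same value. */
size_t runtime_page_size(void)
{
    static size_t cached;
    if (cached == 0) {
        long ps = sysconf(_SC_PAGESIZE);
        assert(ps > 0);
        cached = (size_t)ps;
    }
    return cached;
}
```

The same binary then reports 4096 on a 4KiB system and 16384 on a 16KiB one, whereas a `#define PAGE_SIZE 4096` keeps whatever value it was compiled with.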

That being said, code that hard-codes PAGE_SIZE won't run anyway if it uses e.g. mmap(), because the kernel rejects file offsets (and fixed addresses) that aren't multiples of the real page size and fails the call with EINVAL.

This is going to wreak general havoc for a while no matter how you spin it.

7. notepad0x90 ◴[] No.44569166[source]
Most code shouldn't, but you don't know what the library you're using is doing behind the scenes. If a lot of people depend on the few pieces of code that do care, that could get real messy real fast.