Bought myself an Ampere Altra system

(marcin.juszkiewicz.com.pl)
204 points pabs3 | 23 comments
1. amelius No.44421186
> And the latest one, an Apple MacBook Pro, is nice and fast but has some limits — does not support 64k page size. Which I need for my work.

I wonder where this requirement comes from ...

replies(3): >>44421250 #>>44421494 #>>44421603 #
2. ot No.44421250
I would guess to develop and test software that will ultimately run on a system with 64k page size.
replies(1): >>44421261 #
3. amelius No.44421261
Is there a fundamental advantage over other page sizes, other than the convenience of 64k == 2^16?
replies(4): >>44421331 #>>44421363 #>>44421743 #>>44435389 #
4. ch_123 No.44421331{3}
64k is the largest page size that the ARM architecture supports. The large page size provides advantages for applications which allocate large amounts of memory.
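
A minimal C sketch of how a program can check which page size the running kernel uses: on a 64k-page arm64 kernel this prints 65536, on a typical x86-64 kernel 4096.

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* ask the kernel for the page size it was configured with */
        long page = sysconf(_SC_PAGESIZE);
        printf("page size: %ld bytes\n", page);
        return 0;
    }
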
5. raverbashing No.44421363{3}
Yes, there are.

(as a starting point 4k is a "page size for ants" in 2025 - 4MB might be too much however)

But the bigger the page, the fewer TLB entries you need, and the fewer entries in the OS data structures that manage memory, etc.
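
A rough back-of-the-envelope sketch of that effect (my numbers, just for illustration): count how many translations a 1 GiB working set needs at a few page sizes; fewer pages means fewer TLB entries and fewer kernel bookkeeping structures.

    #include <stdio.h>

    int main(void) {
        const unsigned long long working_set = 1ULL << 30;            /* 1 GiB */
        const unsigned long page_sizes[] = { 4096, 65536, 2097152 };  /* 4k, 64k, 2M */

        for (unsigned i = 0; i < sizeof page_sizes / sizeof page_sizes[0]; i++)
            printf("%8lu-byte pages -> %llu translations to cover 1 GiB\n",
                   page_sizes[i], working_set / page_sizes[i]);
        return 0;
    }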

replies(1): >>44422104 #
6. zozbot234 No.44421494
Asahi Linux might support 64k pages on Apple Silicon hardware. Might require patching some of the software though, if it's built assuming a default page size.

It should also be possible to patch Linux itself to support different page sizes in different processes/address spaces, which it currently doesn't. It would be quite fiddly (which is why it hasn't happened so far) but it should be technically feasible.

IIRC ARM64 hardware also has some special support (compared to x86 and x86-64) for handling multiple-page "blocks" - that kind of bridges the gap between a smaller and larger page size, opening up further scenarios for better support on the OS side.
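
A hedged sketch of the kind of baked-in assumption that needs patching: hardcoding 4096 instead of asking the kernel at runtime (the function names here are just illustrative).

    #define _POSIX_C_SOURCE 200112L
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define ASSUMED_PAGE_SIZE 4096            /* fragile baked-in assumption */

    /* fine on a 4k kernel, under-aligned on a 16k or 64k kernel */
    static void *page_aligned_fragile(size_t n) {
        void *p = NULL;
        posix_memalign(&p, ASSUMED_PAGE_SIZE, n);
        return p;
    }

    /* portable: ask the kernel for the real page size */
    static void *page_aligned_portable(size_t n) {
        void *p = NULL;
        long page = sysconf(_SC_PAGESIZE);    /* 4096, 16384 or 65536 */
        posix_memalign(&p, (size_t)page, n);
        return p;
    }

    int main(void) {
        void *a = page_aligned_fragile(1 << 20);
        void *b = page_aligned_portable(1 << 20);
        printf("real page size: %ld\n", sysconf(_SC_PAGESIZE));
        free(a);
        free(b);
        return 0;
    }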

replies(1): >>44422594 #
7. dist1ll No.44421603
OP works for Red Hat, and some of the tests require booting systems with 64k pages.

What surprises me more is why Red Hat doesn't provide them with the proper hardware.

replies(3): >>44421698 #>>44422124 #>>44423469 #
8. rwmj No.44421698
Red Hat has dozens of internal aarch64 machines (similar to the one in the article) that can be reserved, but sometimes you just want a machine of your own to play with.
9. dan-robertson No.44421743{3}
The reason to want small pages is that the page is often the smallest unit that the operating system can work with, so bigger pages can be less efficient – you need more RAM for the same number of memory-mapped files, tricks like guard pages or mapping the same memory twice for a ring buffer have a bigger minimum size, etc.
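
A minimal sketch of the guard-page trick: one PROT_NONE page in front of a buffer so an underrun faults instead of silently corrupting memory. The guard always costs exactly one page, so on a 64k kernel it is 16x bigger than on a 4k kernel.

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        long page = sysconf(_SC_PAGESIZE);
        /* one mapping: [guard page][usable page] */
        unsigned char *base = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return 1;
        mprotect(base, page, PROT_NONE);   /* first page becomes the guard */

        unsigned char *data = base + page; /* usable region starts here */
        data[0] = 42;                      /* fine */
        /* data[-1] = 0;                      would SIGSEGV on the guard */

        printf("guard page costs %ld bytes of address space\n", page);
        munmap(base, 2 * page);
        return 0;
    }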

The reason to want pages of exactly 4k is that software is often tuned for this and may even require it, from not being written in a sufficiently hardware-agnostic way (similar to why running lots of software on big-endian systems can be hard).

The reasons to want bigger pages are:

- there is more OS overhead tracking tiny pages

- as well as caches for memory contents, CPUs have caches (TLBs) for the mapping between virtual memory and physical memory, and this mapping is tracked at page-size granularity. These caches are very small (as they have to be extremely fast), so bigger pages mean a memory access is more likely to find its mapping in the cache, which means faster memory accesses.

- CPU caches are often indexed by the address bits that fall within the minimum page size, so the maximum size of such a cache is page-size * associativity. I think it can be harder to increase the latter than the former, so bigger pages could allow for bigger caches, which can make some software perform better.

The things you see in practice are:

- x86 supports 2MB and 1GB pages, as well as 4KB pages. Linux can either directly give you pages in a larger size (a fixed number are typically reserved by the OS at startup), or there is a feature called ‘transparent hugepages’ where sufficiently aligned contiguous smaller pages can be merged (a rough sketch follows after this comment). This mostly helps with the first two problems

- I think the Apple M-series chips use a 16k page size, which might help with the third problem, but I don’t really know about them
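
A rough sketch of asking for transparent hugepages on Linux with madvise(MADV_HUGEPAGE); whether the kernel actually backs the region with 2 MiB mappings depends on its THP configuration.

    #include <stdio.h>
    #include <sys/mman.h>

    #define REGION (16UL * 1024 * 1024)    /* 16 MiB of anonymous memory */

    int main(void) {
        void *buf = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return 1;

        /* hint: this region is a good candidate for huge pages */
        if (madvise(buf, REGION, MADV_HUGEPAGE) != 0)
            perror("madvise");             /* e.g. kernel built without THP */

        ((char *)buf)[0] = 1;              /* touch it so pages get faulted in */
        printf("requested THP for a %lu-byte region\n", REGION);
        munmap(buf, REGION);
        return 0;
    }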

replies(1): >>44425440 #
10. fc417fc802 No.44422104{4}
4K seems appropriate for embedded applications. Meanwhile 4M seems like it would be plenty small for my desktop. Nearly every process is currently using more than that. Even the lightest is still coming in at a bit over 1M.
replies(1): >>44425462 #
11. lisper No.44422124
> some of the tests require booting systems with 64k pages

OK, but then why an 80-core CPU?

replies(1): >>44422579 #
12. haerwu No.44422579{3}
@lisper Because the Q32 is more expensive than the Q64, and I got an offer for a Q80 one.

The number of options for sensible AArch64 hardware is too small (or they are too expensive).

replies(1): >>44422739 #
13. haerwu No.44422594
The Apple Mx family of CPUs supports only 4k and 16k page sizes. There is no way to run 64k binaries there without emulating the whole CPU.
replies(1): >>44422864 #
14. lisper No.44422739{4}
OK, this is something I know very little about so you may have to explain it like I'm a complete noob, but I still don't understand why, if all you want to do is boot 64-bit Linux, you couldn't use, say, a Raspberry Pi 4 instead of spending thousands of zlotys on an 80-core machine that requires industrial cooling.
replies(2): >>44422982 #>>44424385 #
15. zozbot234 No.44422864{3}
This is not the full story. As I mentioned in the parent comment, ARM64 supports a special "contiguous" bit in its page table entries that, when set, allows a "block" of multiple contiguous pages to be cached in the TLB as a single entry. Thus, assuming 4k granularity, 64k "pages" could be implemented as contiguous "blocks" of 16 physical pages.
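
The arithmetic behind that, as a tiny sketch (the 128-entry TLB is just an assumed figure for illustration):

    #include <stdio.h>

    int main(void) {
        const unsigned base_page   = 4  * 1024;   /* 4k granule */
        const unsigned block_size  = 64 * 1024;   /* contiguous "block" */
        const unsigned tlb_entries = 128;         /* assumed TLB size */

        printf("%u base pages per 64k block\n", block_size / base_page);  /* 16 */
        printf("TLB reach: %u KiB with 4k entries, %u KiB with 64k blocks\n",
               tlb_entries * base_page / 1024,
               tlb_entries * block_size / 1024);
        return 0;
    }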
16. haerwu No.44422982{5}
Read https://marcin.juszkiewicz.com.pl/2025/06/20/the-hunt-for-a-... please.

Tool vs Toy.

replies(1): >>44423374 #
17. lisper No.44423374{6}
Thanks!
18. gbraad No.44423469
We have access to many, such as the TestFarm, or machines you can reserve on what is called Beaker.

Note: Recently also purchased an Ampere machine with some other people. Just to play around and host stuff.

19. jrockway No.44424385{5}
A lot of these ARM boards use custom (read: outdated) kernels and proprietary boot methods, so I'm not really sure how applicable they are to people developing Linux distributions that work everywhere. NixOS, for example, is only supporting UEFI booting on ARM64 going forward. If Red Hat has the same policy, then there is only a limited set of arm64 boards available. I researched this recently as I'd like to move my k8s cluster from renting expensive cloud machines to running them on cheap machines at home, and the situation is ... difficult. (I have tested the Orange Pi 5 Max and the Radxa Rock 5B+. Both required me to hack edk2-rk3588, but they do work well now that most rk3588 support is merged in Linux 6.15/6.16-rc1. But this is an old CPU that is just now getting mainline kernel support, and that is always how ARM has felt. It is, however, kind of neat to see a "BIOS" on an ARM board. I hope it catches on.)
20. p_ing No.44425440{4}
I believe this is true for x86 as a whole, but on NT any large page must be mapped with a single protection applied to the entire page, so if the page contains read-only code and read-write data, the entire page must be marked read-write.
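
A sketch of how that looks with the NT API: the whole large-page region gets a single protection at VirtualAlloc time, and the call only succeeds when the process holds the "Lock pages in memory" privilege.

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        SIZE_T large = GetLargePageMinimum();   /* typically 2 MiB on x64 */
        if (large == 0)
            return 1;                           /* large pages not supported */

        /* one protection for the entire large page */
        void *buf = VirtualAlloc(NULL, large,
                                 MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                                 PAGE_READWRITE);
        if (buf == NULL) {
            printf("VirtualAlloc failed: %lu\n", GetLastError());
            return 1;
        }

        printf("mapped %zu bytes as one read-write large page\n", large);
        VirtualFree(buf, 0, MEM_RELEASE);
        return 0;
    }
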
21. p_ing No.44425462{5}
1M is a huge waste of memory.

Imagine writing out a one-sentence note in Notepad and the resulting file being 1M on disk.

replies(1): >>44428367 #
22. fc417fc802 No.44428367{6}
Yet when I look at the running processes on my desktop, something like 90% of them have more than 16M resident. So it doesn't appear that even an 8M page size would waste much on a modern desktop during typical usage.

If I'm mistaken about some low level detail I'd be interested to learn more.

23. nearyd No.44435389{3}
Yes! Data workloads fare considerably better with larger pages: less TLB pressure and a higher cache hit rate. I wrote a tutorial about this and how to figure out whether it will be a good trade-off for your use-case: https://amperecomputing.com/tuning-guides/understanding-memo...