I'm having a kid-in-the-tunnel-meeting-Mean-Joe-Green-in-the-commercial moment, I just started my own game development journey about a week ago so it's neat getting to run across a full-on developer!
To stay on topic, I often thought how cool Abzu would have been with multiplayer but it's a good lesson to me that some features that might be desirable might also be a hindrance to some degree.
Okay, enough fanboying!
I wonder how that came to be used. It's a traditional way to distinguish eta and omega in transliteration from Greek, but it's not at all a traditional way to mark long vowels in general.
(I see that wikipedia says this about Akkadian:
> Long vowels are transliterated with a macron (ā, ē, ī, ū) or a circumflex (â, ê, î, û), the latter being used for long vowels arising from the contraction of vowels in hiatus.
But it seems odd for an independent root to contain a contracted double vowel. And the page "Abzu" has the circumflex on the Sumerian transliteration too.)
If it isn't a false cognate, I wonder what the function of "φ" and "ω" are..
Perhaps a circumflex was easier to typeset, like with logicians switching from Ā to ¬A and the Chomskyan school in linguistics switching from X-bar and X-double-bar to X' and XP?
So let's say you're going through all the actors and updating one thing. If those actors are in an array it's easy. Just a for loop, update the member variable, done. Easy, fast, should be performant right? But each time you're updating one of those the prefetcher is also bringing in extra lines, more data in the object, thinking you might need them next. So if you're only updating a single thing or a couple of things in the object on different cache lines you might really bring in 3-8x the data you actually need.
CPU prefetchers have something called stride detectors which can detect access patterns of N number of steps and stop the prefetcher from grabbing additional lines but at 16 cache lines the AActor object is way too big for the stride detector to keep up with. So you stride in gaps of 16 cache lines at a time through memory and you still get 2-3 extra cache lines after the initial access.
Secondly, a 1016 byte object just doesn't fit. It's word aligned but it's not cache line aligned and it's sure as hell not page aligned.
Best case scenario if you're updating two variables next to each other in memory the prefetcher gets both on the same cache line. Medium case scenario, the prefetcher has to grab the next line every so often. You'll get best most often and medium rarely.
Bad case scenario, the prefetcher has to grab the next cache line on the NEXT PAGE. Which only just became a thing on recent CPUs but also involves translating the virtual address of the next page to its physical page address which takes forever in data access terms. Bunch of pointer chasing, basically a few thousand clock of waiting.
The absolute worst case scenario is that the prefetcher thinks you need the next cache line, it's on the next page, it does the rigamarole of translating the next page's virtual address and you don't actually need it. You've done two orders of magnitude more work than reading a single variable for literally nothing.
So yeah. The prefetcher can do some weird ass shit when you throw weird and massive data structs at it. Slashing and burning the size down helps because the stride detector can start functioning again when the object is small enough. If it can be kept to a multiple of 64 bytes you even get page aligned again.
(Note also that "tough" is pronounced t-uh-f /tʌf/, with nothing O-like anywhere in it.)