Deleting multiplayer from the Unreal engine can save memory

(larstofus.com)

1. bengarney ◴[08 Apr 25 02:47 UTC] No.43617928[source]▶

Really interesting analysis of where the data lives… cutting 3-4 textures would save you more memory even in the 100k actor case, though.

replies(2): >>43618098 #>>43618115 #

2. reitzensteinm ◴[08 Apr 25 03:26 UTC] No.43618098[source]▶

>>43617928 #

Depending on a bunch of factors of how this data is accessed and actors are laid out in memory, it may be more cache friendly which could yield substantial speedups.

Or it could do next to nothing, as the data is multiple cache lines long anyway.

replies(1): >>43622853 #

3. spockz ◴[08 Apr 25 03:27 UTC] No.43618106[source]▶

>>43600363 (OP) #

Besides the memory savings, would there also be performance gains? More actors would now fit in a cache line.

replies(2): >>43618116 #>>43619116 #

4. cma ◴[08 Apr 25 03:28 UTC] No.43618115[source]▶

>>43617928 #

If the memory savings he got were fully read or fragmented with other stuff on cache lines that are read in every frame (not likely for static world actors), it could be ~10% of CPU memory bandwidth on mobile every frame at 120hz on an lpddr4 phone.

A big problem with them is they are so heavyweight you can only spawn a few per frame before causing hitches and have to have pools or instancing to manage things like bullets.

I think in their Robo Recall talk they found they could only spawn 10-20 projectile style bullets per frame before running into hitches, and switched to pools and recycling them.

replies(2): >>43619650 #>>43623175 #

5. cma ◴[08 Apr 25 03:29 UTC] No.43618116[source]▶

>>43618106 #

The actors in unreal are such bloated single inheritance god classes that with a few actor components they take up more like a 4K memory page not part of a cache line, especially in editor builds.

But they do have a more optimized entity component system now too.

To be fair, a single transform now that things are 64 bit coordinates I think is bigger than a cache line too.

6. out-of-ideas ◴[08 Apr 25 03:38 UTC] No.43618152[source]▶

>>43600363 (OP) #

is it me or does the side loop endlessly loading thousands of requests? to at least:

- https://public-api.wordpress.com/wp-admin/rest-proxy/#https:...

- https://s0.wp.com/wp-content/js/rlt-proxy.js?m=20240709

blocking these via regex made the page load up really nice and fast

edit: formatting

7. keyle ◴[08 Apr 25 04:03 UTC] No.43618247[source]▶

>>43600363 (OP) #

Would this have any effect in prod build though?

replies(1): >>43619713 #

8. mwkaufma ◴[08 Apr 25 04:04 UTC] No.43618250[source]▶

>>43600363 (OP) #

On ABZÛ we ripped out multiplayer and lots of other cruft from AActor and UPrimitiveComponent - dropping builtin overlap events, which are a kind of anti pattern anyway, not only saved RAM but cut a lot of ghost reads/writes.

replies(6): >>43618358 #>>43618532 #>>43619035 #>>43619088 #>>43621863 #>>43624533 #

9. joegibbs ◴[08 Apr 25 04:10 UTC] No.43618279[source]▶

>>43600363 (OP) #

You would probably want to avoid having tens of thousands or a hundred thousand actors though, they're pretty heavy regardless. There might be a few reasons why you'd want to have that many but I think ideally you'd want to instance them or have some kind of actor that handles UObjects instead

10. rc5150 ◴[08 Apr 25 04:29 UTC] No.43618358[source]▶

>>43618250 #

Whoa! Didn't picture myself seeing a dev who worked on Abzu in the wild here on HN--I very much enjoyed that game, my thanks and high fives to your team for your work!

I'm having a kid-in-the-tunnel-meeting-Mean-Joe-Green-in-the-commercial moment, I just started my own game development journey about a week ago so it's neat getting to run across a full-on developer!

To stay on topic, I often thought how cool Abzu would have been with multiplayer but it's a good lesson to me that some features that might be desirable might also be a hindrance to some degree.

Okay, enough fanboying!

11. jayd16 ◴[08 Apr 25 05:14 UTC] No.43618532[source]▶

>>43618250 #

What makes overlap events an anti-pattern?

replies(1): >>43618589 #

12. mwkaufma ◴[08 Apr 25 05:28 UTC] No.43618589{3}[source]▶

>>43618532 #

In principle it's a fine idea, but their implementation has so many footguns (race conditions, thundering herds, perf cliffs, etc) it was easier to impl your own simpler alternative.

13. thaumasiotes ◴[08 Apr 25 06:55 UTC] No.43619035[source]▶

>>43618250 #

Your presence inspired me to try to look up what the circumflex on "abzû" is supposed to signify. As best I can tell, it's a marker of vowel length.

I wonder how that came to be used. It's a traditional way to distinguish eta and omega in transliteration from Greek, but it's not at all a traditional way to mark long vowels in general.

(I see that wikipedia says this about Akkadian:

> Long vowels are transliterated with a macron (ā, ē, ī, ū) or a circumflex (â, ê, î, û), the latter being used for long vowels arising from the contraction of vowels in hiatus.

But it seems odd for an independent root to contain a contracted double vowel. And the page "Abzu" has the circumflex on the Sumerian transliteration too.)

replies(2): >>43619371 #>>43625337 #

14. larstofus ◴[08 Apr 25 07:05 UTC] No.43619088[source]▶

>>43618250 #

Oh, nice to see that there are real-life examples of this stuff, thank you very much :) Needless to say that I'll take a deeper look at the overlaps now :D

15. larstofus ◴[08 Apr 25 07:11 UTC] No.43619116[source]▶

>>43618106 #

I haven't profiled this specifically, but my guess is that there shouldn't be any measurable performance gains. Most of the time, actors are randomly scattered in memory anyway, so having smaller actors doesn't avoid a lot of cache misses.

16. stavros ◴[08 Apr 25 07:56 UTC] No.43619371{3}[source]▶

>>43619035 #

"Abzu" is also the Greek onomatopoeia for a sneeze.

replies(2): >>43619474 #>>43622886 #

17. thaumasiotes ◴[08 Apr 25 08:21 UTC] No.43619474{4}[source]▶

>>43619371 #

I was highly amused to learn that the ancient Greek verb for spitting is πτύω. (Compare English "ptooey".)

replies(2): >>43619484 #>>43619522 #

18. stavros ◴[08 Apr 25 08:23 UTC] No.43619484{5}[source]▶

>>43619474 #

Yeah, and the modern is much the same ("φτύνω"). I'm sure it's onomatopoeic, and it's an amusing word.

replies(1): >>43623418 #

19. pandemic_region ◴[08 Apr 25 08:32 UTC] No.43619522{5}[source]▶

>>43619474 #

Does the popular 'hawk' prefix for this also originate from ancient Greek ?

20. teamonkey ◴[08 Apr 25 08:56 UTC] No.43619650{3}[source]▶

>>43618115 #

Pooling is pretty standard practice though, it would be the go-to solution for any experienced gameplay programmer when dealing with more than a dozen entities (though annoyingly there isn’t a standardised way of doing it in Blueprint).

replies(2): >>43619995 #>>43625933 #

21. teamonkey ◴[08 Apr 25 09:12 UTC] No.43619713[source]▶

>>43618247 #

That was my immediate thought on reading the article. Is this in dev or prod builds and which UE version is it referring to?

(Not that I expect the UActor code to have changed much but modifying UActor seemed more common in the early 4.x days.)

replies(1): >>43619758 #

22. larstofus ◴[08 Apr 25 09:22 UTC] No.43619758{3}[source]▶

>>43619713 #

Oh, good point. The test was done in the current stable 5.5 release in a development configuration :) Since this is a change to the memory layout of the class, there should be no change due to the build configuration, though.

23. dijit ◴[08 Apr 25 10:08 UTC] No.43619995{4}[source]▶

>>43619650 #

To be completely fair though, blueprints themselves are oft-maligned for performance.

They're fantastic for prototyping, but once you have designed some kind of hot-path most people typically start converting blueprints to code as an optimisation.

In such a scenario adding pooling becomes a trivial part of such an effort.

24. Fokamul ◴[08 Apr 25 11:09 UTC] No.43620333[source]▶

>>43600363 (OP) #

Too bad big companies don't care about this and more. "Morons(gamers) will just buy new hardware, fu hiring engine core devs".

replies(4): >>43620616 #>>43620688 #>>43623759 #>>43625095 #

25. maccard ◴[08 Apr 25 12:02 UTC] No.43620688[source]▶

>>43620333 #

This attitude comes up on here whenever gamedev comes up, and I really dislike it.

Here's a quote form the article

> I’ve already told you that this method saves 328 bytes per actor, which is not too much at first glance. You can apply the same trick for SceneComponents and save another 32 bytes per SceneComponent. Assuming an average of two SceneComponents per actor, you get up to 392 bytes per actor. Still not an impressive number unless you deal with a lot of actors. A hypothetical example level with 25 000 actors (which is a lot, but not unreasonable) will save about 10 MB.

I've a lot of experience with Unreal, and 25k actors is likely to run into a whole host of problems, such that saving 10MB of RAM is likely to be the least of your worries. You'd get more benefit out of removing a single unneeded texture, or compressing a single animation better.

One of the reasons developers use unreal (and yes, developers do use Unreal, it's not just "big companies" forcing their poor creatives to use the engine) is _because_ unreal has more man hours of development in a year than a small team would ever be able to put into their own engine. Like any tool it has tradeoffs, and it does have a (measureable) overhead. But to say that companies don't care is just disingenuous

replies(1): >>43621555 #

26. jofla_net ◴[08 Apr 25 13:14 UTC] No.43621436[source]▶

>>43600363 (OP) #

This was true even in the first version over 20 years ago. In a single player derivative, I remember combing through tons of UScript, unrealscript, stanzas which went something like. "Do this, and if we're in multiplayer, do this too or instead." The code was a messs, but again, good times.

27. speed_spread ◴[08 Apr 25 13:27 UTC] No.43621555{3}[source]▶

>>43620688 #

Actors are handled by the CPU where shaving 10MB can mean that more things can now fit in the cache leading to dramatic improvement.

replies(1): >>43622238 #

28. pwdisswordfishz ◴[08 Apr 25 13:58 UTC] No.43621863[source]▶

>>43618250 #

What's a ghost read? Search engines are failing me.

replies(4): >>43621972 #>>43621981 #>>43622956 #>>43625739 #

29. dagmx ◴[08 Apr 25 14:08 UTC] No.43621972{3}[source]▶

>>43621863 #

Unnecessary reads that you can’t really control or observe well.

30. ◴[08 Apr 25 14:09 UTC] No.43621981{3}[source]▶

>>43621863 #

31. maccard ◴[08 Apr 25 14:31 UTC] No.43622238{4}[source]▶

>>43621555 #

If you’re going to make that assertion then back it up with numbers. It could just easily have absolutely no impact whatsoever because your game thread is spending all its time on navigation mesh queries which have nothing to do with actors or UObjects.

replies(1): >>43626653 #

32. bengarney ◴[08 Apr 25 15:20 UTC] No.43622853{3}[source]▶

>>43618098 #

I would not expect much, but you'd have to measure to be sure.

If you actually have a million of something you're better off writing a custom manager thing to handle the bulk of the work anyway. For instance, if you're doing a brick building game where users might place a million bricks - maybe you want each brick to be an Actor for certain use cases, but you'd want to centralize all the collision, rendering, update logic. (This is what I did on a project with this exact use case and it worked nicely.)

replies(1): >>43627935 #

33. ◴[08 Apr 25 15:24 UTC] No.43622886{4}[source]▶

>>43619371 #

34. mwkaufma ◴[08 Apr 25 15:30 UTC] No.43622956{3}[source]▶

>>43621863 #

Unused memory accesses thrashing cache

35. Pxtl ◴[08 Apr 25 15:50 UTC] No.43623175{3}[source]▶

>>43618115 #

I've never played with UE and so I'm kinda shocked to learn that there isn't pooling already for objects that have this kind of creation cost.

36. jorvi ◴[08 Apr 25 16:15 UTC] No.43623418{6}[source]▶

>>43619484 #

It's funny how I can sort-of read the Dutch word in "τύν": spit "tuf" and spitting "tuffen". I can't find the etymology of it so it might be a false cognate

If it isn't a false cognate, I wonder what the function of "φ" and "ω" are..

replies(1): >>43623721 #

37. stavros ◴[08 Apr 25 16:40 UTC] No.43623721{7}[source]▶

>>43623418 #

It is a false cognafe, it's not pronounced "oo", but "ee". It's just onomatopoeia, that's why it's so similar.

replies(2): >>43626724 #>>43627279 #

38. jayd16 ◴[08 Apr 25 16:43 UTC] No.43623759[source]▶

>>43620333 #

Yeah, why does an engine include something useless like *checks notes... "multiplayer?"

39. dleslie ◴[08 Apr 25 17:10 UTC] No.43624054[source]▶

>>43600363 (OP) #

This has been a pattern since the first release of Unreal Engine. It's how we managed to smoosh it onto PS2 and Xbox.

40. joshyeager ◴[08 Apr 25 17:49 UTC] No.43624533[source]▶

>>43618250 #

Thank you for ABZÛ! My daughter has played it at least ten times. And when she wrote a letter to your team for a school project, you sent back a t-shirt and a soundtrack CD. We've listened to that CD for hours on road trips, it is a great soundtrack.

41. thenthenthen ◴[08 Apr 25 18:06 UTC] No.43624732[source]▶

>>43600363 (OP) #

How to implement A-life in Stalker 2

42. shadowgovt ◴[08 Apr 25 18:24 UTC] No.43624912[source]▶

>>43600363 (OP) #

This is a really good writeup. Something the author doesn't mention is that shrinking your data structures is also helpful for cache cohesion: if your structures are smaller, more of them can fit in smaller CPU caches (even if the game engine is striping resources of the same kind to simplify ripping across them every frame, this can matter).

The only counterweight I'd add is that if you later decide to add multiplayer, that is very, very hard if the engine wasn't set up for it from the beginning. Multiplayer adds complexity that exceeds simple things like getting on the network and sending messages; synchronization and prediction are meaningful for a realtime experience and are much easier to get to if you've started from "It's multiplayer under the hood but the server is local" than "We have a single-player realtime game and we're making it multiplayer." But that's not a reason never to do this; not all games need to be multiplayer!

43. shadowgovt ◴[08 Apr 25 18:36 UTC] No.43625095[source]▶

>>43620333 #

The market is a lot more complicated than that. But to a first approximation, this is an uncharitable statement of the reality: gaming is a luxury product, gaming is winner-take-all (i.e. the really successful games see 100,000x gross revenue over the median indie game, and people can't play two games at once so player attention is a very constrained resource), and the market consistently rewards novelty over polish. Players still bought Cyberpunk 2077, and then kept buying it after the bug announcements came out; it has sold 30 million copies.

All these market forces conspire to heavily incentivize a game studio to release as close to now as possible with as much game as they believe the players will stomach as possible. There are companies that buck this trend (Nintendo has a tradition of maximizing quality out-of-the-box), but that's where incentives point companies. Minecraft was hilariously buggy (and devoid of features) when it came out; its original developer committed it to a price model where the earlier you bought it, the cheaper it would be, and it became one of the most popular mega-games of a generation.

And the incentives come from players. Helldivers 2 doesn't have bugs because Arrowhead is lazy; it has bugs because Arrowhead wants a billion dollars and gamers can be trusted to hand them over for a product that works most of the time, as long as it's more fun than frustrating.

44. mananaysiempre ◴[08 Apr 25 18:59 UTC] No.43625337{3}[source]▶

>>43619035 #

Seems to be an older convention in linguistics. Romanizations of Japanese also switched from circumflexes (Tôkyô) to macrons (Tōkyō) at some point in time fairly long ago—I think the English-language Japanese journal I saw using that convention systematically was from the late 1950s, and its recent issues definitely don’t use it.

Perhaps a circumflex was easier to typeset, like with logicians switching from Ā to ¬A and the Chomskyan school in linguistics switching from X-bar and X-double-bar to X' and XP?

45. Veliladon ◴[08 Apr 25 19:46 UTC] No.43625739{3}[source]▶

>>43621863 #

There's two main things. First of all, when you load data into a register from an address in memory the CPU loads 64-byte cache lines, not words or bytes. AActor for instance is 1016 bytes. 16 cache lines. It's freaking huge.

So let's say you're going through all the actors and updating one thing. If those actors are in an array it's easy. Just a for loop, update the member variable, done. Easy, fast, should be performant right? But each time you're updating one of those the prefetcher is also bringing in extra lines, more data in the object, thinking you might need them next. So if you're only updating a single thing or a couple of things in the object on different cache lines you might really bring in 3-8x the data you actually need.

CPU prefetchers have something called stride detectors which can detect access patterns of N number of steps and stop the prefetcher from grabbing additional lines but at 16 cache lines the AActor object is way too big for the stride detector to keep up with. So you stride in gaps of 16 cache lines at a time through memory and you still get 2-3 extra cache lines after the initial access.

Secondly, a 1016 byte object just doesn't fit. It's word aligned but it's not cache line aligned and it's sure as hell not page aligned.

Best case scenario if you're updating two variables next to each other in memory the prefetcher gets both on the same cache line. Medium case scenario, the prefetcher has to grab the next line every so often. You'll get best most often and medium rarely.

Bad case scenario, the prefetcher has to grab the next cache line on the NEXT PAGE. Which only just became a thing on recent CPUs but also involves translating the virtual address of the next page to its physical page address which takes forever in data access terms. Bunch of pointer chasing, basically a few thousand clock of waiting.

The absolute worst case scenario is that the prefetcher thinks you need the next cache line, it's on the next page, it does the rigamarole of translating the next page's virtual address and you don't actually need it. You've done two orders of magnitude more work than reading a single variable for literally nothing.

So yeah. The prefetcher can do some weird ass shit when you throw weird and massive data structs at it. Slashing and burning the size down helps because the stride detector can start functioning again when the object is small enough. If it can be kept to a multiple of 64 bytes you even get page aligned again.

replies(1): >>43627531 #

46. cma ◴[08 Apr 25 20:07 UTC] No.43625933{4}[source]▶

>>43619650 #

Standard practice, but it bit Epic by surprise. You wouldn't think it would be needed at such small numbers. You wouldn't automatically think it would be needed on 3+ghz machines.

47. speed_spread ◴[08 Apr 25 21:29 UTC] No.43626653{5}[source]▶

>>43622238 #

The keyword here is "can". I'm just saying it's definitely possible that a 10MB memory reduction in a critical spot results in significant performance gains. I agree 100% that any optimization should be backed up by solid benchmarks.

replies(1): >>43627112 #

48. jorvi ◴[08 Apr 25 21:39 UTC] No.43626724{8}[source]▶

>>43623721 #

We pronounce it t-uh-f, as in "tough" without the o.

replies(1): >>43627306 #

49. maccard ◴[08 Apr 25 22:30 UTC] No.43627112{6}[source]▶

>>43626653 #

It also “can” also do absolutely nothing, or “can” introduce false sharing in multithreaded code.

Lots of things are possible - but speculating on every possibility as though they’re equally probable doesn’t provide any value. Actors in unreal are a fairly low level item, but most games aren’t going to have 25k actors in a world, and if they do, 10MB of memory usage fragmented across actors is likely the least of their worries.

50. thaumasiotes ◴[08 Apr 25 22:56 UTC] No.43627279{8}[source]▶

>>43623721 #

Mandarin is 吐 /tʰu/.

51. thaumasiotes ◴[08 Apr 25 23:00 UTC] No.43627306{9}[source]▶

>>43626724 #

Note that ν is an N, not a V. I think the standard transliteration of φτύνω from modern Greek would be "ftino".

(Note also that "tough" is pronounced t-uh-f /tʌf/, with nothing O-like anywhere in it.)

replies(1): >>43646450 #

52. Veliladon ◴[08 Apr 25 23:41 UTC] No.43627531{4}[source]▶

>>43625739 #

*an exponent of 64 bytes

If it can be kept to a exponent of 64 bytes you even get page aligned again.

53. reitzensteinm ◴[09 Apr 25 01:01 UTC] No.43627935{4}[source]▶

>>43622853 #

I wouldn't expect much either. The potential for speedups would be if there's locality for data on either side of the multiplayer padding, or if the actors have contiguous layout and deleting the data plays better with the CPU's stride prefetching.

Significant performance degredation is also possible if at some point a smart (but not wise) developer positioned the data to eliminate false sharing on either side.

Agreed that you shouldn't be using this heavy weight paradigm with large amounts of entities. My intention was just to add a bit of color to the idea that saving memory allocations can have implications beyond just the number of bytes you ultimately malloc.

54. jorvi ◴[10 Apr 25 17:55 UTC] No.43646450{10}[source]▶

>>43627306 #

I wonder if that's because of the two different ways you can spit: without windup ("tuf" sound) and with windup / breathing in ("hff-tuf" sound).