How a 20 year old bug in GTA San Andreas surfaced in Windows 11 24H2

1. amenghra ◴[23 Apr 25 18:08 UTC] No.43774928[source]▶

IMHO, if something isn’t part of the contract, it should be randomized. E.g., if iteration order of maps isn’t guaranteed in your language, then your language should go out of its way to randomize it. Otherwise, you end up with brittle code: code that works fine until it doesn’t.

replies(11): >>43774993 #>>43775199 #>>43775210 #>>43775344 #>>43775361 #>>43775510 #>>43775759 #>>43776084 #>>43776311 #>>43776598 #>>43778608 #

2. roseway4 ◴[23 Apr 25 18:13 UTC] No.43774993[source]▶

>>43774928 (TP) #

iirc, Go intentionally randomizes map ordering for just this reason.

replies(1): >>43775587 #

3. bri3d ◴[23 Apr 25 18:29 UTC] No.43775199[source]▶

>>43774928 (TP) #

There are various compiler options like -ftrivial-auto-var-init that initialize otherwise-uninitialized local variables to a specific value (zero or a fixed pattern) in some situations, but overall, randomizing (or zeroing) the full content of the stack in each function call would be a horrendous performance regression and isn't done for this reason.

replies(3): >>43778515 #>>43779449 #>>43781342 #

4. mras0 ◴[23 Apr 25 18:30 UTC] No.43775210[source]▶

>>43774928 (TP) #

Not really the ethos of C(++), though of course this particular bug would be easily caught by running a debug build (even 20 years ago). However, this being a game, "true" debug builds were probably too slow to be usable. That was at least my experience doing gamedev in that timeframe. Then again, code holding up for 20 years in that line of biz is more than sufficient anyway :)

replies(1): >>43777280 #

5. gzalo ◴[23 Apr 25 18:43 UTC] No.43775344[source]▶

>>43774928 (TP) #

I agree, this can also detect brittle tests (e.g., test methods/classes that only pass if executed in a particular order). But applying it to all data could be expensive computation-wise.
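A minimal sketch of the test-order idea, in Python: run the same tests in a shuffled order and see which ones start failing. The names `run_in_order`, `run_shuffled`, and the two toy tests are hypothetical and not taken from any real test framework.

```python
import random

# Shared mutable state that makes the toy tests order-dependent.
state = {"ready": False}

def test_setup_flag():
    state["ready"] = True
    assert state["ready"]

def test_needs_flag():
    # Brittle: passes only if test_setup_flag has already run.
    assert state["ready"]

def run_in_order(tests):
    """Run tests in the given order; return the names of the ones that fail."""
    state["ready"] = False  # reset shared state before each run
    failures = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failures.append(test.__name__)
    return failures

def run_shuffled(tests, seed):
    """Shuffle with a known seed so that a failing order can be replayed."""
    order = list(tests)
    random.Random(seed).shuffle(order)
    return [t.__name__ for t in order], run_in_order(order)
```

Returning the shuffled order next to the failures (and logging the seed) is what keeps a randomized run reproducible: the same seed replays the same order.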

6. abnercoimbre ◴[23 Apr 25 18:45 UTC] No.43775361[source]▶

>>43774928 (TP) #

Regarding contracts, there's an additional lesson here, quoting from the source:

> This is an interesting lesson in compatibility: even changes to the stack layout of the internal implementations can have compatibility implications if an application is bugged and unintentionally relies on a specific behavior.

I suppose this is why Linux kernel maintainers insist on never breaking user space.

replies(1): >>43780673 #

7. willcipriano ◴[23 Apr 25 18:59 UTC] No.43775510[source]▶

>>43774928 (TP) #

Then you are wasting runtime clock cycles randomizing lists.

replies(3): >>43775920 #>>43776036 #>>43779041 #

8. withinboredom ◴[23 Apr 25 19:06 UTC] No.43775587[source]▶

>>43774993 #

Yep, and then you get crash reports you can’t reproduce.

replies(1): >>43780029 #

9. ormax3 ◴[23 Apr 25 19:24 UTC] No.43775759[source]▶

>>43774928 (TP) #

one might argue that one of the advantages of languages like C is that you only pay for the features you choose to use, no unnecessary overhead like initializing unused variables

replies(2): >>43775906 #>>43780948 #

10. nayuki ◴[23 Apr 25 19:40 UTC] No.43775906[source]▶

>>43775759 #

You can pay for those features in debug mode or in chaos monkey mode. It's okay to continue to not pay for them in release mode. Heck, Rust has this approach when it comes to handling integer overflow - fully checked in debug mode, silent wraparound in release mode.

replies(1): >>43776205 #

11. nayuki ◴[23 Apr 25 19:42 UTC] No.43775920[source]▶

>>43775510 #

Any sane language would design a list iterator to follow the order of the list. No, the difference is when you're iterating over orderless hash-based sets or maps/dictionaries. Many languages choose to leave the iteration order undefined. I think Python did that up to a point, but afterward they defined dictionaries (but not sets) to be iterated over in the order that keys were added. Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.
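For the Python part, a small illustration, assuming Python 3.7 or later (where dict insertion order is part of the language spec). The variable names are made up for the example.

```python
# Dict keys iterate in the order they were inserted.
inventory = {}
for name in ["pear", "apple", "fig"]:
    inventory[name] = len(name)

keys_in_order = list(inventory)

# Deleting a key and adding it again moves it to the end.
del inventory["pear"]
inventory["pear"] = 4
keys_after_reinsert = list(inventory)

# Sets make no such promise: compare contents, never positions.
same_members = set(["pear", "fig"]) == set(["fig", "pear"])
```

Code that needs a particular order from a set should sort it or keep a separate list, rather than rely on whatever order the current interpreter happens to produce.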

replies(2): >>43777295 #>>43780231 #

12. Cthulhu_ ◴[23 Apr 25 19:56 UTC] No.43776036[source]▶

>>43775510 #

Not necessarily; you can do a thing where it's randomized during development, testing and fuzzing but not in production builds or benchmarks so that the obvious "I rely on internal map order" bugs are spotted right away.
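One way this could look, sketched in Python. The helper `iter_items` and its `shuffle` switch are hypothetical; in a real project the switch would presumably be tied to a debug or test build setting rather than passed by hand.

```python
import random

def iter_items(mapping, shuffle=False, rng=None):
    """Return an iterator over (key, value) pairs.

    With shuffle=True (intended for development, tests, and fuzzing) the
    order is deliberately scrambled so that code relying on it breaks early.
    With shuffle=False (production, benchmarks) no extra work is done.
    """
    items = list(mapping.items())
    if shuffle:
        (rng or random).shuffle(items)
    return iter(items)
```

Passing a seeded `random.Random` as `rng` keeps a failing run reproducible.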

13. plutaniano ◴[23 Apr 25 20:01 UTC] No.43776084[source]▶

>>43774928 (TP) #

Aren't you just creating another contract? Users might write code that depends on it being random.

replies(2): >>43776603 #>>43776715 #

14. irundebian ◴[23 Apr 25 20:16 UTC] No.43776205{3}[source]▶

>>43775906 #

In Ada you can pay for integer overflow checks (runtime) if you want to. With Ada SPARK you can prove that your code does not contain integer overflows so that you don't need runtime checks.

replies(1): >>43776801 #

15. frollogaston ◴[23 Apr 25 20:27 UTC] No.43776311[source]▶

>>43774928 (TP) #

Randomization at this level would be too expensive. There are tools that do this for debug purposes, and your stuff runs a lot slower in that mode.

replies(2): >>43776596 #>>43778874 #

16. foxhill ◴[23 Apr 25 21:02 UTC] No.43776596[source]▶

>>43776311 #

it probably shouldn’t be a “release” thing. actually, certainly. i do wonder how many bugs would never have seen the light of day, if someone’s “set” actually turned out to be a sequence (i.e. allowed duplicate values) resulting in a debug build raising an assert.

replies(1): >>43778199 #

17. tantalor ◴[23 Apr 25 21:03 UTC] No.43776598[source]▶

>>43774928 (TP) #

Nope. You have to remember https://www.hyrumslaw.com/

  With a sufficient number of users of an API,
  it does not matter what you promise in the contract:
  all observable behaviors of your system
  will be depended on by somebody.

If you promise randomization, then somebody will depend on that :)

And then you can never remove it!

replies(2): >>43778253 #>>43780617 #

18. tantalor ◴[23 Apr 25 21:04 UTC] No.43776603[source]▶

>>43776084 #

For those users, do this instead: https://xkcd.com/221/

19. Artoooooor ◴[23 Apr 25 21:18 UTC] No.43776715[source]▶

>>43776084 #

Maybe it would be good to change all non promised things between releases. So that such unwritten rules never become something users rely upon.

20. johnisgood ◴[23 Apr 25 21:29 UTC] No.43776801{4}[source]▶

>>43776205 #

And you can disable these checks with a flag when it comes to Ada, and yeah, with SPARK, none of it happens at runtime.

Check the table at https://docs.adacore.com/spark2014-docs/html/ug/en/usage_sce..., look for "SPARK builds on the strengths of Ada to provide even more guarantees statically rather than dynamically.".

More reading:

https://docs.adacore.com/spark2014-docs/html/ug/en/tutorial....

https://learn.adacore.com (many books for learning Ada and SPARK) available in PDF, EPUB, and HTML format.

21. mabster ◴[23 Apr 25 22:19 UTC] No.43777280[source]▶

>>43775210 #

When I was doing gamedev about 5 years ago, we were still debugging with optimisation on. You get a class of bugs just from running in lower frame rates that don't happen in release.

22. mabster ◴[23 Apr 25 22:21 UTC] No.43777295{3}[source]▶

>>43775920 #

Best change ever, that. Now it would also be nice if sets were ordered too.

replies(1): >>43778982 #

23. Arainach ◴[24 Apr 25 00:36 UTC] No.43778199{3}[source]▶

>>43776596 #

Debug builds are worthless for catching issues. How many people actually run them? Perhaps developers run debug builds of individual binaries they're working on when they're trying to repro a bug, but my experience at every company of every size and position in the stack (including the Windows team) is that no one does their general purpose use on a debug build.

replies(2): >>43778618 #>>43784699 #

24. timewizard ◴[24 Apr 25 00:43 UTC] No.43778253[source]▶

>>43776598 #

> If you promise randomization

You don't. You say the order is undefined.

replies(2): >>43778894 #>>43778928 #

25. neuroelectron ◴[24 Apr 25 01:39 UTC] No.43778515[source]▶

>>43775199 #

There are fast instructions (e.g., REP STOSx, AVX zero stores, dc zva) and tricks (MTE, zero pages), but no magic CPU instruction exists that transparently and efficiently randomizes or zeros the stack on function calls. You think there would be one and I bet there are on some specialized high-security systems, but I'm not sure even where you would find such a product. Telecom certainly isn't it.

replies(3): >>43778720 #>>43779274 #>>43779896 #

26. codebje ◴[24 Apr 25 02:01 UTC] No.43778608[source]▶

>>43774928 (TP) #

I once updated a little shy of 1mloc of Perl 5.8 code to run on Perl 5.32 (ish). There were, overall, remarkably few issues that cropped up. One of these issues (that showed itself a few times) was more or less exactly this: the iteration order through a hash is not defined. It has never been defined, but in Perl 5.8 it was consistent: for the same insertion order of the same set of keys, a hash would always iterate in the same way. In a later Perl it was deliberately randomised, not just once, but on every iteration through the hash.

It turned out there were a few places that had assumed a predictable - not just stable, but deterministic - hash key iteration order. Mostly this showed up as tests that failed 50% of the time, which suggested to me a rough rule: how annoying an error is to track down is inversely correlated with how often the error appears in tests.

(Other issues were mostly due to the fact that Perl 5 is all but abandoned by its former community: a few CPAN modules are just gone, some are so far out of date that they can't be coerced to still work with other modules that have been updated over time. )
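The usual fix for the hash-order problem is to make the order explicit wherever it matters. A rough Python analogue (not the original Perl; the function names are invented for illustration):

```python
def render_tags_fragile(tags):
    # Order depends on the container. For a set of strings it can differ
    # between runs, because str hashes are salted per process by default.
    return ",".join(tags)

def render_tags(tags):
    """Join tags in a stable order, regardless of how the container iterates."""
    return ",".join(sorted(tags))
```

A test that compares the output of the fragile version against a fixed string is exactly the kind that passes on one run and fails on the next.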

replies(1): >>43781009 #

27. dontlaugh ◴[24 Apr 25 02:03 UTC] No.43778618{4}[source]▶

>>43778199 #

Especially in games, it’s common for only the highly optimised release builds to have playable performance.

28. mjevans ◴[24 Apr 25 02:22 UTC] No.43778720{3}[source]▶

>>43778515 #

You couldn't do random, but with a predictable performance hit to memory, cache, and write-line use, stack addresses COULD be isolated for a program, for a library, etc.

It'd be expensive though; every context switch would require its own stack and pushing / restoring one more register. There's GOOD reason programs don't work that way and are supposed to not rely on values outside of properly initialized (and not later clobbered) memory.

replies(1): >>43779908 #

29. throwaway2037 ◴[24 Apr 25 02:55 UTC] No.43778874[source]▶

>>43776311 #

I had to Google to find the tidbit that I read about Perl years ago. I think this will affect iteration order of dicts.

    > Nov 22, 2012 — Perl 5.18 will introduce per process hash randomization and almost certainly will feature a new hash function.

30. dwattttt ◴[24 Apr 25 03:00 UTC] No.43778894{3}[source]▶

>>43778253 #

You can randomly not randomise it :)

31. __float ◴[24 Apr 25 03:07 UTC] No.43778928{3}[source]▶

>>43778253 #

That isn't the point. In practice, if you provide randomness, it will be depended upon.

replies(1): >>43779002 #

32. throwaway2037 ◴[24 Apr 25 03:22 UTC] No.43778982{4}[source]▶

>>43777295 #

I am pretty sure there are trivial impls of sets with guaranteed iteration order in Python that use an underlying ordered map and a dummy value in each entry.
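A sketch of what such an implementation might look like, assuming Python 3.7+ so that dict order is guaranteed. The class name `OrderedSet` is just a label for this example, not a standard-library type.

```python
class OrderedSet:
    """Set that iterates in insertion order, backed by a dict with dummy values."""

    def __init__(self, items=()):
        # dict.fromkeys keeps the first occurrence of each key, in order.
        self._items = dict.fromkeys(items)

    def add(self, item):
        self._items[item] = None  # the value is only a placeholder

    def discard(self, item):
        self._items.pop(item, None)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)
```

Re-adding an existing element leaves it in its original position, which matches how dict assignment behaves.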

33. psnehanshu ◴[24 Apr 25 03:27 UTC] No.43779002{4}[source]▶

>>43778928 #

Why is that? Is that just bad coding habits?

replies(1): >>43779871 #

34. wat10000 ◴[24 Apr 25 03:34 UTC] No.43779041[source]▶

>>43775510 #

You can get it pretty much for free by using a random salt with your hash function. This is also useful for avoiding DoS attacks that use deliberate hash collisions to trigger quadratic behavior in your hash tables.
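A toy Python sketch of the salting idea. `SaltedBuckets` is a hypothetical name, and keyed BLAKE2b is used here only because it ships with the standard library; real hash tables typically use a much cheaper keyed hash such as SipHash.

```python
import hashlib
import secrets

class SaltedBuckets:
    """Toy hash-table front end: maps string keys to bucket indexes with a per-table salt."""

    def __init__(self, n_buckets=16, salt=None):
        self.n_buckets = n_buckets
        # The salt is picked once per table; individual lookups do no extra
        # randomization work beyond feeding it into the hash.
        self.salt = salt if salt is not None else secrets.token_bytes(16)

    def bucket_for(self, key: str) -> int:
        # Keyed hash: the digest depends on both the salt and the key.
        digest = hashlib.blake2b(
            key.encode("utf-8"), key=self.salt, digest_size=8
        ).digest()
        return int.from_bytes(digest, "big") % self.n_buckets
```

Because the bucket layout changes with the salt, iteration order over the buckets changes too, and an attacker who doesn't know the salt can't precompute colliding keys.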

35. db48x ◴[24 Apr 25 04:33 UTC] No.43779274{3}[source]▶

>>43778515 #

There are proposed cpu architectures that work that way, like the Mill <https://millcomputing.com/>. Where most cpus support multiple calling conventions the Mill enforces a single calling convention in hardware. There is a hardware `call` instruction that does all the work directly, along with a corresponding `ret` instruction for returning from a function call. It also uses its equivalent of the TLB to ensure that each function is only granted permission to read from that portion of the stack which contains its arguments; any attempt to read outside that region would result in a permission error that causes the read to return a NaR (Not a Result, akin to a floating point NaN).

As an additional protection, new stack frames are implicitly zeroed as they are created. I assume this is done by filling the CPU cache with zeros for those addresses before continuing to execute the called function. No need to wait for actual zeros to be written to main memory.

https://millcomputing.com/wiki/Protection#Protecting_Stacks

replies(1): >>43783461 #

36. smarks ◴[24 Apr 25 05:16 UTC] No.43779449[source]▶

>>43775199 #

I'm a bit distant from this stuff, but it looks like C++26 will have something like -ftrivial-auto-var-init enabled by default. See the "safe by default" section of [1].

For reference, the actual proposal that was accepted into C++26 is [2]. It discusses performance only in general, and it refers to an earlier analysis [3] for more details. This last reference describes regressions of around 0.5% in time and in code size. Earlier prototypes suggested larger regressions (perhaps even "horrendous") but more emphasis on compiler optimizations has brought the regression down considerably.

Of course one's mileage may vary, and one might also consider a 0.5% regression unacceptable. However, the C++ committee seems to have considered this to be an acceptable tradeoff to remove a frequent cause of undefined behavior from C++.

[1]: https://herbsutter.com/2024/08/07/reader-qa-what-does-it-mea...

[2]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p27...

[3]: https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2723r1...

37. simonask ◴[24 Apr 25 06:37 UTC] No.43779871{5}[source]▶

>>43779002 #

All of this is bad coding habits. That's why we're here.

38. dwattttt ◴[24 Apr 25 06:44 UTC] No.43779896{3}[source]▶

>>43778515 #

CPUs already special case xor reg,reg as zeroing out the register, breaking any data dependency on it. If zeroing bits of the stack were common enough, I'd believe CPUs could be made that handled it efficiently (they already special case the stack; push/pop)

39. neuroelectron ◴[24 Apr 25 06:47 UTC] No.43779908{4}[source]▶

>>43778720 #

It should be efficient though, that's the point. Specialized hardware or instructions should be able to zero the stack in a single cycle; instead, it's much more expensive. Of course the problem with this is it could be used to hide things just as easily, making it impossible to reverse engineer an unknown exploit.

replies(1): >>43783240 #

40. cristaloleg ◴[24 Apr 25 07:13 UTC] No.43780029{3}[source]▶

>>43775587 #

Same can be said about pointer addresses (random for each run). But ASLR exists for a specific reason.

41. masklinn ◴[24 Apr 25 07:55 UTC] No.43780231{3}[source]▶

>>43775920 #

> Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.

Most modern languages do that as part of hashdos mitigation. Python did that until it switched to a naturally ordered hashmap, then made insertion order part of the spec. Importantly, iteration order remains consistent within a process (possibly on a per-hashmap basis).

Notably, Go will randomise the starting point of hashmap iteration on each iteration.
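To give a feel for "random starting point on each iteration", here is a toy model in Python. It is only an illustration of the idea and not how Go's runtime is actually written; `iterate_from_random_start` is an invented helper.

```python
import random

def iterate_from_random_start(mapping, rng=random):
    """Return the mapping's items rotated to begin at a random position."""
    items = list(mapping.items())
    if not items:
        return items
    start = rng.randrange(len(items))
    # Same cyclic order every time, but a different first element per call.
    return items[start:] + items[:start]
```

Even this weak form of randomization is enough to break code that assumes the first element of two successive loops is the same.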

42. scott_w ◴[24 Apr 25 09:03 UTC] No.43780617[source]▶

>>43776598 #

Semi-related: this type of thing is actually covered in the Site Reliability Engineering book by Google. They highlighted a case of a system that outperformed its SLO, so people depended on it having 100% uptime. They "fixed" this by injecting errors to go closer to their SLA, forcing downstream engineers to deal with the fact that the dependent services would sometimes fail for no reason.

I know it's easier said than done everywhere, just found it to be an interesting parallel.
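A rough sketch of the error-injection idea in Python. Everything here (`InjectedFailure`, `with_injected_failures`, the `failure_rate` parameter) is hypothetical and only meant to show the shape of it, not how Google implemented it.

```python
import random

class InjectedFailure(Exception):
    """Raised instead of calling the real service, to simulate an outage."""

def with_injected_failures(func, failure_rate, rng=random):
    """Wrap `func` so that roughly `failure_rate` of calls fail on purpose."""
    def wrapper(*args, **kwargs):
        # rng.random() is in [0.0, 1.0), so a rate of 0.0 never fails
        # and a rate of 1.0 always fails.
        if rng.random() < failure_rate:
            raise InjectedFailure("synthetic failure to keep callers honest")
        return func(*args, **kwargs)
    return wrapper
```

Callers that were quietly assuming 100% availability find out in testing instead of during a real outage.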

43. cylemons ◴[24 Apr 25 09:15 UTC] No.43780673[source]▶

>>43775361 #

But the linux equivalent here would be glibc, not the kernel

44. pjc50 ◴[24 Apr 25 10:09 UTC] No.43780948[source]▶

>>43775759 #

However, the compiler does not tell you this. We're back to the problem that it's possible to have a "working" C program that relies on UB and will therefore break at some point, but the tools will not yell at you for doing this. Whereas in Java or C# you get warnings or errors for using maybe-uninitialized variables.

Also, scanf should be deprecated. Terrible API. Never use scanf or sscanf etc. We managed to get "gets()" deprecated, time to spread that to other parts of the API.

atoi() or atof() etc. work OK, but really you need a parser.

45. zerr ◴[24 Apr 25 10:21 UTC] No.43781009[source]▶

>>43778608 #

At booking.com? :)

replies(1): >>43799122 #

46. canucker2016 ◴[24 Apr 25 11:17 UTC] No.43781342[source]▶

>>43775199 #

Microsoft's Visual C++ compiler has the /Ge compiler option (see https://learn.microsoft.com/en-us/cpp/build/reference/ge-ena...), deprecated since VC2005.

This compiler option causes the compiler to emit a call to a stack probe function to ensure that a sufficient amount of stack space is available.

Rather than just probing once for each stack page used, you can substitute a function that *FILLS* the stack frame with a particular value, something like 0xBAADF00D; the value could be set to anything you wanted at runtime.

This would get you similar behaviour to gcc/clang's -ftrivial-auto-var-init

Windows has started to auto-initialize most stack variables in the Windows kernel and several other areas.

    The following types are automatically initialized:
    
        Scalars (arrays, pointers, floats)
        Arrays of pointers
        Structures (plain-old-data structures)
    
    The following are not automatically initialized:
    
        Volatile variables
        Arrays of anything other than pointers (i.e. array of int, array of structures, etc.)
        Classes that are not plain-old-data


    During initial testing where we forcibly initialized all types of data on the stack we saw performance regressions of over 10% in several key scenarios.

    With POD structures only, performance was more reasonable. Compiler optimizations to eliminate redundant stores (both inside basic blocks and between basic blocks) were able to further drop the regression caused by POD structures from observable to noise-level for most tests.

    We plan on revisiting zero initializing all types (especially now that our optimizer has more powerful optimizations), we just haven’t gotten to it yet.

see https://web.archive.org/web/20200518153645/https://msrc-blog...

47. mjevans ◴[24 Apr 25 14:28 UTC] No.43783240{5}[source]▶

>>43779908 #

Why would a specialized instruction be necessary? 'the stack' is stored in memory just like everything else.

What's expensive is the (very slow for modern CPUs) operation of _writing_ that change in value out to memory, which is distant and slow compared to the speed the CPU operates at, as well as the overhead of synchronizing that write to any other caches of those memory locations.

Maybe you're thinking of the trick where a brand new page of mapped memory is 'zeroed' but is in reality just a special 'all zeros' page in the virtual-to-physical memory lookup table? Those still need to be zeroed by real writes at some point, if they're ever used.

48. smj-edison ◴[24 Apr 25 14:45 UTC] No.43783461{4}[source]▶

>>43779274 #

This is really interesting—how do stack references work in this design?

replies(1): >>43789749 #

49. frollogaston ◴[24 Apr 25 16:37 UTC] No.43784699{4}[source]▶

>>43778199 #

Yeah, even their integration tests will probably run in opt mode.

50. db48x ◴[25 Apr 25 03:05 UTC] No.43789749{5}[source]▶

>>43783461 #

Technically I think you can read the “whole” stack; it’s only reads off of the ends of the stack as a whole that are prevented. However, note that the start of your current stack may not really be the start of the real stack.

Consider the case of a system call, such as `read`. You’re in user space and you have some stack frames on the stack as usual. You allocate a buffer on the stack (there’s a CPU instruction for that; it basically just extends your “turf¹” to include more of the stack page, and zeros it as mentioned) to hold the data you want to read. You then call `read` with the `call` instruction, including the address of the buffer and the buffer size as arguments. So far everything is very straightforward.

But `read` is actually in a different protection domain; it’s part of the kernel. The CPU uses metadata previously set up by the kernel to turn this into a “portal call”. After the portal call your thread will be given a different protection domain. In principle this is the kernel’s protection domain, but in reality the kernel might split that up in many complicated ways. What is relevant here is that the turf of this protection domain has been modified to include this new stack frame. From the perspective of `read`, the stack has just started; there are no prior frames. The reality is that this stack frame is still part of the stack of the caller, it’s only the turf that has changed. Those prior stack frames still exist, but they are unreadable. Worse, the buffer is also unreadable; it’s located at an address that is not part of the kernel’s turf.

So obviously there needs to be another set of instructions for modifying turfs. The full set of obvious modifications are available, but the relevant one here is a temporary grant of read and/or write permissions to a function you are about to call. You would insert a `pass` instruction to pass along access to the buffer for the duration of the call. This access is automatically revoked after the call returns. (Ideally you wouldn’t actually have to do this manually for every portal call; instead you would call a non–portal `read` function in libc. This function’s job is to make the portal call, and whoever wrote it makes sure to include the `pass` instruction.)

¹ A turf is the set of addresses that a given thread running in a given protection domain can read and/or write.

51. codebje ◴[25 Apr 25 22:35 UTC] No.43799122{3}[source]▶

>>43781009 #

No, but they do (did?) have a vast ocean of Perl, and I did know a hacker or two who got hired to work there on it.