
3883 points kuroguro | 33 comments
1. comboy ◴[] No.26296735[source]
Holy cow. I'm a very casual gamer; I was excited about the game, but when it came out I decided I didn't want to wait that long and would hold off until they sorted it out. Two years later it still sucked, so I abandoned it. But... this?! This is unbelievable. I'm certain that many people left this game because of the waiting time. That's man-years wasted (in a way different than intended).

Parsing JSON?! I thought it was some network game logic session-finding magic. If this is true, it's the biggest WTF I've seen in the last few years, and we've just finished 2020.

Stunning work, given they only had the binary at hand. But how could R* not do this? GTAV is so full of great engineering. If it was a CPU bottleneck, wouldn't someone there be irked enough to nail it down? It seems a natural thing to try to understand what's going on inside when time is much higher than expected, even when performance isn't crucial. It was crucial here. It almost directly translates to profits. Unbelievable.

replies(5): >>26297228 #>>26297263 #>>26297997 #>>26298680 #>>26299917 #
2. dan-robertson ◴[] No.26297228[source]
I don’t think the lesson here is “be careful when parsing json” so much as it’s “stop writing quadratic code.” The quadratic behaviour in the JSON parsing was subtle: I think most people’s mental model of sscanf is that it's linear in the number of bytes it scans, not linear in the length of the entire input. With smaller test data this would have been harder to catch. The linear search was also an example of bad quadratic code that works fine for small inputs.

Some useful lessons might be:

- try to make tests more like prod.

- actually measure performance and try to improve it

- it’s very easy to write accidentally quadratic code, and the canonical example is this sort of triangular computation, where you do some linear amount of work over all the finished or remaining items for each item you process.

As I read the article, my guess was that it was some terrible synchronisation bug, e.g.: download a bit of data -> hand off to two subtasks in parallel -> each tries to take the same lock (on some shared data, or worse, on a hash bucket where a bad hash function makes collisions frequent) -> one task takes a while, the other doesn't, but more data can't be downloaded until both are done -> the slow task consistently wins the race on some machines -> downloads get blocked and only 1 CPU is being used.

replies(6): >>26297354 #>>26297512 #>>26297996 #>>26298417 #>>26300929 #>>26301783 #
3. mdoms ◴[] No.26297263[source]
My twice-a-week gaming group and I enjoyed GTA V but abandoned it years ago simply because of the load times. We have two short slots (90-120 minutes) each week to play and don't want to waste them on loading screens.

We would all have picked this game back up in a second if the load times were reduced. Though I must say, even with the same results as this author, two minutes is still too long. But I'll bet that, given the source code, there are other opportunities to improve.

replies(1): >>26301902 #
4. petters ◴[] No.26297354[source]
Is there some reason sscanf can not be implemented without calling strlen?
replies(1): >>26297827 #
5. Nitramp ◴[] No.26297512[source]
- do not implement your own JSON parser (I mean, really?).

- if you do write a parser, do not use scanf (which is complex and subtle) for parsing, write a plain loop that dispatches on characters in a switch. But really, don't.

replies(2): >>26298311 #>>26301875 #
6. ddulaney ◴[] No.26297827{3}[source]
It could be, and the article acknowledges that possibility. For example, a cursory check of the musl sscanf [0] suggests that it does not (though I may have missed something). However, whichever implementation Rockstar used apparently does.

[0]: https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfscanf.c

replies(3): >>26298967 #>>26301731 #>>26331946 #
7. ant6n ◴[] No.26297996[source]
I thought the lesson is "listen to your customers and fix the issues they complain about".
8. XCSme ◴[] No.26297997[source]
> But how could R* not do this? GTAV is so full of great engineering

I assume different people worked on the core game engine and mechanics vs. loading. It could well be some modular system, where someone just implemented the task "load items during the online-mode loading screen".

9. dan-robertson ◴[] No.26298311{3}[source]
I think sscanf is subtle because when you think about what it does (for a given format string), it’s reasonably straightforward. The code in question did sscanf("%d", ...), which you read as “parse the digits at the start of the string into a number,” which is obviously linear. The subtlety is that sscanf doesn’t do what you expect. I think that “don’t use library functions that don’t do what you expect” is impossible advice.

I don’t use my own json parser but I nearly do. If this were some custom format rather than json and the parser still used sscanf, the bug would still happen. So I think json is somewhat orthogonal to the matter.

replies(2): >>26300188 #>>26310769 #
10. acdha ◴[] No.26298417[source]
> actually measure performance and try to improve it

This really rings truest to me: I find it hard to believe nobody ever plays their own game but I’d easily believe that the internal culture doesn’t encourage anyone to do something about it. It’s pretty easy to imagine a hostile dev-QA relationship or management keeping everyone busy enough that it’s been in the backlog since it’s not causing crashes. After all, if you cut “overhead” enough you might turn a $1B game into a $1.5B one, right?

replies(1): >>26299894 #
11. jiggawatts ◴[] No.26298680[source]
> Parsing JSON?!

Many developers I have spoken to out there in the wild in my role as a consultant have wildly distorted mental models of performance, often many orders of magnitude incorrect.

They hear somewhere that "JSON is slow", which it is, but you and I know that it's not this slow. But "slow" can encompass something like 10 orders of magnitude, depending on context. Is it slow relative to a non-validating linear binary format? Yes. Is it minutes slow for a trivial amount of data? No. But in their mind... it is, and there's "nothing" that can be done about it.

Speaking of which: an HTTPS REST API call using JSON encoding between two PaaS web servers in Azure is about 3-10ms. A local function call is 3-10ns. In other words, a lightweight REST call is one million times slower than a local function call, yet many people assume that a distributed mesh microservices architecture has only "small overheads"! Nothing could be further from the truth.

Similarly, a disk read on a mechanical drive is hundreds of thousands of times slower than local memory, which is a thousand times slower than L1 cache.

With ratios like that being involved on a regular basis, it's no wonder that programmers make mistakes like this...

replies(1): >>26299393 #
12. JdeBP ◴[] No.26298967{4}[source]
Slightly less cursory: https://news.ycombinator.com/item?id=26298300
13. salawat ◴[] No.26299393[source]
The funny thing is, as a long-time SDET, I had to give up trying to get people to write or architect in a more "local first" manner.

Everyone thinks the network is free... until it isn't. Every bit moved in a computer has a time cost, and yes, it's small... but with processors as fast as what exist today, it seems a sin that we delegate so much functionality out to some other machine across a network boundary when the same work could be done locally. The reason why, though?

Monetizability and trust. All trivial computation must be done on my services so they can be metered and charged for.

We're hamstringing the programs we run for the sole reason that we don't want to make tools. We want to make invoices.

replies(1): >>26308164 #
14. Jach ◴[] No.26299894{3}[source]
Lots of possibilities. Another one I imagined is that "only the senior devs know how to use a profiler, and they're stuck in meetings all the time."
replies(1): >>26302629 #
15. lumberingjack ◴[] No.26299917[source]
It gets worse: their brand new game, Red Dead Online, does the same thing. As soon as it did it the first time, I logged out and charged back.
16. Nitramp ◴[] No.26300188{4}[source]
> The code in question did sscanf("%d", ...), which you read as “parse the digits at the start of the string into a number,” which is obviously linear.

I think part of the problem is that scanf has a very broad API and many features via its format string argument. I assume that's where the slowdown comes from here - scanf needs to implement a ton of features, some of which need the input length, and the implementor expected it to be run on short strings.

> The subtlety is that sscanf doesn’t do what you expect. I think that “don’t use library functions that don’t do what you expect” is impossible advice.

I don't know; at face value it seems reasonable to expect programmers to carefully check whether the library function they use does what they want it to do. How else would you ever be sure what your program does?

There might be an issue that scanf doesn't document its performance well. But using a more appropriate, tighter function (atoi?) would have avoided the issue as well.

Or, you know, don't implement your own parser. JSON is deceptively simple, but there's still enough subtlety to screw things up, qed.

replies(2): >>26300368 #>>26300865 #
17. thaumasiotes ◴[] No.26300368{5}[source]
> I assume that's where the slowdown comes from here - scanf needs to implement a ton of features, some of which need the input length, and the implementor expected it to be run on short strings.

I didn't get that impression. It sounded like the slowdown comes from the fact that someone expected sscanf to terminate when all directives were successfully matched, whereas it actually terminates when either (1) the input is exhausted; or (2) a directive fails. There is no expectation that you run sscanf on short strings; it works just as well on long ones. The expectation is that you're intentionally trying to read all of the input you have. (This expectation makes a little more sense for scanf than it does for sscanf.)

The scanf man page isn't very clear, but it looks to me like replacing `sscanf("%d", ...)` with `sscanf("%d\0", ...)` would solve the problem. "%d" will parse an integer and then dutifully read and discard the rest of the input. "%d\0" will parse an integer and immediately fail to match '\0', forcing a termination.

EDIT: on my xubuntu install, scanf("%d") does not clear STDIN when it's called, which conflicts with my interpretation here.

replies(1): >>26300451 #
18. JdeBP ◴[] No.26300451{6}[source]
No it would not. Think about what the function would see as its format string in both cases.

The root cause here isn't formatting or scanned items. It is C library implementations that implement the "s" versions of these functions by turning the input string into a nonce FILE object on every call, which requires an initial call to strlen() to set up the end of read buffer point. (C libraries do not have to work this way. Neither P.J. Plauger's Standard C library nor mine implement sscanf() this way. I haven't checked Borland's or Watcom's.)

See https://news.ycombinator.com/item?id=26298300 and indeed Roger Leigh six months ago at https://news.ycombinator.com/item?id=24460852 .

replies(1): >>26301721 #
19. dan-robertson ◴[] No.26300865{5}[source]
But sscanf does do what they want it to do by parsing numbers. The problem is that it also calls strlen. I’m still not convinced that it’s realistically possible to have people very carefully understand the performance characteristics of every function they use.

Every programmer I know thinks about performance of functions either by thinking about what the function is doing and guessing linear/constant, or by knowing what the data structure is and guessing (eg if you know you’re doing some insert operation on a binary tree, guess that it’s logarithmic), or by knowing that the performance is subtle (eg “you would guess that this is log but it needs to update some data on every node so it’s linear”). When you write your own library you can hopefully avoid having functions with subtle performance and make sure things are documented well (but then you also don’t think they should be writing their own library). When you use the C stdlib you’re a bit stuck. Maybe most of the functions there should just be banned from the codebase, but I would guess that would be hard.

20. woko ◴[] No.26300929[source]
> actually measure performance and try to improve it

This reminds me that I used to do that all the time when programming with Matlab. I have stopped investigating performance bottlenecks after switching to Python. It is as if I traded performance profiling with unit testing in my switch from Matlab to Python.

I wonder if there are performance profilers which I could easily plug into PyCharm to do what I used to do with Matlab's default IDE (with a built-in profiler) and catch up with good programming practices. Or maybe PyCharm does that already and I was not curious enough to investigate.

21. pja ◴[] No.26301721{7}[source]
Yes, it looks that way. On the unix/linux side of things, glibc also implements sscanf() by converting to a FILE* object, as does the OpenBSD implementation.

It looks like this approach is taken by the majority of sscanf() implementations!

I honestly would not personally have expected sscanf() to implicitly call strlen() on every call.

22. pja ◴[] No.26301731{4}[source]
A lot of libc implementations seem to implement sscanf() this way: as well as the Windows ones mentioned above, I checked the OpenBSD and glibc implementations and they work the same way.
23. simias ◴[] No.26301783[source]
The JSON parsing is forgivable (I actually didn't know that scanf computed the length of the string for every call) but the deduplication code is a lot less so, especially in C++ where maps are available in the STL.

It also comforts me in my decision to never use scanf, instead preferring manual parsing with strtok_r, strtol and friends. scanf is just not robust and flexible enough.

24. thw0rted ◴[] No.26301875{3}[source]
This is probably good advice but not even relevant. It's down one level from the real problem: when your game spends 6 minutes on a loading screen, *profile* the process first. You can't optimize what you haven't measured. Now, if you've identified that JSON parsing is slow, you can start worrying about how to fix that (which, obviously, should be "find and use a performant and well-tested library".)
25. thw0rted ◴[] No.26301902[source]
I wonder if a paid subscription would have fixed this? If you left a paid MMO, they'd probably ask you to fill out an exit survey, and you could say "I'm canceling because load times are terrible", which would (hopefully) raise the priority of reducing load times. But since GTA online is "free", there's not a single exit point where they can ask "why did you stop playing".
replies(1): >>26317613 #
26. acdha ◴[] No.26302629{4}[source]
I could easily imagine variations of that where this is in maintenance mode with a couple of junior programmers because the senior ones either burnt out or moved on to another project. I’ve definitely gotten the impression that most games studios have roughly the same attitude towards their employees as a strip-miner has towards an Appalachian hilltop.
replies(1): >>26303239 #
27. disgruntledphd2 ◴[] No.26303239{5}[source]
If this were anyone else but Rockstar, I'd agree with you.

But Rockstar essentially only have GTA and Red Dead to take care of, it's not like they're making an annual title or something :)

replies(1): >>26303402 #
28. acdha ◴[] No.26303402{6}[source]
True, but they could still be understaffing and have their senior people working on the next big version rather than maintenance. It’s definitely penny wise, pound foolish no matter the exact details.
29. gridspy ◴[] No.26308164{3}[source]
And like so many things, we're blind to how our economic systems are throwing sand in the gears of our technical ones.

I love your point that shipping a library (of code to locally execute) with a good API would outperform an online HTTPS API for almost all tasks.

30. azernik ◴[] No.26310769{4}[source]
> If this were some custom format rather than json and the parser still used sscanf, the bug would still happen. So I think json is somewhat orthogonal to the matter.

What's the point of using standard formats if you're not taking advantage of off-the-shelf software for handling it?

31. thatguy0900 ◴[] No.26317613{3}[source]
GTA has made billions off of its shark card microtransaction system, so the incentives are probably pretty similar for player retention. Granted, the players leaving over load times are probably not the players who are invested enough to spend thousands on microtransactions.
replies(1): >>26326402 #
32. thw0rted ◴[] No.26326402{4}[source]
That's my point though, you don't get to survey people not buying the microtransaction because they quit due to terrible load times, whereas you could survey people who cancel a subscription. I guess they could still gather data by reading reviews, looking through forums / Reddit / whatever, and tallying up complaints, though.
33. SolarNet ◴[] No.26331946{4}[source]
Part of this is that game companies are notorious for re-implementing standard libraries for "performance". I suspect both shitty implementations of sscanf and the not-a-hashmap stem from this.