
3883 points kuroguro
breakingcups No.26296724
It is absolutely unbelievable (and unforgivable) that a cash cow such as GTA V has had a problem like this for over 6 years, and that it turns out to be something so absolutely simple.

I do not agree with the sibling comment saying that this problem only looks simple and that we are missing context.

This online game mode made $1 billion in 2017 alone.

Tweaking two functions to go from a load time of 6 minutes to less than two minutes is something any developer worth their salt should be able to do in a codebase like this, equipped with a good profiler.

Instead, someone without access to the source code managed to do this on an obfuscated executable loaded with anti-cheat measures.

The fact that this problem is caused by Rockstar's excessive microtransaction policy (the 10MB of JSON causing this bottleneck is a list of all available microtransaction items) is the cherry on top.

(And yes, I might also still be salty because their parent company unjustly DMCA'd re3 (https://github.com/GTAmodding/re3), the reverse-engineered version of GTA III and Vice City. A twenty-year-old game. Which wasn't even playable without purchasing the original game.)

masklinn No.26296886
> The fact that this problem is caused by Rockstar's excessive microtransaction policy (the 10MB of JSON causing this bottleneck are all available microtransaction items) is the cherry on top.

For what it's worth, 10MB of JSON is not much. Duplicating the example entry from the article 63,000 times (replacing `key` with a uuid4 for uniqueness) yields 11.5MB of JSON.

Deserialising that JSON and then inserting each entry into a dict (indexed by key) takes 450ms in Python.
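Roughly like this (the entry fields below are invented placeholders, not the article's actual example, so exact sizes and timings will differ a bit):

    import json
    import time
    import uuid

    # Build ~63,000 entries, each keyed by a uuid4 for uniqueness; the
    # other fields are made-up placeholders, not the article's entry.
    entries = [
        {"key": str(uuid.uuid4()), "price": 45000, "statName": "", "storageType": "INT"}
        for _ in range(63_000)
    ]
    blob = json.dumps(entries)
    print(f"{len(blob) / 1e6:.1f} MB of JSON")

    start = time.perf_counter()
    items = {e["key"]: e for e in json.loads(blob)}  # deserialise, then index by key
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"parsed and indexed {len(items)} entries in {elapsed_ms:.0f} ms")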

But as Bruce Dawson oft notes, quadratic behaviour is the sweet spot of badly scaling algorithms because it's "fast enough to go into production, and slow enough to fall over once it gets there". Here, odds are there were only dozens or hundreds of items during dev, so nobody noticed it would become slow as balls beyond a few thousand items.

Plus, load times are usually the one thing you start ignoring early on: just start the session, go take a coffee or a piss, and by the time you're back it's loaded. Especially after QA has reported the slow load times half a dozen times, the devs (with fast machines and possibly a smaller development dataset) go "works fine", and QA just gives up.

ldng No.26297314
But is quadratic the real issue? Isn't that a developer's answer?

The best algorithms for small, medium, and large sizes are not the same, and each generally behaves poorly in the other cases. And what is small? Medium? Large?

The truth is that there is no one-size-fits-all; assumptions need to be reviewed periodically and adapted accordingly. And they never are... Ask a DBA.

gridspy No.26297536
Quadratic is a fancy way of saying "this code is super fast with no data, super slow once you have a decent amount".

The problem is that when you double the amount of stuff in the JSON document, you quadruple (or more) the scanning penalty in both the string and the list.

Why quadruple? Because you end up scanning a list that is twice as long, and you have to scan that list twice as many times. 2x2 = 4. On top of that, the larger list no longer fits in fast (cache) memory, among other issues; the cache effect alone can add another 10x (or more!) penalty.
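A quick toy sketch in Python (not the game's actual code, just the two insertion strategies side by side) shows the effect:

    import time

    def insert_with_list_scan(keys):
        seen = []
        for k in keys:
            if k not in seen:  # linear scan of the list on every insert -> O(n^2) total
                seen.append(k)
        return seen

    def insert_with_set(keys):
        seen = set()
        for k in keys:
            if k not in seen:  # constant-time hash lookup -> O(n) total
                seen.add(k)
        return seen

    for n in (1_000, 2_000, 4_000):
        keys = [f"item-{i}" for i in range(n)]
        t0 = time.perf_counter()
        insert_with_list_scan(keys)
        t1 = time.perf_counter()
        insert_with_set(keys)
        t2 = time.perf_counter()
        print(f"n={n}: list scan {t1 - t0:.3f}s, set {t2 - t1:.4f}s")

Each doubling of n roughly quadruples the list-scan time (the 2x2 = 4 effect above), while the set version grows roughly linearly.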

ldng No.26301492
> Quadratic is a fancy way of saying "this code is super fast with no data, super slow once you have a decent amount".

Well, that is an abuse of the term by people who sometimes don't actually know what it really means. Up to a point, a quadratic algorithm IS faster than a linear one, after all (when its constant factors are smaller). Too many developers love to abuse the word blindly.

If it is badly tested with no data, it is badly tested with no data. Period. Not "quadratic".
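To illustrate the crossover (the per-step costs below are invented purely for the sake of the example):

    # Invented per-step costs: a quadratic algorithm with a tiny constant
    # (say, scanning a small in-cache array) vs a linear one with a big
    # constant (say, hashing and chasing pointers in a fancier structure).
    QUAD_C, LIN_C = 0.001, 1.0

    for n in (10, 100, 1_000, 10_000):
        quad, lin = QUAD_C * n * n, LIN_C * n
        winner = "quadratic" if quad < lin else "linear"
        print(f"n={n:>6}: quadratic={quad:>10.1f}  linear={lin:>10.1f}  -> {winner}")
    # Crossover at n = LIN_C / QUAD_C = 1000; below that, quadratic wins.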

> The problem is that when you double the amount of stuff in the JSON document, you quadruple (or more) the scanning penalty in both the string and the list.

My point was precisely that it depends on the data, and that initial assumptions have to be routinely revised. I was making a general point.

Maybe the guy was pinky-sworn that the JSON would hardly change and that the items were supposed to be ordered, sequential, and never more than 101. For all you know it is even documented, and nobody cared/remembered/checked when updating the JSON. But we don't know; obfuscated code doesn't come with comments and context...

Or, it is actually a real rookie mistake. It probably was, but we don't have all the facts.

imtringued No.26313492
> Well, that is an abuse of the term by people who sometimes don't actually know what it really means. Up to a point, a quadratic algorithm IS faster than a linear one, after all (when its constant factors are smaller). Too many developers love to abuse the word blindly.

The problem with this argument is that if the data size and the constants are sufficiently small, nobody cares whether the linear algorithm is slower. And in the case of this JSON parsing, the constants are exactly the same no matter which string-length algorithm you use, so the quadratic version is never the faster one. Thus when n is small you don't care either way, since the overall loading time is short anyway, and when n is big you benefit from the faster algorithm.
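For reference, a toy model of the string half of the bug (this is not the game's code; it just mimics a parser that, like sscanf via strlen, re-scans the rest of the buffer on every token):

    import time

    def parse_rescanning(buf):
        """Re-scan the remaining buffer on every token -> quadratic."""
        pos, tokens = 0, 0
        while pos < len(buf):
            buf.find("\x00", pos)  # models strlen scanning to the end of the buffer
            end = buf.find(",", pos)
            pos = len(buf) if end == -1 else end + 1
            tokens += 1
        return tokens

    def parse_single_pass(buf):
        """One pass over the input -> linear."""
        return buf.count(",") + 1

    for n in (5_000, 10_000, 20_000):
        buf = ",".join("x" * 8 for _ in range(n))
        t0 = time.perf_counter()
        parse_rescanning(buf)
        t1 = time.perf_counter()
        parse_single_pass(buf)
        t2 = time.perf_counter()
        print(f"{n} tokens: rescanning {t1 - t0:.3f}s, single pass {t2 - t1:.5f}s")

Both versions do the same cheap work per character; only the number of times each character is visited differs, which is exactly why the constants don't save the quadratic one.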

I honestly don't understand what goal you are trying to accomplish. By your logic it is more important to keep short loading times short and long loading times long than to follow the conventional engineering wisdom of lowering the average or median loading time, which will sometimes decrease the duration of long loading screens at the expense of increasing the duration of short ones.