3883 points kuroguro | 12 comments
1. tyingq ◴[] No.26296701[source]
"They’re parsing JSON. A whopping 10 megabytes worth of JSON with some 63k item entries."

Ahh. Modern software rocks.

replies(3): >>26296764 #>>26297102 #>>26297434 #
2. bombcar ◴[] No.26296764[source]
At least parse it into SQLite. Once.
replies(2): >>26297066 #>>26297149 #
3. brianberns ◴[] No.26297066[source]
They probably add more entries over time (and maybe update/delete old ones), so you’d have to be careful about keeping the local DB in sync.
replies(1): >>26297604 #
4. LukvonStrom ◴[] No.26297102[source]
why not embed node.js to do this efficiently :D
5. tyingq ◴[] No.26297149[source]
I think just using a length-encoded serialization format would have made this work reasonably fast.
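
A minimal sketch of what a length-prefixed format could look like, in Python (the field layout is invented for illustration and has nothing to do with the game's actual data):

    import struct

    def write_records(payloads):
        # Each record is a 4-byte little-endian length followed by the payload,
        # so a reader can jump from item to item without scanning for delimiters.
        out = bytearray()
        for payload in payloads:
            out += struct.pack("<I", len(payload))
            out += payload
        return bytes(out)

    def read_records(buf):
        # Walk the buffer by reading each length header and skipping ahead.
        items, offset = [], 0
        while offset < len(buf):
            (length,) = struct.unpack_from("<I", buf, offset)
            offset += 4
            items.append(buf[offset:offset + length])
            offset += length
        return items

    blob = write_records([b'{"key": "item_1"}', b'{"key": "item_2"}'])
    assert read_records(blob) == [b'{"key": "item_1"}', b'{"key": "item_2"}']
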
replies(1): >>26297224 #
6. hobofan ◴[] No.26297224{3}[source]
Or just any properly implemented JSON parser. That's a laughably small amount of JSON, which should easily be parsed in milliseconds.
7. ed25519FUUU ◴[] No.26297434[source]
Parsing 63k items in a 10 MB JSON string is pretty much a breeze on any modern system, including a Raspberry Pi. I wouldn't even consider JSON an anti-pattern for storing that much data if it's going over the wire (compressed with gzip).

A little further down in the article you'll see one of the real issues:

> But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless.
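
A minimal sketch of that difference in Python (placeholder hashes; nothing here reflects the game's actual data structures):

    def dedup_quadratic(hashes):
        # The pattern described in the article: scan the whole list before
        # every append, ~(n^2+n)/2 comparisons for n items.
        seen = []
        for h in hashes:
            if h not in seen:  # linear scan each time
                seen.append(h)
        return seen

    def dedup_hashed(hashes):
        # Same result with a set: each membership check is O(1) on average,
        # so 63k items cost ~63k checks instead of ~2 billion comparisons.
        seen, result = set(), []
        for h in hashes:
            if h not in seen:
                seen.add(h)
                result.append(h)
        return result

    items = [f"hash_{i}" for i in range(63_000)]
    # Only compare on a small slice: the quadratic version is painfully slow at full size.
    assert dedup_quadratic(items[:1_000]) == dedup_hashed(items[:1_000])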

replies(2): >>26297496 #>>26298129 #
8. tyingq ◴[] No.26297496[source]
The JSON patch took out more of the elapsed time. Granted, it was a terrible parser. But I still think JSON is a poor choice here. 63k x X checks for colons, balanced quotes/braces and so on just isn't needed.

  Time with only duplication check patch: 4m 30s
  Time with only JSON parser patch:       2m 50s
replies(1): >>26300402 #
9. bombcar ◴[] No.26297604{3}[source]
So just have the client download the entire DB each time. Can’t be that many megabytes.
replies(1): >>26299123 #
10. Slikey ◴[] No.26298129[source]
Check out https://github.com/simdjson/simdjson

More than 3 GB/s is possible. Like you said, 10 MB of JSON is a breeze.
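
If you want to try it from Python, there's a third-party binding, pysimdjson, that wraps the same parser (assuming `pip install pysimdjson`; this uses that binding's API, not the C++ API in the repo above):

    import simdjson  # installed as the pysimdjson package

    parser = simdjson.Parser()
    with open("catalog.json", "rb") as f:  # hypothetical 10 MB JSON file
        doc = parser.parse(f.read())
    print(len(doc))  # number of top-level entries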

11. Twirrim ◴[] No.26299123{4}[source]
I did a very, very ugly quick hack in Python. Took the example JSON, made the one list entry a string (lazy hack), and repeated it 56,000 times. That resulted in a JSON doc that weighed in at 10 MB. My initial guess of 60,000 times was a pure fluke!

Dumped it into a very simple SQLite db:

    $ du -hs gta.db
    5.2M    gta.db
Even 10 MB is peanuts for most of their target audience. Stick it in an SQLite db punted across the wire and they'd cut out all of the parsing time too.
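
Roughly what that hack looks like (a minimal sketch; the sample entry and the one-table schema are invented for illustration):

    import json, sqlite3

    # Build a ~10 MB JSON document by repeating one sample entry, as described above.
    entry = {"key": "some_catalog_item", "price": 12500, "stats": "x" * 120}
    doc = json.dumps({"items": [entry] * 56_000})

    # Dump the same entries into a single-table SQLite database.
    conn = sqlite3.connect("gta.db")
    conn.execute("CREATE TABLE IF NOT EXISTS items (key TEXT, price INTEGER, stats TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        ((e["key"], e["price"], e["stats"]) for e in json.loads(doc)["items"]),
    )
    conn.commit()
    conn.close()
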
12. masklinn ◴[] No.26300402{3}[source]
> But I still think JSON is a poor choice here.

It’s an irrelevant one. The JSON parser from the Python stdlib parses a 10 MB document patterned after the sample in a few dozen ms. And it’s hardly a fast parser.
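
Easy to check with a quick sketch (the generated document is a stand-in patterned loosely after the sample, not the real catalog):

    import json, time

    # Build a ~10 MB JSON array of ~63k small objects.
    doc = json.dumps([
        {"key": f"item_{i}", "price": i, "hash": f"{i:032x}", "stats": "x" * 80}
        for i in range(63_000)
    ])
    print(f"{len(doc) / 1e6:.1f} MB")

    start = time.perf_counter()
    items = json.loads(doc)
    print(f"parsed {len(items):,} entries in {(time.perf_counter() - start) * 1000:.0f} ms")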