Using gRPC for (local) inter-process communication (2021)

(www.mpi-hd.mpg.de)

121 points by zardinality | 3 comments | 20 Nov 24 13:42 UTC

pengaru [20 Nov 24 16:34 UTC] No.42195597

>>42193859 (OP)

There's a mountain of gRPC-centric Python code at $dayjob and it's been miserable to live with. Maybe it's less awful in C/C++, or at least confers some decent performance there. In Python it's hot garbage.

andy_ppp [20 Nov 24 16:47 UTC] No.42195747

>>42195597

Strongly agree, it has loads of problems. My least favourite is that the schema is not checked the way you might think: there’s not even a checksum to say that this message and this version of the schema match. So when there are old services/clients around and people haven’t versioned their schemas safely (there was no mechanism for this apart from manual checks in PRs), you can get gibberish back for fields that should contain data. It’s basically just a binary blob with whatever schema the client has overlaid, so debugging is an absolute pain. Unless you are at Google scale, use a text-based format like JSON and save yourself a lot of hassle.
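
To make the "binary blob with a schema overlaid" point concrete, here is a minimal, standard-library-only sketch of the protobuf wire format, assuming only varint and length-delimited fields. On the wire each field is just a field number, a wire type and a payload: no field names, no schema version. The helper functions, both schemas and all field names are hypothetical, made up for illustration. Decoding the same bytes under a schema where field 2 was reused shows that nothing in the message flags the mismatch.

```python
def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint, low 7-bit group first (non-negative ints only)."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read one varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode_field(number: int, value) -> bytes:
    """Tag = (field number << 3) | wire type; 0 = varint, 2 = length-delimited."""
    if isinstance(value, int):
        return encode_varint(number << 3 | 0) + encode_varint(value)
    return encode_varint(number << 3 | 2) + encode_varint(len(value)) + value


def decode(buf: bytes, schema: dict) -> dict:
    """Name fields using the reader's own schema (field number -> name)."""
    out, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        out[schema.get(number, f"unknown_{number}")] = value
    return out


# Hypothetical schemas: the writer uses v1; a reader was built against v2,
# in which field 2 was reused with a different meaning.
SCHEMA_V1 = {1: "order_id", 2: "quantity"}
SCHEMA_V2 = {1: "order_id", 2: "price_cents"}

wire = encode_field(1, 150) + encode_field(2, 3)
print(wire.hex())               # 0896011003
print(decode(wire, SCHEMA_V1))  # {'order_id': 150, 'quantity': 3}
print(decode(wire, SCHEMA_V2))  # {'order_id': 150, 'price_cents': 3}
```

This is why the protobuf docs say never to reuse a field number and to mark removed ones as `reserved`: the bytes themselves can't catch it.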

jayd16 [20 Nov 24 17:11 UTC] No.42196041

>>42195747

You can trivially make breaking changes in a JSON blob too. gRPC has well-documented ways to make non-breaking changes. If you're working somewhere where breaking schema changes go in with little fanfare and much debugging, then I'm not sure JSON will save you.

The only way to know is to dig through CLs? Write a test.

There's also automated tooling to compare protobuf schemas for breaking changes.
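
A toy version of that kind of check (Buf's `buf breaking` is one real tool in this space). This sketch is hypothetical throughout: schemas are plain Python objects rather than parsed `.proto` files, only one flat message is compared, and it flags any type change, which is stricter than protobuf itself (e.g. `int32` and `int64` are wire-compatible).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Field:
    name: str
    proto_type: str  # simplified: just the type name, e.g. "int32" or "string"


@dataclass
class MessageSchema:
    fields: dict  # field number -> Field
    reserved: set = field(default_factory=set)  # numbers retired with `reserved`


def breaking_changes(old: MessageSchema, new: MessageSchema) -> list:
    """List changes that could make old and new peers misread each other's bytes."""
    problems = []
    for number, old_field in old.fields.items():
        new_field = new.fields.get(number)
        if new_field is None:
            if number not in new.reserved:
                problems.append(
                    f"field {number} ({old_field.name}) removed without being reserved"
                )
        elif new_field.proto_type != old_field.proto_type:
            problems.append(
                f"field {number} changed type {old_field.proto_type} -> {new_field.proto_type}"
            )
    for number in new.fields:
        if number in old.reserved:
            problems.append(f"field {number} reuses a reserved number")
    return problems


# Hypothetical versions of one message.
v1 = MessageSchema({1: Field("order_id", "int64"), 2: Field("quantity", "int32")})
v2_bad = MessageSchema({1: Field("order_id", "int64"), 2: Field("note", "string")})
v2_ok = MessageSchema(
    {1: Field("order_id", "int64"), 3: Field("note", "string")}, reserved={2}
)
print(breaking_changes(v1, v2_bad))  # ['field 2 changed type int32 -> string']
print(breaking_changes(v1, v2_ok))   # []
```

Run in CI against the last released schema, this is the "write a test" answer rather than digging through CLs.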

andy_ppp [20 Nov 24 17:49 UTC] No.42196439

>>42196041

JSON contains a description of the structure of the data that is readable by both machines and humans. JSON can certainly go wrong, but it’s much simpler to see when it has because of this. gRPC is usually a binary black box that adds loads of developer time to upskill, debug and figure out error cases, and it introduces whole new classes of potential bugs.

If you are building something that needs the binary performance gRPC provides, go for it, but it isn't true that there's no extra cost over doing the obvious thing.

aseipp [20 Nov 24 18:59 UTC] No.42197049

>>42196439

> JSON contains a description of the structure of the data that is readable by both machines and humans.

No, it by definition does not, because JSON has no schema. Only your application contains and knows the (expected) structure of the data, but you literally cannot know what structure any random blob of JSON objects will have without a separate schema. When you read a random /docs page telling you "the structure of the resulting JSON object from this request is ...", that's just a schema but written in English instead of code. This has big downstream ramifications.

For example, many APIs make the mistake of parsing JSON and only returning some opaque "Object" type, which you then have to map onto your own domain objects, meaning you actually parse every JSON object twice: once into the opaque structure, and once into your actual application type. This has major efficiency ramifications when you are actually dealing with a lot of JSON. The only way to do better than this is to have a schema in some form -- any form at all, even English prose -- so you can go from the JSON text representation directly into your domain type at parse-time. This is part of the reason why so many JSON libraries in every language tend to have some high level way of declaring a JSON object in the host language, typically as some kind of 'struct' or enum, so that they can automatically derive an actually efficient parsing step and skip intermediate objects. There's just no way around it. JSON doesn't have any schema, and that's part of its appeal, but in practice one always exists somewhere.
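
A small Python sketch of "declaring a JSON object in the host language": the dataclass is the schema, and a hypothetical `decode_as` helper checks parsed JSON against it. One caveat: CPython's `json` module can't parse straight into a class, so this still builds the intermediate dict and only shows where the schema lives, not the single-pass speedup. Derive-based decoders such as Rust's serde, or Go's `encoding/json` decoding into a struct, get that as well. Only flat objects with primitive fields are handled.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class User:  # hypothetical domain type; its annotations act as the schema
    id: int
    name: str
    active: bool


def decode_as(cls, text: str):
    """Parse JSON text, then check each declared field's presence and exact type."""
    raw = json.loads(text)  # CPython still builds a plain dict first
    if not isinstance(raw, dict):
        raise TypeError(f"expected a JSON object, got {type(raw).__name__}")
    values = {}
    for f in fields(cls):
        if f.name not in raw:
            raise KeyError(f"missing field {f.name!r}")
        value = raw[f.name]
        # Exact type match, so JSON true/false is not accepted as an int.
        if type(value) is not f.type:
            raise TypeError(
                f"field {f.name!r}: expected {f.type.__name__}, got {type(value).__name__}"
            )
        values[f.name] = value
    return cls(**values)


print(decode_as(User, '{"id": 7, "name": "ada", "active": true}'))
# User(id=7, name='ada', active=True)
try:
    decode_as(User, '{"id": "7", "name": "ada", "active": true}')
except TypeError as err:
    print(err)  # field 'id': expected int, got str
```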

You can use protobuf in text-based form too, but from what you said, you're probably screwed anyway if your coworkers are just churning stuff and changing the values of fields and stuff randomly. They're going to change the meaning of JSON fields willy-nilly too, and there will be nothing to stop you from landing back at step 1.

I will say that the quality of gRPC integrations tends to vary wildly based on language though, which adds debt, you're definitely right about that.

1. imtringued [21 Nov 24 07:52 UTC] No.42202070

>>42197049

Here's some sad news for you: the flexibility of JSON and CBOR cannot be matched by any schema-based system, because adopting a schema is equivalent to giving up that advantage.

Sure, the removal of a field can cause an application level error, but that is probably the most benign form of failure there is. What's worse is when no error occurs and the data is simply reinterpreted to fit the schema. Then your database will slowly fill up with corrupted garbage data and you'll have to restore from a backup.
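
A minimal sketch of the two failure modes, using JSON for both so it runs with the standard library alone. The first reader treats a missing field as an error; the second imitates proto3 scalar fields without explicit presence, which decode to their zero value when absent. The payload, field names and reader functions are hypothetical.

```python
import json

# Hypothetical payload: the writer has dropped "balance_cents".
payload = '{"account": "acme"}'


def strict_read(text: str) -> int:
    """Treat a missing field as an application-level error."""
    return json.loads(text)["balance_cents"]


def defaulting_read(text: str) -> int:
    """Imitate a proto3 scalar without explicit presence: absent reads as 0."""
    return json.loads(text).get("balance_cents", 0)


try:
    strict_read(payload)
except KeyError as err:
    print("rejected, missing", err)  # rejected, missing 'balance_cents'

balance = defaulting_read(payload)
print(balance)  # 0, indistinguishable from a real zero balance
```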

What you have essentially accomplished in your response is to miss the entire point.

There are also other problems with protobuf, in the sense that the savings aren't actually as big as you'd expect. E.g. there is still costly parsing, and the data transmitted over the wire isn't significantly smaller unless you have data that is a poor fit for JSON.
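
A back-of-the-envelope sketch of the size argument: it counts bytes for compact JSON versus the protobuf wire encoding of the same flat record, assuming only non-negative varint integers and UTF-8 strings (no nested messages, floats or compression). The records and helper names are made up. Protobuf mostly wins by dropping key names, so the gap is large for a tiny record and shrinks to the per-key overhead once string content dominates.

```python
import json


def varint_len(n: int) -> int:
    """Bytes a non-negative int takes as a protobuf varint (7 payload bits per byte)."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def proto_size(message: dict) -> int:
    """Wire size of a flat message given as {field number: int or str}."""
    total = 0
    for number, value in message.items():
        total += varint_len(number << 3)  # tag; the 3 wire-type bits never add a byte
        if isinstance(value, int):
            total += varint_len(value)
        else:
            data = value.encode("utf-8")
            total += varint_len(len(data)) + len(data)  # length prefix + bytes
    return total


def json_size(obj: dict) -> int:
    """Size of compact JSON (no whitespace), UTF-8 encoded."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


# Hypothetical records: field 1 = id, field 2 = a string.
print(json_size({"id": 150, "name": "abc"}), proto_size({1: 150, 2: "abc"}))
# 23 8
print(json_size({"id": 150, "body": "x" * 1000}), proto_size({1: 150, 2: "x" * 1000}))
# 1020 1006
```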

2. [21 Nov 24 08:11 UTC] No.42202160

>>42202070 (TP)

3. jonathanberi [21 Nov 24 13:34 UTC] No.42204164

>>42202070 (TP)

It's also worth noting CDDL [1], which adds schema-like utility to CBOR (and technically JSON). We've started to use it in more places where we use CBOR.

[1] https://datatracker.ietf.org/doc/rfc8610/
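
CDDL has its own notation, so the sketch below shows a rule as a comment alongside a minimal hand-rolled CBOR encoder (unsigned ints, text strings and maps only, following RFC 8949's major types) and a hand-written Python check equivalent to that rule. The `record` rule, its fields and all helper functions are hypothetical; in practice you'd use a CDDL tool instead of translating rules by hand.

```python
# Hypothetical CDDL rule this sketch hand-translates (a closed map):
#   record = { id: uint, name: tstr }


def _head(major: int, n: int) -> bytes:
    """CBOR initial byte: 3-bit major type plus argument, with 1/2/4/8-byte extensions."""
    if n < 24:
        return bytes([major << 5 | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([major << 5 | info]) + n.to_bytes(size, "big")
    raise ValueError("argument too large")


def cbor_encode(value) -> bytes:
    """Encode unsigned ints (major 0), text strings (major 3) and maps (major 5)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int) and value >= 0:
        return _head(0, value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, dict):
        out = _head(5, len(value))
        for key, item in value.items():
            out += cbor_encode(key) + cbor_encode(item)
        return out
    raise TypeError(f"unsupported type {type(value).__name__}")


def matches_record(value) -> bool:
    """Hand-written check equivalent to the CDDL rule above."""
    return (
        isinstance(value, dict)
        and set(value) == {"id", "name"}  # CDDL maps are closed by default
        and type(value["id"]) is int and value["id"] >= 0  # uint
        and isinstance(value["name"], str)  # tstr
    )


rec = {"id": 150, "name": "abc"}
print(matches_record(rec))              # True
print(cbor_encode(rec).hex())           # a26269641896646e616d6563616263
print(matches_record({"id": -1, "name": "abc"}))  # False: uint excludes negatives
```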
