Using gRPC for (local) inter-process communication (2021)

(www.mpi-hd.mpg.de)

Show context

pengaru ◴[20 Nov 24 16:34 UTC] No.42195597[source]▶

>>42193859 (OP) #

There's a mountain of grpc-centric python code at $dayjob and it's been miserable to live with. Maybe it's less awful in c/c++, or at least confers some decent performance there. In python it's hot garbage.

replies(5): >>42195747 #>>42196231 #>>42196568 #>>42196845 #>>42197041 #

andy_ppp ◴[20 Nov 24 16:47 UTC] No.42195747[source]▶

>>42195597 #

Strongly agree, it’s has loads of problems, my least favourite being the schema is not checked in the way you might think, there’s not even a checksum to say this message and this version of the schema match. So when there’s old services/clients around and people haven’t versioned their schema’s safely (there was no mechanism for this apart from manually checking in PRs) you can get gibberish back for fields that should contain data. It’s basically just a binary blob with whatever schema the client has overlaid so debugging is an absolute pain. Unless you are Google scale use a text based format like JSON and save yourself a lot of hassle.

replies(3): >>42195995 #>>42196041 #>>42196375 #

jayd16 ◴[20 Nov 24 17:11 UTC] No.42196041[source]▶

>>42195747 #

You can trivially make breaking changes in a JSON blob too. GRPC has well documented ways to make non-breaking changes. If you're working somewhere where breaking schema changes go in with little fanfare and much debugging then I'm not sure JSON will save you.

The only way to know is to dig through CLs? Write a test.

There's also automated tooling to compare protobuff schemas for breaking changes.

replies(1): >>42196439 #

andy_ppp ◴[20 Nov 24 17:49 UTC] No.42196439[source]▶

>>42196041 #

JSON contains a description of the structure of the data that is readable by both machines and humans. JSON can certainly go wrong but it’s much simpler to see when it has because of this. GRPC is usually a binary black box that adds loads of developer time to upskill, debug, figure out error cases and introduces whole new classes of potential bugs.

If you are building something that needs binary performance that GRPC provides, go for it, but pretending there is no extra cost over doing the obvious thing is not true.

replies(1): >>42197049 #

1. aseipp ◴[20 Nov 24 18:59 UTC] No.42197049[source]▶

>>42196439 #

> JSON contains a description of the structure of the data that is readable by both machines and humans.

No, it by definition does not, because JSON has no schema. Only your application contains and knows the (expected) structure of the data, but you literally cannot know what structure any random blob of JSON objects will have without a separate schema. When you read a random /docs page telling you "the structure of the resulting JSON object from this request is ...", that's just a schema but written in English instead of code. This has big downstream ramifications.

For example, many APIs make the mistake of parsing JSON and only returning some opaque "Object" type, which you then have to map onto your own domain objects, meaning you actually parse every JSON object twice: once into the opaque structure, and once into your actual application type. This has major efficiency ramifications when you are actually dealing with a lot of JSON. The only way to do better than this is to have a schema in some form -- any form at all, even English prose -- so you can go from the JSON text representation directly into your domain type at parse-time. This is part of the reason why so many JSON libraries in every language tend to have some high level way of declaring a JSON object in the host language, typically as some kind of 'struct' or enum, so that they can automatically derive an actually efficient parsing step and skip intermediate objects. There's just no way around it. JSON doesn't have any schema, and that's part of its appeal, but in practice one always exists somewhere.

You can use protobuf in text-based form too, but from what you said, you're probably screwed anyway if your coworkers are just churning stuff and changing the values of fields and stuff randomly. They're going to change the meaning of JSON fields willy nilly too and there will be nothing to stop you from landing back in step 1.

I will say that the quality of gRPC integrations tends to vary wildly based on language though, which adds debt, you're definitely right about that.

replies(2): >>42197677 #>>42202070 #

2. andy_ppp ◴[20 Nov 24 20:15 UTC] No.42197677[source]▶

>>42197049 (TP) #

If I gave you a JSON object with name, age, position, gender etc. etc. would you not say it has structure? If I give you a GRPC binary you need the separate schema and tools to be able to comprehend it. That’s all I’m saying is the separation of the schema from some minimal structure makes the debugging of services more difficult. I would also add the GRPC implementation I used in Javascript (long ago) was not actually checking the types of the field in a lot of cases so rather than being a schema that rejects if some field is not a text field it would just return binary junk. JSON Schema or almost anything else will give you a parsing error instead.

Maybe the tools are fantastic not but I still think being able to debug messages without them is an advantage in almost all systems, you probably don’t need the level of performance GRPC provides.

If you’re using JSON Protobufs why would you add this extra complexity - it will mean messaging is just as slow as using JSON. What are the core advantages of GRPC under these conditions?

replies(1): >>42198535 #

3. aseipp ◴[20 Nov 24 21:56 UTC] No.42198535[source]▶

>>42197677 #

> If I gave you a JSON object with name, age, position, gender etc. etc. would you not say it has structure?

That's too easy. What if I give you a 200KiB JSON object with 40+ nested fields that's whitespace stripped and has base64 encoded values? Its "structure" is a red herring. It is not a matter of text or binary. The net result is I still have to use a tool to inspect it, even if that's only something like gron/jq in order to make it actually human readable. But at the end of the day the structure is a concern of the application, I have to evaluate its structure in the context of that application. I don't just look at JSON objects for fun. I do it mostly to debug stuff. I still need the schematic structure of the object to even know what I need to write.

FWIW, I normally use something like grpcurl in order to do curl-like requests/responses to a gRPC endpoint and you can even have it give you the schema for a given service. This has worked quite well IME for almost all my needs, but I accept with this stuff you often have lots of "one-off" cases that you have to cobble stuff together or just get dirty with printf'ing somewhere inside your middleware, etc.

> I would also add the GRPC implementation I used in Javascript (long ago) was not actually checking the types of the field in a lot of cases so rather than being a schema that rejects if some field is not a text field it would just return binary junk. JSON Schema or almost anything else will give you a parsing error instead.

Yes, I totally am with you on this. Many of the implementations just totally suck and JSON is common enough nowadays that you kind of have to at least have something that doesn't completely fall over, if you want to be taken remotely seriously. It's hard to write a good JSON library, but it's definitely harder to write a good full gRPC stack. I 100% have your back on this. I would probably dislike gRPC even more but I'm lucky enough to use it with a "good" toolkit (Rust/Prost.)

> If you’re using JSON Protobufs why would you add this extra complexity - it will mean messaging is just as slow as using JSON. What are the core advantages of GRPC under these conditions?

I mean, if your entire complaint is about text vs binary, not efficiency or correctness, JSON Protobuf seems like it fits your needs. You still get the other benefits of gRPC you'd have anywhere (an honest-to-god schema, better transport efficiency over mandated HTTP/2, some amount of schema-generic middleware, first-class streaming, etc etc.)

FWIW, I don't particularly love gRPC. And while I admit I loathe JSON, I'm mainly pushing back on the notion that JSON has some "schema" or structure. No, it doesn't! Your application has and knows structure. A JSON object is just a big bag of stuff. For all its failings, gRPC having a schema is a matter of it actually putting the correct foot first and admitting that your schema is real, it exists, and most importantly can be written down precisely and checked by tools!

4. imtringued ◴[21 Nov 24 07:52 UTC] No.42202070[source]▶

>>42197049 (TP) #

Here are some sad news for you: The flexibility of JSON and CBOR cannot be matched by any schema based system, because it is equivalent to giving up that advantage.

Sure, the removal of a field can cause an application level error, but that is probably the most benign form of failure there is. What's worse is when no error occurs and the data is simply reinterpreted to fit the schema. Then your database will slowly fill up with corrupted garbage data and you'll have to restore from a backup.

What you have essentially accomplished in your response is to miss the entire point.

There are also other problems with protobuf in the sense that the savings aren't actually as big as you'd expect. E.g. there is still costly parsing, the data transmitted over the wire isn't significantly smaller unless you have data that is a poor fit for JSON.

replies(2): >>42202160 #>>42204164 #

5. ◴[21 Nov 24 08:11 UTC] No.42202160[source]▶

>>42202070 #

6. jonathanberi ◴[21 Nov 24 13:34 UTC] No.42204164[source]▶

>>42202070 #

It's also worth noting CDDL [1], which adds schema-like utility to CBOR (and technically JSON.) We've started to use it in more places where we use CBOR.

[1] https://datatracker.ietf.org/doc/rfc8610/

↑