Using gRPC for (local) inter-process communication (2021)

1. pengaru ◴[20 Nov 24 16:34 UTC] No.42195597[source]▶

>>42193859 (OP) #

There's a mountain of grpc-centric python code at $dayjob and it's been miserable to live with. Maybe it's less awful in c/c++, or at least confers some decent performance there. In python it's hot garbage.

replies(5): >>42195747 #>>42196231 #>>42196568 #>>42196845 #>>42197041 #

2. andy_ppp ◴[20 Nov 24 16:47 UTC] No.42195747[source]▶

>>42195597 (TP) #

Strongly agree, it’s has loads of problems, my least favourite being the schema is not checked in the way you might think, there’s not even a checksum to say this message and this version of the schema match. So when there’s old services/clients around and people haven’t versioned their schema’s safely (there was no mechanism for this apart from manually checking in PRs) you can get gibberish back for fields that should contain data. It’s basically just a binary blob with whatever schema the client has overlaid so debugging is an absolute pain. Unless you are Google scale use a text based format like JSON and save yourself a lot of hassle.

replies(3): >>42195995 #>>42196041 #>>42196375 #

3. discreteevent ◴[20 Nov 24 17:07 UTC] No.42195995[source]▶

>>42195747 #

- JSON doesn't have any schema checking either.

- You can encode the protocol buffers as JSON if you want a text based format.

4. jayd16 ◴[20 Nov 24 17:11 UTC] No.42196041[source]▶

>>42195747 #

You can trivially make breaking changes in a JSON blob too. GRPC has well documented ways to make non-breaking changes. If you're working somewhere where breaking schema changes go in with little fanfare and much debugging then I'm not sure JSON will save you.

The only way to know is to dig through CLs? Write a test.

There's also automated tooling to compare protobuff schemas for breaking changes.

replies(1): >>42196439 #

5. ryanisnan ◴[20 Nov 24 17:28 UTC] No.42196231[source]▶

>>42195597 (TP) #

Can you say more about what the pain points are?

6. jeffbee ◴[20 Nov 24 17:42 UTC] No.42196375[source]▶

>>42195747 #

There is an art to having forwards and backwards compatible RPC schemas. It is easy, but it is surprisingly difficult to get people to follow easy rules. The rules are as follows:

  1) Never change the type of a field
  2) Never change the semantic meaning of a field
  3) If you need a different type or semantics, add a new field

Pretty simple if you ask me.

replies(1): >>42196428 #

7. andy_ppp ◴[20 Nov 24 17:47 UTC] No.42196428{3}[source]▶

>>42196375 #

If I got to choose my colleagues this would be fine, unfortunately I had people who couldn’t understand eventual consistency. One of the guys writing Go admitted he didn’t understand what a pointer was etc. etc.

replies(1): >>42197914 #

8. andy_ppp ◴[20 Nov 24 17:49 UTC] No.42196439{3}[source]▶

>>42196041 #

JSON contains a description of the structure of the data that is readable by both machines and humans. JSON can certainly go wrong but it’s much simpler to see when it has because of this. GRPC is usually a binary black box that adds loads of developer time to upskill, debug, figure out error cases and introduces whole new classes of potential bugs.

If you are building something that needs binary performance that GRPC provides, go for it, but pretending there is no extra cost over doing the obvious thing is not true.

replies(1): >>42197049 #

9. cherryteastain ◴[20 Nov 24 18:04 UTC] No.42196568[source]▶

>>42195597 (TP) #

C++ generated code from protobuf/grpc is pretty awful in my experience.

replies(1): >>42196915 #

10. seanw444 ◴[20 Nov 24 18:37 UTC] No.42196845[source]▶

>>42195597 (TP) #

I'm using it for a small-to-medium sized project, and the generated files aren't too bad to work with at that scale. The actual generation of the files is very awful for Python specifically, though, and I've had to write a script to bandaid fix them after they're generated. An issue has been open for this for years on the protobuf compiler repo, and it's basically a "wontfix" as Google doesn't need it fixed for their internal use. Which is... fine I guess.

The Go part I'm building has been much more solid in contrast.

replies(1): >>42197996 #

11. bluGill ◴[20 Nov 24 18:45 UTC] No.42196915[source]▶

>>42196568 #

Do you need to look at that generated code though? I haven't used gRPC yet (some poor historical decisions mean I can't use it in my production code so I'm not in a hurry - architecture is rethinking those decisions in hopes that we can start using it so ask me in 5 years what I think). My experience with other generated code is that it is not readable but you never read it so who cares - instead you just trust the interface which is easy enough (or is terrible and not fixable)

replies(1): >>42197608 #

12. eqvinox ◴[20 Nov 24 18:58 UTC] No.42197041[source]▶

>>42195597 (TP) #

It's equally painful in C, you have to wrap the C++ library :(

13. aseipp ◴[20 Nov 24 18:59 UTC] No.42197049{4}[source]▶

>>42196439 #

> JSON contains a description of the structure of the data that is readable by both machines and humans.

No, it by definition does not, because JSON has no schema. Only your application contains and knows the (expected) structure of the data, but you literally cannot know what structure any random blob of JSON objects will have without a separate schema. When you read a random /docs page telling you "the structure of the resulting JSON object from this request is ...", that's just a schema but written in English instead of code. This has big downstream ramifications.

For example, many APIs make the mistake of parsing JSON and only returning some opaque "Object" type, which you then have to map onto your own domain objects, meaning you actually parse every JSON object twice: once into the opaque structure, and once into your actual application type. This has major efficiency ramifications when you are actually dealing with a lot of JSON. The only way to do better than this is to have a schema in some form -- any form at all, even English prose -- so you can go from the JSON text representation directly into your domain type at parse-time. This is part of the reason why so many JSON libraries in every language tend to have some high level way of declaring a JSON object in the host language, typically as some kind of 'struct' or enum, so that they can automatically derive an actually efficient parsing step and skip intermediate objects. There's just no way around it. JSON doesn't have any schema, and that's part of its appeal, but in practice one always exists somewhere.

You can use protobuf in text-based form too, but from what you said, you're probably screwed anyway if your coworkers are just churning stuff and changing the values of fields and stuff randomly. They're going to change the meaning of JSON fields willy nilly too and there will be nothing to stop you from landing back in step 1.

I will say that the quality of gRPC integrations tends to vary wildly based on language though, which adds debt, you're definitely right about that.

replies(2): >>42197677 #>>42202070 #

14. cherryteastain ◴[20 Nov 24 20:06 UTC] No.42197608{3}[source]▶

>>42196915 #

I meant the interfaces are horrible. As you said, as long as it has a good interface and good performance, I wouldn't mind.

For example, here's the official tutorial for using the async callback interfaces in gRPC: https://grpc.io/docs/languages/cpp/callback/

It encourages you to write code with practices that are quite universally considered bad in modern C++ due to a very high chance of introducing memory bugs, such as allocating objects with new and expecting them to clean themselves up via delete this;. Idiomatic modern C++ would be using smart pointers, or go a completely different route with co-routines and no heap-allocated objects.

replies(1): >>42197686 #

15. andy_ppp ◴[20 Nov 24 20:15 UTC] No.42197677{5}[source]▶

>>42197049 #

If I gave you a JSON object with name, age, position, gender etc. etc. would you not say it has structure? If I give you a GRPC binary you need the separate schema and tools to be able to comprehend it. That’s all I’m saying is the separation of the schema from some minimal structure makes the debugging of services more difficult. I would also add the GRPC implementation I used in Javascript (long ago) was not actually checking the types of the field in a lot of cases so rather than being a schema that rejects if some field is not a text field it would just return binary junk. JSON Schema or almost anything else will give you a parsing error instead.

Maybe the tools are fantastic not but I still think being able to debug messages without them is an advantage in almost all systems, you probably don’t need the level of performance GRPC provides.

If you’re using JSON Protobufs why would you add this extra complexity - it will mean messaging is just as slow as using JSON. What are the core advantages of GRPC under these conditions?

replies(1): >>42198535 #

16. bluGill ◴[20 Nov 24 20:16 UTC] No.42197686{4}[source]▶

>>42197608 #

ouch. I'm temped to look up the protocol and build my own grpc implementation from scratch. Generated code isn't that hard to create, but it is something I'd expect to be able to just use someone else's version instead of writing (and supporting) myself.

17. lrem ◴[20 Nov 24 20:43 UTC] No.42197914{4}[source]▶

>>42196428 #

How does JSON protect you from that?

replies(1): >>42198466 #

18. lima ◴[20 Nov 24 20:53 UTC] No.42197996[source]▶

>>42196845 #

I guess you're talking about the relative vs. absolute import paths?

This solves it: https://github.com/cpcloud/protoletariat

replies(1): >>42198057 #

19. seanw444 ◴[20 Nov 24 20:59 UTC] No.42198057{3}[source]▶

>>42197996 #

Yeah, my script works perfectly fine though, without pulling in another dependency. The point is that this shouldn't be necessary. It feels wrong.

20. andy_ppp ◴[20 Nov 24 21:48 UTC] No.42198466{5}[source]▶

>>42197914 #

People understand JSON fairly commonly as they can see what is happening in a browser or any other system - what is the equivalent for GRPC if I want to do console.log(json)?

GRPC for most people is a completely black box with unclear error conditions that are not as clear to me at least. For example what happens if I have an old schema and I'm not seeing a field, there's loads of things that can be wrong - old services, old client, even messages not being routed correctly due to networking settings in docker or k8s.

Are you denying there is absolutely tones to learn here and it is trickier to debug and maintain?

replies(1): >>42227849 #

21. aseipp ◴[20 Nov 24 21:56 UTC] No.42198535{6}[source]▶

>>42197677 #

> If I gave you a JSON object with name, age, position, gender etc. etc. would you not say it has structure?

That's too easy. What if I give you a 200KiB JSON object with 40+ nested fields that's whitespace stripped and has base64 encoded values? Its "structure" is a red herring. It is not a matter of text or binary. The net result is I still have to use a tool to inspect it, even if that's only something like gron/jq in order to make it actually human readable. But at the end of the day the structure is a concern of the application, I have to evaluate its structure in the context of that application. I don't just look at JSON objects for fun. I do it mostly to debug stuff. I still need the schematic structure of the object to even know what I need to write.

FWIW, I normally use something like grpcurl in order to do curl-like requests/responses to a gRPC endpoint and you can even have it give you the schema for a given service. This has worked quite well IME for almost all my needs, but I accept with this stuff you often have lots of "one-off" cases that you have to cobble stuff together or just get dirty with printf'ing somewhere inside your middleware, etc.

> I would also add the GRPC implementation I used in Javascript (long ago) was not actually checking the types of the field in a lot of cases so rather than being a schema that rejects if some field is not a text field it would just return binary junk. JSON Schema or almost anything else will give you a parsing error instead.

Yes, I totally am with you on this. Many of the implementations just totally suck and JSON is common enough nowadays that you kind of have to at least have something that doesn't completely fall over, if you want to be taken remotely seriously. It's hard to write a good JSON library, but it's definitely harder to write a good full gRPC stack. I 100% have your back on this. I would probably dislike gRPC even more but I'm lucky enough to use it with a "good" toolkit (Rust/Prost.)

> If you’re using JSON Protobufs why would you add this extra complexity - it will mean messaging is just as slow as using JSON. What are the core advantages of GRPC under these conditions?

I mean, if your entire complaint is about text vs binary, not efficiency or correctness, JSON Protobuf seems like it fits your needs. You still get the other benefits of gRPC you'd have anywhere (an honest-to-god schema, better transport efficiency over mandated HTTP/2, some amount of schema-generic middleware, first-class streaming, etc etc.)

FWIW, I don't particularly love gRPC. And while I admit I loathe JSON, I'm mainly pushing back on the notion that JSON has some "schema" or structure. No, it doesn't! Your application has and knows structure. A JSON object is just a big bag of stuff. For all its failings, gRPC having a schema is a matter of it actually putting the correct foot first and admitting that your schema is real, it exists, and most importantly can be written down precisely and checked by tools!

22. imtringued ◴[21 Nov 24 07:52 UTC] No.42202070{5}[source]▶

>>42197049 #

Here are some sad news for you: The flexibility of JSON and CBOR cannot be matched by any schema based system, because it is equivalent to giving up that advantage.

Sure, the removal of a field can cause an application level error, but that is probably the most benign form of failure there is. What's worse is when no error occurs and the data is simply reinterpreted to fit the schema. Then your database will slowly fill up with corrupted garbage data and you'll have to restore from a backup.

What you have essentially accomplished in your response is to miss the entire point.

There are also other problems with protobuf in the sense that the savings aren't actually as big as you'd expect. E.g. there is still costly parsing, the data transmitted over the wire isn't significantly smaller unless you have data that is a poor fit for JSON.

replies(2): >>42202160 #>>42204164 #

23. ◴[21 Nov 24 08:11 UTC] No.42202160{6}[source]▶

>>42202070 #

24. jonathanberi ◴[21 Nov 24 13:34 UTC] No.42204164{6}[source]▶

>>42202070 #

It's also worth noting CDDL [1], which adds schema-like utility to CBOR (and technically JSON.) We've started to use it in more places where we use CBOR.

[1] https://datatracker.ietf.org/doc/rfc8610/

25. lrem ◴[24 Nov 24 13:31 UTC] No.42227849{6}[source]▶

>>42198466 #

I'd go with `cerr << request.DebugString() << '...' << response.DebugString();`, preferably with your favourite logger instead of `stdout`. My browser does the equivalent for me just fine, but that required an extension.

I buy the familiarity argument, but I usually don't see the wire format at all. And maintenance-wise, protobufs seem easier to me. But that's because, e.g., someone set up a presubmit for me that yells at me if my change isn't backwards compatible. That's kind of hard to do if you don't have a formal specification of what goes into your protocol.