
159 points mpweiher | 2 comments
t8sr ◴[] No.43671930[source]
When I did my 20% on Go at Google, about 10 years ago, we already had a semi-formal rule that channels must not appear in exported function signatures. It turns out that using CSP in any large, complex codebase is asking for trouble, and that this is true even of projects where members of the core Go team wrote the CSP code themselves.
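
To make the rule concrete, here's the kind of shape it pushes you towards - an invented example, not code from any real project: the channel stays an internal detail and the exported API is plainer.

    package watcher

    // Instead of exporting the channel in the signature...
    //
    //     func Watch(path string) (<-chan Event, error)
    //
    // ...keep it internal, so callers never have to reason about who closes
    // it, how big its buffer is, or what a slow consumer does to the sender.

    type Event struct{ Path string }

    func Watch(path string, handle func(Event)) error {
        events := make(chan Event) // internal plumbing, free to change later
        go func() {
            defer close(events)
            // ... produce real events here; one fake event for the sketch.
            events <- Event{Path: path}
        }()
        for ev := range events {
            handle(ev)
        }
        return nil
    }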

If you take enough steps back and really think about it, the only synchronization primitive that exists is a futex (and maybe atomics). Everything else is an abstraction of some kind. If you're really determined, you can build anything out of anything. That doesn't mean it's always a good idea.
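
For instance, you can build a perfectly working mutex out of nothing but a buffered channel. It's a nice party trick, and a good example of "can" not implying "should" - a toy sketch:

    package main

    import "fmt"

    // A mutex made from a channel with one slot: sending acquires, receiving releases.
    type chanMutex chan struct{}

    func newChanMutex() chanMutex { return make(chanMutex, 1) }
    func (m chanMutex) Lock()     { m <- struct{}{} } // blocks while the slot is taken
    func (m chanMutex) Unlock()   { <-m }             // frees the slot

    func main() {
        mu := newChanMutex()
        counter := 0
        done := make(chan struct{})
        for i := 0; i < 4; i++ {
            go func() {
                for j := 0; j < 1000; j++ {
                    mu.Lock()
                    counter++
                    mu.Unlock()
                }
                done <- struct{}{}
            }()
        }
        for i := 0; i < 4; i++ {
            <-done
        }
        fmt.Println(counter) // 4000, but sync.Mutex says this more clearly
    }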

Looking back, I'd say channels are far superior to condition variables as a synchronized cross-thread communication mechanism - when I use them these days, it's mostly for that. Locks (mutexes) are really performant and easy to understand and generally better for mutual exclusion. (It's in the name!)
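
Roughly the split I mean, as a made-up example: a mutex guards shared state, a channel moves results (and the "work is done" signal) between goroutines, where you'd otherwise reach for a condition variable plus a queue.

    package main

    import (
        "fmt"
        "sync"
    )

    type stats struct {
        mu    sync.Mutex // mutual exclusion: protects count
        count int
    }

    func (s *stats) inc() {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.count++
    }

    func main() {
        s := &stats{}
        results := make(chan string) // communication: worker -> main

        go func() {
            for i := 0; i < 3; i++ {
                s.inc()
                results <- fmt.Sprintf("did item %d", i)
            }
            close(results) // doubles as the completion signal
        }()

        for r := range results { // no condvar, no manual wakeups
            fmt.Println(r)
        }
        fmt.Println("count:", s.count)
    }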

replies(5): >>43672034 #>>43672125 #>>43672192 #>>43672501 #>>43687905 #
dfawcus ◴[] No.43672125[source]
How large do you deem to be large in this context?

I had success using a CSP style, with channels in many function signatures, in a ~25k line codebase.
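
The general shape was along these lines (a minimal sketch with invented names, nothing like the real node code): each major process is a function whose signature is its in/out channels, run as a goroutine, with the connections made once when the graph is built.

    package main

    import (
        "fmt"
        "strings"
    )

    // One "major process": consumes requests, emits results, exits when its
    // input closes, and closes its output so the node downstream exits too.
    func upcaser(in <-chan string, out chan<- string) {
        defer close(out)
        for s := range in {
            out <- strings.ToUpper(s)
        }
    }

    func main() {
        in := make(chan string)
        out := make(chan string)
        go upcaser(in, out)

        go func() {
            in <- "hello"
            in <- "world"
            close(in)
        }()
        for s := range out {
            fmt.Println(s)
        }
    }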

It had ~15 major types of process, probably about 30 fixed instances overall in a fixed graph, plus a dynamic sub-graph of around 5 processes per 'requested action'. So those sub-graph elements were the only parts which had to deal with tear-down and clean-up.

There were then additionally some minor types of 'process' (i.e. goroutines) within many of those major types, but they were easier to reason about as they only communicated with that major element.

Multiple requested actions could be present, so there could be multiple sets of those 5 process groups connected, but they had a maximum lifetime of a few minutes.

I only ended up using explicit mutexes in two of the major types of process, where they happened to make the most sense and hence reduced system complexity. There were about 45 instances of the 'go' keyword.

(Updated numbers, as I'd initially misremembered/miscounted the number of major processes)

replies(1): >>43674013 #
1. hedora ◴[] No.43674013[source]
How many developers did that scale to? Code bases I've seen written in that style are completely illegible. Once the structure of the 30-node graph falls out of the last developer's head, it's basically game over.

To debug stuff by reading the code, each message ends up having 30 potential destinations.

If a request involves N sequential calls, the control flow can be as bad as 30^N paths. Reading the bodies of the methods that are invoked generally doesn’t tell you which of those paths are wired up.

In some real-world code I have seen, the control flow is wired up by complicated runtime logic, so recovering the graph from the source code is equivalent to the halting problem.

None of these problems apply to async/await because the compiler can statically figure out what's being invoked, and IDEs are generally as good at figuring that out as the compiler.

replies(1): >>43674250 #
2. dfawcus ◴[] No.43674250[source]
That was two main developers, one doing most of the code and design, the other handling a largely closed subset of 3 or 4 nodes. Plus three other developers co-opted to implement some of the nodes. [1]

The problem space itself could probably have grown to twice the number of lines of code, but there wouldn't have needed to be any more developers; possibly only the original two. The others were only added to meet deadlines.

As to the graph, it was fixed, but not a full mesh: a set of pipelines, with no power-of-N issue, as the set of places each node could talk to was fixed.

A simple diagram represented the major message flow between those 30 nodes.

Each node could be tested in isolation, so unit tests of each node covered most of the behaviour. The bugs were three deadlocks: one between two major nodes, one within a single major node.
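
Testing looked roughly like this (an illustrative stand-in; the real nodes and messages were richer): drive the node's input channel from the test and assert on what comes out the other side.

    package node

    import "testing"

    // Stand-in for one of the real nodes: doubles whatever it receives.
    func doubler(in <-chan int, out chan<- int) {
        defer close(out)
        for v := range in {
            out <- 2 * v
        }
    }

    func TestDoublerInIsolation(t *testing.T) {
        in := make(chan int)
        out := make(chan int)
        go doubler(in, out)

        go func() {
            in <- 21
            close(in)
        }()

        if got := <-out; got != 42 {
            t.Fatalf("got %d, want 42", got)
        }
        if _, ok := <-out; ok {
            t.Fatal("expected the output channel to be closed")
        }
    }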

The logging around the trigger for each deadlock allowed the cause to be determined and fixed. The bugs arose because time constraints had prevented an analysis of the message flows which would have detected the loops/locks.

So for most messages there were a limited number of destinations: mostly two, for some five.

For a given "request", the flow of messages to the end of the fixed graph passed through 3 major nodes. That then spawned the creation of the dynamic graph, which had two major flows: a control flow through another 3 nodes, and a data flow through a different 3.

Within that dynamic graph there was a richer flow of messages, but the external flow from it simply had the two major paths.

Yes, reading the bodies of the methods does not tell you the flows. One either had to read the "main" routine which built the graph, or better, refer to the graph diagram and message flows in the design document.
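
In spirit that "main" routine read like this (a toy with three nodes rather than ~30, invented names): all the plumbing lives in one place, so that one function plus the diagram documents the wiring.

    package main

    import "fmt"

    func produce(out chan<- int) {
        defer close(out)
        for i := 1; i <= 3; i++ {
            out <- i
        }
    }

    func square(in <-chan int, out chan<- int) {
        defer close(out)
        for v := range in {
            out <- v * v
        }
    }

    func report(in <-chan int, done chan<- struct{}) {
        for v := range in {
            fmt.Println(v)
        }
        close(done)
    }

    func main() {
        // The graph, in one place: produce -> square -> report.
        nums := make(chan int)
        squares := make(chan int)
        done := make(chan struct{})

        go produce(nums)
        go square(nums, squares)
        go report(squares, done)

        <-done
    }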

Essentially it is a similar problem to dealing with "microservices", or pluggable call-backs, where the structure cannot easily be determined from the code alone. This is where design documentation is necessary.

However I found it easier to comprehend, work with, and debug, due to each node being a probe-able "black box", plus having the graph of connections and message flows.

[1] Of those, only the first had any experience with CSP or Go. The CSP experience was with a library for C; the Go experience, some minimal use a year earlier. The other developers were all new to CSP and Go. The first two developers were "senior" / "experienced".