Also, an overlooked part here is that the global Erlang GC is easier to parallelize and/or keep incremental, since it won't have object cycles save for PIDs (which probably have special handling anyhow).
TL;DR: GCs become way harder as soon as you have cyclic objects. Erlang avoids them, so part of its GC being good is really about Erlang being "simple".
This may be true only for some implementations. Good GC implementations operate on the concept of object graph roots. Whether the graph has cyclic references or not is irrelevant, as the GC scans the relevant memory linearly. As long as a graph is unrooted, such GC implementations can still easily collect it (or, to be more precise, ignore it - in generational moving GCs the cost lies in the live objects that need to be relocated to an older/tenured generation).
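A minimal sketch of that point in .NET terms (class and method names are mine): two objects reference each other, but once nothing roots them, the tracing GC simply never visits them during marking and reclaims them like any other garbage. In a Release build this prints False.

    using System;
    using System.Runtime.CompilerServices;

    class Node
    {
        public Node Next;
    }

    class CycleDemo
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        static WeakReference MakeCycle()
        {
            var a = new Node();
            var b = new Node { Next = a };
            a.Next = b; // a <-> b: a cycle with no roots once this method returns
            return new WeakReference(a);
        }

        static void Main()
        {
            WeakReference weak = MakeCycle();

            // A tracing GC marks from roots; the unrooted cycle is never
            // reached, so both nodes are collected despite the cycle.
            GC.Collect();

            Console.WriteLine($"Cycle still alive: {weak.IsAlive}"); // False
        }
    }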
The Java GCs are doing some crazy stuff; my point, however, was that the acyclic nature of the Erlang object graph enables fairly "simple" optimizations to the GC that in practice should remove most of the need for pauses, without hardware or otherwise expensive read barriers.
It doesn't have to do a lot to be good; once you have cycles, you need much more machinery to achieve the same things.
When it comes to Java - it has multiple GC implementations with different tradeoffs and degree of sophistication. I'm not very well versed in their details besides the fact that pretty much all of them are quite liberal with the use of host memory. So the way I approach it is by assuming that at least some of them resemble the GC implementation in .NET, given extensive evidence that under allocation-heavy scenarios they have similar (throughput) performance characteristics.
As for .NET itself, in server scenarios it uses the SRV GC, which has per-core heaps (their count and sizing are now dynamically scaled to the workload profile, leading to a much smaller RAM footprint) and multi-threaded collection. This lends itself to very high throughput and linear scaling with cores even on very large hosts thanks to minimal contention (think 128C / 1 TiB RAM; sometimes you need to massage it with flags for this, but it's nowhere near the amount of ceremony required by Java).
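For context, the "flags" are mostly a couple of documented project or runtimeconfig.json knobs rather than JVM-style tuning. Something like this in the .csproj is usually all it takes (illustrative, not a recommended config):

    <PropertyGroup>
      <!-- Opt into the SRV GC: per-core heaps, parallel collection threads -->
      <ServerGarbageCollection>true</ServerGarbageCollection>
      <!-- Keep background (concurrent) collection enabled (the default) -->
      <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>
    </PropertyGroup>

For the large-host cases there are further documented runtimeconfig.json settings such as System.GC.HeapCount and System.GC.HeapHardLimitPercent.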
Both SRV and WKS GC implementations use background collection for Gen2 and the large and pinned object heaps. Collection of Gen0 and Gen1 pauses by design, as this yields much better throughput, and the pause times are short enough anyway.
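You can watch this from inside a process with the standard GC APIs; a rough sketch (the exact counts will vary by runtime version and workload):

    using System;
    using System.Runtime;

    class GcStats
    {
        static void Main()
        {
            // Churn through short-lived allocations; nearly all die in Gen0.
            for (int i = 0; i < 1_000_000; i++)
            {
                _ = new byte[64];
            }

            Console.WriteLine($"Server GC:        {GCSettings.IsServerGC}");
            Console.WriteLine($"Gen0 collections: {GC.CollectionCount(0)}");
            Console.WriteLine($"Gen1 collections: {GC.CollectionCount(1)}");
            Console.WriteLine($"Gen2 collections: {GC.CollectionCount(2)}");

            // .NET 5+: the runtime reports how much wall time was spent paused.
            Console.WriteLine($"Paused in GC:     {GC.GetGCMemoryInfo().PauseTimePercentage}%");
        }
    }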
They are short enough that modern .NET versions end up having better p99 latency than Go on multi-core, throughput-saturated nodes. Given a decent enough codebase, you only ever need to worry about GC pause impact once you get into the territory of systems with hard realtime requirements. One of the better practical examples of this in open source is Osu!, which must run its game loop at 1000 Hz - only 1 ms of budget! This does pose challenges and requires much more hands-on interaction with the GC, like dynamically switching GC behavior depending on the scenario: https://github.com/dotnet/runtime/issues/96213#issuecomment-... This, however, would be true in any language with automatic memory management, assuming it's possible to implement such a system in it in the first place.
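The "dynamically switching GC behavior" part boils down to APIs like GCSettings.LatencyMode (and, stricter still, GC.TryStartNoGCRegion). A hypothetical sketch, not Osu!'s actual code - see the linked issue for what they really do:

    using System;
    using System.Runtime;

    class GameLoop
    {
        static void RunLatencyCritical(Action tick)
        {
            // Ask the GC to avoid blocking Gen2 collections while we're in
            // the latency-sensitive section (it can still collect under
            // sufficient memory pressure).
            GCLatencyMode previous = GCSettings.LatencyMode;
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            try
            {
                tick(); // the 1 ms-budget loop body would run here
            }
            finally
            {
                GCSettings.LatencyMode = previous; // restore normal behavior
            }
        }
    }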
Small things like tuples (combining multiple values in outputs) and out parameters relax the burden on the runtime, since programmers don't need to allocate objects just to return several things from a function.
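Concretely (a toy example, names are mine): both the tuple return and the out parameter hand multiple values back without a heap-allocated result object.

    using System;

    class TupleDemo
    {
        // ValueTuple is a struct: the two results travel back on the stack
        // or in registers, with no heap allocation for a "result object".
        static (int Min, int Max) MinMax(ReadOnlySpan<int> values)
        {
            int min = int.MaxValue, max = int.MinValue;
            foreach (int v in values)
            {
                if (v < min) min = v;
                if (v > max) max = v;
            }
            return (min, max);
        }

        static void Main()
        {
            var (min, max) = MinMax(stackalloc int[] { 3, 1, 4, 1, 5 });
            Console.WriteLine($"{min}..{max}");

            // Out parameters serve the same purpose for try-style APIs:
            // (success, value) comes back without allocating anything.
            if (int.TryParse("42", out int parsed))
                Console.WriteLine(parsed);
        }
    }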
But the real kicker probably comes from the low-level components, be it the HTTP server with ValueTasks or the C# dictionary type, getting memory savings just from struct types and proper generics. I remember reading an article from years ago about rewriting the C# dictionary class, where they reduced memory allocations by something like 90%.
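The generics half of that is easy to illustrate (toy example): with reified generics, value-type keys and values are stored inline instead of being boxed the way the pre-generics collections store them.

    using System;
    using System.Collections;
    using System.Collections.Generic;

    class GenericsDemo
    {
        static void Main()
        {
            // Pre-generics collections store object references, so every int
            // key and value gets boxed onto the heap: two allocations per entry.
            var table = new Hashtable();
            table[1] = 100;

            // Dictionary<int, int> is specialized for its type arguments; keys
            // and values live inline in its internal entry array, so inserting
            // value types causes no per-entry boxing at all.
            var dict = new Dictionary<int, int> { [1] = 100 };
            Console.WriteLine(dict[1]);
        }
    }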
On allocation traffic, I doubt average code in C# allocates less than Go - the latter puts quite a lot of emphasis on plain structs, and because Go has very poor GC throughput, the only way to explain tolerable performance in the common case is that Go still allocates less. Of course, this will change now that more teams adopt Go and start the classic interface spam, writing abstractions that box structs into interfaces to cope with the inexpressive and repetition-heavy nature of Go.
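Go's interface values behave much like C#'s interface boxing of structs, so the effect is easy to show in C# terms (illustrative sketch):

    using System;

    interface IShape
    {
        double Area();
    }

    struct Circle : IShape
    {
        public double Radius;
        public double Area() => Math.PI * Radius * Radius;
    }

    class BoxingDemo
    {
        // A generic constraint keeps the abstraction without the box: the JIT
        // specializes this method for Circle and calls Area() directly.
        static double AreaOf<T>(T shape) where T : IShape => shape.Area();

        static void Main()
        {
            var c = new Circle { Radius = 1.0 };

            double direct = c.Area(); // direct call on the struct: no allocation
            IShape boxed = c;         // boxing: the struct is copied to the heap
            double viaBox = boxed.Area();

            Console.WriteLine($"{direct} {viaBox} {AreaOf(c)}");
        }
    }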
Otherwise, both .NET and Java GC implementations are throughput-focused, even the ones that target smaller few-core applications, while the Go GC focuses on low to moderate allocation traffic on smaller hosts with consistent performance, and regresses severely when its capacity to reclaim memory in time is exceeded. You can expect anywhere from ~4x up to ~16-32x or more difference (SRV GC scales linearly with cores) in maximum allocation throughput between Go and .NET: https://gist.github.com/neon-sunset/c6c35230e75c89a8f6592cac...
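I can't speak for the gist's exact methodology, but the general shape of such a measurement is something like this (a hypothetical sketch, not the gist's code):

    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;

    class AllocBurn
    {
        static void Main()
        {
            const int Threads = 8;             // scale with cores to exercise per-core heaps
            const long PerThread = 50_000_000; // small, short-lived allocations

            var sw = Stopwatch.StartNew();
            Parallel.For(0, Threads, _ =>
            {
                for (long i = 0; i < PerThread; i++)
                {
                    var garbage = new byte[32]; // dies in Gen0 almost immediately
                    GC.KeepAlive(garbage);
                }
            });
            sw.Stop();

            Console.WriteLine($"{Threads * PerThread / sw.Elapsed.TotalSeconds:N0} allocs/s");
        }
    }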