Using Erlang hot code updates

(underjord.io)

268 points lawik | 3 comments | 19 Nov 24 20:29 UTC

jhgg ◴[19 Nov 24 23:43 UTC] No.42189283[source]▶

When I worked at Discord, we used BEAM hot code loading pretty extensively and built a bunch of tooling around it to apply and track hot-patches to nodes (which in turn could update the code on >100M processes in the system). It allowed us to deploy hot-fixes to our stateful real-time system in minutes (a full-tilt deploy could complete in a matter of seconds), rather than through the usual roughly hour-long deploy cycle. We generally only used it for "emergency" updates though.

The tooling let us patch multiple modules at a time. It basically wrapped `:rpc.call/4` and `Code.eval_string/1` to propagate the update across the cluster, which is to say the hot-patch was deployed entirely over Erlang's built-in distribution.
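
As a rough illustration of that kind of wrapper (a minimal sketch, not Discord's actual tooling; the `HotPatch` module, its `push/2` function, and the patch source are hypothetical names), evaluating a module definition on each node through `:rpc.call/4` could look something like this:

```elixir
defmodule HotPatch do
  # Hypothetical helper: evaluates `source` (Elixir code as a string) on each
  # node in `nodes` via :rpc.call/4 and Code.eval_string/1.
  # Defaults to the local node plus every currently connected node.
  def push(source, nodes \\ [node() | Node.list()]) when is_binary(source) do
    for n <- nodes, into: %{} do
      result =
        case :rpc.call(n, Code, :eval_string, [source]) do
          # Node unreachable, or the evaluated code raised on the remote side.
          {:badrpc, reason} -> {:error, reason}
          # Code.eval_string/1 returns {value, binding} on success.
          {_value, _binding} -> :ok
        end

      {n, result}
    end
  end
end

# Assumed example patch: redefining a module replaces its code on each node.
patch = """
defmodule HotPatchDemo do
  def version, do: 2
end
"""

results = HotPatch.push(patch)
```

`results` maps each node name to `:ok` or `{:error, reason}`, which is roughly the information a tool would need to track which nodes took the patch. Real tooling would presumably also handle versioning, rollback, and auditing, none of which is shown here.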

replies(2): >>42189462 #>>42191479 #

stouset ◴[20 Nov 24 07:17 UTC] No.42191479[source]▶

>>42189283 #

Can someone explain how this is not genuinely terrifying from a security perspective?

replies(3): >>42191535 #>>42191565 #>>42192955 #

nelsonic ◴[20 Nov 24 07:26 UTC] No.42191535[source]▶

>>42191479 #

Where is the security problem? All code commits and builds can still be signed. All of this is just a more efficient way of deploying changes without dropping existing connections.

Are you suggesting that hot code replacement is somehow an attack vector? Ericsson has been using this method for decades on critical infrastructure to patch switches without dropping live calls/connections. It works.

No need to fear Erlang/BEAM.

replies(1): >>42191567 #

stouset ◴[20 Nov 24 07:33 UTC] No.42191567[source]▶

>>42191535 #

My interpretation of the GP was that a code change in one node can be automagically propagated out to a cluster of participating Erlang nodes.

As a security person, this seems inherently dangerous. I asked why it is safe because I presume I’m missing something, given that I have never heard of it being exploited in the wild.

replies(2): >>42192109 #>>42197220 #

badpenny ◴[20 Nov 24 09:18 UTC] No.42192109[source]▶

>>42191567 #

Why is it any more dangerous than a conventional update, which also needs to be propagated?

replies(1): >>42195485 #

stouset ◴[20 Nov 24 16:25 UTC] No.42195485[source]▶

>>42192109 #

A conventional update takes place out of band.

If someone were to exploit a running Erlang process, the description of this feature sounds to me like they would have access to code paths that allow pushing new code to other Erlang processes on cooperating nodes.

replies(1): >>42200275 #

vermilingua ◴[21 Nov 24 01:56 UTC] No.42200275[source]▶

>>42195485 #

Yes, but if they can exploit one process they can exploit any of the other nodes anyway, so there's nothing to be gained but a bit of convenience.
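
To make that concrete: by default, Erlang distribution treats every connected node as fully trusted, so a process on one node can already run arbitrary functions on its peers with a plain `:rpc.call/4`, with no hot-patch tooling involved. A minimal sketch, where the harmless `:erlang.system_info/1` call is only a stand-in for whatever code an attacker would choose to run:

```elixir
# For every node this one is connected to, run a function there and collect
# the result. Node.list/0 returns [] when the node is not part of a cluster.
results =
  for n <- Node.list() do
    {n, :rpc.call(n, :erlang, :system_info, [:otp_release])}
  end
```

Anything reachable this way includes the code-loading functions themselves, which is why the hot-patch tooling adds convenience rather than new capability.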