←back to thread

267 points lawik | 4 comments | | HN request time: 0s | source
Show context
throwaway81523 ◴[] No.42189286[source]
You have to be very very very careful when preparing relups. The alternative on Linux is to launch an entire new server on the same machine, then transfer the session data and the open sockets to it through IPC. I once asked Joe Armstrong whether this was as good as relups and why Erlang went the relup route. I don't remember the exact words and don't want to misquote him, but he basically said it was fine, and Erlang went with relups and hot patching because transferring connections (I guess they would have been hardware interfaces rather than sockets) wasn't possible when they designed the hot patch system.

Hot patching is a bit unsatisfying because you are still running the same VM afterwards. WIth socket migration you can launch a new VM if you want to upgrade your Erlang version. I don't know of a way to do it with existing software, but in principle using something like HAProxy with suitable extensions, it should be possible to even migrate connections across machines.

replies(1): >>42189408 #
1. toast0 ◴[] No.42189408[source]
State migration is possible, and yeah, if you want to upgrade BEAM, state migration would be effective, whereas hot loading is not. If your VM gets pretty big, you might need to be careful about memory usage though, the donor VM is likely not going to shrink as fast as the heir VM grows. If you were so inclined, C does allow for hot loading too, but I think it'd be pretty hard to bend BEAM into something that you could hot load to upgrade.

Migrating socket state across machines is possible too, but I don't think it's anywhere close to mainstream. HAProxy is a lovely tool, but I'm pretty sure I saw something in its documentation that explicitly states that sort of thing is out of scope; they want to deal with user level sockets.

Linux has a TCP Repair feature which can be used as part of socket migration; but you'll also need to do something to forward packets to the new destination. Could be arping for the address from a new machine, or something fancier that can switch proportionally or ??? there's lots of options, depending on your network.

As much as I'd love to have a use case for TCP migration, it's a little bit too esoteric for me ... reconnecting is best avoided when possible, but I'm counting TCP migration as non-possible for purposes of the rule of thumb.

replies(1): >>42189536 #
2. throwaway81523 ◴[] No.42189536[source]
TCP migration on the same machine is real and it's not that big a deal, if that's what you meant by TCP migration. Doing it across machines is at best a theoretical possibility, I would agree. I have been wanting to look into CRIU more carefully, but I believe it uses TCP Repair that you mentioned. I'm unfamiliar with it though.

The saying in the Erlang crowd is that a non-distributed system can't be really reliable, since the power cord is a single point of failure. So a non-painful way to migrate across machines would be great. It just hasn't been important enough (I guess) for make anyone willing to deal with the technical obstacles.

I wonder whether other OS's have supported anything like that.

I worked on a phone switch (programmed in C) a long time ago that let you do both software and hardware upgrades (swap CPU boards etc.) while keeping connections intact, but the hardware was specially designed for that.

replies(2): >>42189661 #>>42198524 #
3. toast0 ◴[] No.42189661[source]
> I wonder whether other OS's have supported anything like that.

I don't think I've seen it, but I don't see everything, and it'd be pretty esoteric. From my memory of working with the FreeBSD tcp stack, I suspect it wouldn't be too hard to make something like this work there, too; other than the security aspects, but could probably do something like ok to 'repair' a connection that matches a listen socket you also pass or something. But you'd really need the use case to make the hassle worth it, and I don't think most regular server applications are enough to warrant it.

4. tguvot ◴[] No.42198524[source]
there were/are some linux patches/daemons to synchronize connection state/etc across servers