We clone a running VM in 2 seconds (2022)

1. mystraline ◴[11 Apr 25 14:37 UTC] No.43654294[source]▶

Different proposal:

Let's say we have 2 Linux machines. Identical hardware, identical libs.

I'd like to run a simple program on one machine, and then during mid-calculation, would like to transfer the running program to the other machine.

Is this doable?

replies(5): >>43654408 #>>43654455 #>>43654466 #>>43654749 #>>43655094 #

2. new_user_final ◴[11 Apr 25 14:47 UTC] No.43654408[source]▶

>>43654294 (TP) #

Unrelated, but somewhat similar at a higher level: you can transfer state with durable execution, e.g. temporal.io.

Instead of living in RAM, the program's state is saved in a DB, and the execution environment resumes from the previous state when restarted.
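
To make the idea concrete, here is a minimal sketch of checkpointing state to a database. It is not Temporal's API: the `state` table, `open_store`, `run_job` and the `crash_at` parameter are all made up for illustration, and SQLite stands in for whatever DB a real system would use.

```python
import sqlite3

def open_store(path):
    """Open (or create) the database that holds the checkpointed state."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS state "
        "(job TEXT PRIMARY KEY, step INTEGER, total INTEGER)"
    )
    db.commit()
    return db

def run_job(db, job, n_steps, crash_at=None):
    """Sum 0..n_steps-1, committing progress after every step.

    crash_at (hypothetical) simulates the process dying before that step runs.
    """
    row = db.execute(
        "SELECT step, total FROM state WHERE job = ?", (job,)
    ).fetchone()
    step, total = row if row else (0, 0)  # resume, or start fresh
    while step < n_steps:
        if crash_at is not None and step == crash_at:
            raise RuntimeError("simulated crash")
        total += step
        step += 1
        # Persist the step counter and the result together, so a restart
        # never repeats or skips a step.
        db.execute(
            "INSERT OR REPLACE INTO state (job, step, total) VALUES (?, ?, ?)",
            (job, step, total),
        )
        db.commit()
    return total
```

If the database file is reachable from the second machine, calling `run_job` there with the same job name picks up at the last committed step. Only what was explicitly written to the DB survives; anything held only in process memory is lost.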

replies(1): >>43654969 #

3. dilyevsky ◴[11 Apr 25 14:52 UTC] No.43654455[source]▶

>>43654294 (TP) #

Yes - using CRIU[0] or the docker checkpoint/restore API (which uses CRIU)

[0] https://criu.org/Main_Page
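
A rough sketch of what driving CRIU from a script could look like. The helper names are invented for illustration. `--tree`, `--images-dir`, `--shell-job` and `--tcp-established` are CRIU options, but whether you need the last two depends on how the process was started and whether it holds TCP connections. CRIU generally needs root, and the image directory has to be copied to the second machine by some other means (rsync, shared filesystem).

```python
import subprocess

def criu_dump_cmd(pid, images_dir, tcp_established=False):
    """Build the `criu dump` command line for the process tree rooted at pid."""
    cmd = ["criu", "dump", "--tree", str(pid),
           "--images-dir", images_dir,
           "--shell-job"]  # assumption: the process was started from a terminal
    if tcp_established:
        cmd.append("--tcp-established")  # also checkpoint live TCP connections
    return cmd

def criu_restore_cmd(images_dir, tcp_established=False):
    """Build the matching `criu restore` command line."""
    cmd = ["criu", "restore", "--images-dir", images_dir, "--shell-job"]
    if tcp_established:
        cmd.append("--tcp-established")
    return cmd

def checkpoint(pid, images_dir):
    """Run on the source machine; by default the dumped process is killed."""
    subprocess.run(criu_dump_cmd(pid, images_dir), check=True)

def restore(images_dir):
    """Run on the destination machine after the images have been copied over."""
    subprocess.run(criu_restore_cmd(images_dir), check=True)
```

Files the process had open need to exist at the same paths on the destination for the restore to succeed.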

4. panki27 ◴[11 Apr 25 14:52 UTC] No.43654466[source]▶

>>43654294 (TP) #

Interesting thought, but highly dependent on the actual program. Let's assume it doesn't touch any files on disk (no opening sockets either). You would need to at least:

1. Halt the process (SIGSTOP comes to mind)

2. Create a copy of the running program and /proc/$pid - which will also include memory and mmap details

3. Transfer everything to the other machine

4. Load memory, somehow spawn a new process with the info from /proc/$pid we saved, mmap the loaded memory into it

5. Continue the process on the new machine (SIGCONT)

Let me admit that I do not have the slightest clue how to achieve step 4. I wonder if a systemd namespace could make things easier.
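
Steps 1 and 2 above can be sketched as follows, assuming Linux and enough permission to read another process's memory (ptrace-level access, typically root or a child process). `Mapping`, `parse_maps` and `snapshot` are names made up for this example. It deliberately stops short of step 4; registers, file descriptors and kernel-side state are not captured here at all.

```python
import os
import signal
from typing import NamedTuple

class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str

def parse_maps(text):
    """Parse the contents of /proc/<pid>/maps into Mapping tuples."""
    mappings = []
    for line in text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], path))
    return mappings

def snapshot(pid):
    """Stop the process, then copy every readable region of its memory."""
    os.kill(pid, signal.SIGSTOP)  # step 1; delivery is asynchronous
    with open(f"/proc/{pid}/maps") as f:
        mappings = parse_maps(f.read())
    regions = {}
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        for m in mappings:
            # Skip unreadable regions and kernel-provided special mappings.
            if "r" not in m.perms or m.path in ("[vvar]", "[vsyscall]"):
                continue
            try:
                mem.seek(m.start)
                regions[m.start] = mem.read(m.end - m.start)
            except (OSError, OverflowError, ValueError):
                pass  # some regions still cannot be read; ignore them
    return mappings, regions
```

Sending `signal.SIGCONT` with `os.kill` resumes the original process. Rebuilding a process from `mappings` and `regions` on another machine is the hard part, and it is the problem checkpoint/restore tools exist to solve.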

5. toast0 ◴[11 Apr 25 15:13 UTC] No.43654749[source]▶

>>43654294 (TP) #

A search for 'linux process live migration' picks up at least one repo that claims to have done it, and a bunch of other interesting things.

For a very simple program, with limited I/O, it's not too hard; especially if you don't mind a significant pause to move. Difficulty comes when you have FDs to migrate and if you need to reduce the pausing. If you need to keep FDs to the filesystem or the program will load/store to the filesystem periodically, you'd need to do a filesystem migration too... If you need to keep FDs for network sockets, you've got to transfer those somehow.

If it's just stdin/out/err, you could probably do the migration in userspace with some difficulty if you need to keep pid constant (but maybe you don't need that either).

Minimal pausing involves letting the program run on the initial machine while you copy memory, setting pages to read-only so you can catch writes, and only pausing execution on the initial machine once the copy is substantially finished. If there's still a significant number of modified pages to copy over when you pause, you can start execution on the new machine anyway, as long as the modified pages are marked unavailable there. If you background-copy them before they're used, great; if not, you have to block until the modified data comes through.
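
The pre-copy part of this can be illustrated with a toy simulation. Nothing here touches real page tables: `memory` is a plain list standing in for pages, the "program" is a random writer, and `precopy_migrate` with all its parameters is invented for the example. It does not model the variant where the destination starts running before all pages have arrived.

```python
import random

def precopy_migrate(memory, dirty_per_round, max_rounds=10,
                    stop_threshold=8, seed=0):
    """Simulate iterative pre-copy of `memory` (a list of integer pages).

    Returns (dest, rounds, pages_copied_while_paused).
    """
    rng = random.Random(seed)
    dest = [None] * len(memory)
    dirty = set(range(len(memory)))  # initially every page needs copying
    rounds = 0
    # Keep copying while the program runs, until few enough pages are dirty
    # (or we give up waiting for the dirty set to shrink).
    while len(dirty) > stop_threshold and rounds < max_rounds:
        to_send = dirty
        dirty = set()  # "write-protect" all pages: new writes get recorded
        for page in to_send:
            dest[page] = memory[page]
        # Meanwhile the program on the source keeps writing to some pages.
        for _ in range(dirty_per_round):
            page = rng.randrange(len(memory))
            memory[page] += 1
            dirty.add(page)
        rounds += 1
    # Pause the source and copy whatever is still dirty; this is the downtime.
    paused_copy = len(dirty)
    for page in dirty:
        dest[page] = memory[page]
    return dest, rounds, paused_copy
```

The pause is proportional to `paused_copy`, so a program that writes to few pages per round converges quickly, while one that dirties memory faster than it can be copied hits `max_rounds` and pays for a large final copy.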

Probably you do this on two nearby machines with fast networking, and the program doesn't have a lot of writes all over memory, so the pause should be short.

replies(2): >>43655532 #>>43656452 #

6. WJW ◴[11 Apr 25 15:29 UTC] No.43654969[source]▶

>>43654408 #

How does such a method retain things like open network connections that have significant kernel state involved as well?

replies(1): >>43655518 #

7. tryauuum ◴[11 Apr 25 15:39 UTC] No.43655094[source]▶

>>43654294 (TP) #

if you put your program in a qemu/kvm VM then it just works

I was completely blown away when I first experienced it. My code running in a VM never even noticed any downtime. All the network connections are preserved and so on.

8. dilyevsky ◴[11 Apr 25 16:17 UTC] No.43655518{3}[source]▶

>>43654969 #

it does not. all the state that you need to make "durable" needs to be explicitly committed in temporal via their sdk

9. dilyevsky ◴[11 Apr 25 16:19 UTC] No.43655532[source]▶

>>43654749 #

If you're talking about CRIU, then it's not just a claim: it actually does work well in production. I know Google was using it in prod on their internal systems, and probably many others do too. It can even migrate TCP connections for you via the socket repair API (TCP_REPAIR) in Linux.

10. wang_li ◴[11 Apr 25 17:47 UTC] No.43656452[source]▶

>>43654749 #

>...keep FDs for network sockets, you've got to transfer those somehow.

And if you have any shared memory segments, semaphores, or message queues, you have to drag along a bunch of other processes.