QEMU 10.1.0

(wiki.qemu.org)
302 points dmitrijbelikov | 33 comments
1. dijit ◴[] No.45038037[source]
QEMU is truly excellent software, from the perspective of a person who very rarely needs to emulate another architecture. It "just works" and has wonderful integrations with basically everything I could want. Sometimes it feels like magic, even if the command-line UX is a bit weird in places.

I've always wondered though how it works with KVM: I know KVM is a virtualisation accelerator that enables passing through native code to the CPU somehow; but it feels like QEMU/KVM basically runs the internet now. Almost the entire modern cloud is built on QEMU and KVM as a hypervisor (right?) but I feel like I'm missing a lot about how it's working.

I also wonder if this steals huge amounts of resources away from emulation, or does it end up helping out. Because to say the modern internet is largely running on QEMU is likely a massive understatement.

replies(8): >>45038105 #>>45038111 #>>45038113 #>>45038185 #>>45038444 #>>45038616 #>>45038965 #>>45038990 #
2. lathiat ◴[] No.45038105[source]
Not everything uses qemu. Some do. More use KVM. Not everything does.

Example: https://firecracker-microvm.github.io/

replies(1): >>45045532 #
3. jamesy0ung ◴[] No.45038111[source]
Yeah I also found myself curious as to how KVM actually works, I found these helpful

https://www.kernel.org/doc/ols/2007/ols2007v1-pages-225-230.... http://www.haifux.org/lectures/312/High-Level%20Introduction... https://zserge.com/posts/kvm/

replies(2): >>45038145 #>>45039323 #
4. pm215 ◴[] No.45038113[source]
Resources-wise there's not really any "stealing" going on. The people/companies who care about KVM and the virtualization use cases work on that, and the people/companies who care about emulation work on those parts. If QEMU didn't support virtualization then it's not like the people currently working on QEMU virtualization would shift over to emulation support: they'd be working on some other project instead to achieve their VM goals.
5. dijit ◴[] No.45038145[source]
Awesome, thanks for the entrypoint!
6. monocasa ◴[] No.45038185[source]
KVM is basically three components.

* An abstraction over second level page tables to map some of a host user process as what the guest thinks of as physical memory.

* An abstraction to jump into the context that uses those page tables, and traps back out in the case of anything that the hardware would normally handle, but the hypervisor wants to handle manually instead.

* A collection of mechanisms to handle some of those traps in kernel space to avoid having to context switch back out to the host user process if the kind of trap is common enough, both in the sense of the trap itself happens often enough to show up on perf graphs, as well as the abstraction being exercised is relatively standard (think interrupt controllers and timers).
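
To make the third point a bit more concrete, here is a toy Python model of the decision being described: a trap is either handled on an in-kernel fast path or bounced out to the host user process. This is only a sketch. The trap names, `ToyHypervisor`, and the handler are invented for illustration and are not KVM's real interface.

```python
# Toy model only: trap kinds and handlers are hypothetical, not KVM's real API.
from dataclasses import dataclass, field


@dataclass
class Trap:
    kind: str          # e.g. "timer", "interrupt_controller", "mmio"
    payload: int = 0


@dataclass
class ToyHypervisor:
    # Traps that are frequent and standardized enough to handle "in kernel".
    in_kernel: dict = field(default_factory=dict)
    userspace_exits: int = 0

    def handle(self, trap: Trap) -> str:
        handler = self.in_kernel.get(trap.kind)
        if handler is not None:
            handler(trap)              # fast path: no switch to the user process
            return "resumed guest"
        self.userspace_exits += 1      # slow path: return to the user process
        return "exit to userspace"


timer_ticks = []
hv = ToyHypervisor(in_kernel={"timer": lambda t: timer_ticks.append(t.payload)})

results = [hv.handle(Trap("timer", 1)), hv.handle(Trap("mmio", 0x1000))]
print(results)
```

The point of the split is that only the second trap pays for a round trip to the user process; the timer trap never leaves the (simulated) kernel.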

Let me know if you have any other questions.

replies(3): >>45038711 #>>45042005 #>>45042709 #
7. dizhn ◴[] No.45038444[source]
qemu/kvm's role in enabling the cloud is huge, but that's not the only place it makes a tremendous difference. One example where it's essential is new OS development. New OSes basically all target the qemu machine with its virtual hardware first. It makes development much faster compared to running on real hardware, while easily enabling debug output without needing cables and the like.
replies(1): >>45052040 #
8. hnlmorg ◴[] No.45038616[source]
Xen is still used massively too.
replies(1): >>45045840 #
9. eddd-ddde ◴[] No.45038711[source]
Where could someone get started in terms of reading material to learn more about this in depth?
replies(3): >>45039162 #>>45041157 #>>45043593 #
10. teekert ◴[] No.45038965[source]
If you use it rarely, I can highly recommend the excellent QuickEMU [0]

Any VM is just a `quickget ubuntu 24.04` and `quickemu --vm ubuntu-24.04.conf` away. The conf file is a short plain-text file that is very readable and lets you add more cores/RAM/disk easily. Just run `quickget` to get a list of OSes to download.

[0] https://github.com/quickemu-project/quickemu

replies(1): >>45048261 #
11. stinkbeetle ◴[] No.45038990[source]
> I've always wondered though how it works with KVM

Other people have given some more comprehensive explanations, but I'll try to put it as simply as possible.

Plain QEMU has a CPU emulation layer called TCG. The machine basically consists of memory (RAM and MMIO devices) and CPUs (CPU registers and state). When QEMU has set up the machine and is ready to run, it calls TCG to say "given this memory and this initial CPU register state, start running instructions". When you use QEMU with KVM, the TCG emulation layer is swapped out for KVM and QEMU asks KVM to start running instructions. That's it. KVM exposes APIs that let the caller specify guest memory and initial CPU register state, plus a call to run that CPU with that memory.

Going a bit further, the hardware virtualization functions that KVM uses can map that memory with a second level of translation, which lets KVM present it to the guest at the locations it expects and prevents the guest from accessing any memory that it should not. The hardware can also run the CPU in a mode where it has the normal set of registers (which is what QEMU wants) but maintains some additional hypervisor control registers that are not available to the guest. Those ensure the guest can't take complete control of the CPU. For example, the guest OS can "disable interrupts" with the usual MSR or similar bit, and that does prevent the guest from getting interrupts, but it does not disable hypervisor-directed interrupts, so the hypervisor can always take back control of the CPU with a hypervisor IPI or hypervisor timer interrupt.
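
A toy Python sketch of that second level of translation, assuming single-level tables and made-up page numbers (real hardware walks multi-level tables, and none of these names come from KVM): the guest controls the first stage, the hypervisor controls the second, and an access only succeeds if both stages have a mapping.

```python
# Toy two-stage address translation; page size and table contents are made up.
PAGE = 4096


def translate(table: dict, addr: int) -> int:
    """Look up addr's page in a one-level table; raise on a missing mapping."""
    page, offset = divmod(addr, PAGE)
    if page not in table:
        raise LookupError(f"fault at {addr:#x}")
    return table[page] * PAGE + offset


# Stage 1: controlled by the guest OS (guest virtual -> guest physical).
guest_page_table = {0x10: 0x2}
# Stage 2: controlled by the hypervisor (guest physical -> host physical).
second_level = {0x2: 0x7F}


def guest_access(gva: int) -> int:
    gpa = translate(guest_page_table, gva)
    return translate(second_level, gpa)


hpa = guest_access(0x10 * PAGE + 0x123)
print(hex(hpa))

# The guest may point stage 1 anywhere, but stage 2 still confines it.
guest_page_table[0x11] = 0x99          # a guest-physical page the host never mapped
try:
    guest_access(0x11 * PAGE)
    escaped = True
except LookupError:
    escaped = False
print(escaped)
```

However the guest edits its own table, it can only reach host memory that the second stage maps, which is the isolation property described above.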

Further still: when running in plain QEMU mode, devices are emulated by registering MMIO ranges in the memory address space. Emulated loads and stores have code to detect these regions, and instead of performing a simple load or store they call into device model code, which handles the access accordingly. When you plug KVM in, you can still use these emulated devices. They are modeled by using that second-level page table to put "not-valid" mappings in those MMIO ranges. These cause the CPU to trigger a page fault when the guest tries to access them. KVM sees this, looks up the table of memory registered by QEMU, and sees that it is an address which QEMU wants to handle, so it returns from the KVM_RUN ioctl with a result code indicating that there was an MMIO read/write that needs to be handled. QEMU then directs this into its emulated device model. Once QEMU has performed that device emulation, it calls back into KVM to continue running the CPU.
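
The shape of that loop can be sketched in a few lines of Python. This is a simulation under loud assumptions: `ToyVcpu` stands in for KVM and just replays a list of guest stores, and `UART_BASE` and `ToyUart` are hypothetical. It is not the real KVM or QEMU API, only the control flow of "run until exit, handle the MMIO access, run again".

```python
# Toy model of the run loop; ToyVcpu stands in for KVM and is not its real API.
RAM_SIZE = 0x1000
UART_BASE = 0x9000                      # hypothetical MMIO address


class ToyVcpu:
    """Replays a fixed list of guest stores, exiting on anything outside RAM."""

    def __init__(self, stores):
        self.ram = bytearray(RAM_SIZE)
        self.pending = list(stores)

    def run(self):
        while self.pending:
            addr, value = self.pending.pop(0)
            if addr < RAM_SIZE:
                self.ram[addr] = value  # mapped: completes without an exit
            else:
                return ("mmio_write", addr, value)  # unmapped: hand to userspace
        return ("halt", 0, 0)


class ToyUart:
    """Minimal device model: collects the bytes written to it."""

    def __init__(self):
        self.output = []

    def write(self, offset, value):
        self.output.append(chr(value))


uart = ToyUart()
vcpu = ToyVcpu([(0x10, 1), (UART_BASE, ord("h")), (UART_BASE, ord("i"))])

while True:                              # the userspace side of the loop
    reason, addr, value = vcpu.run()
    if reason == "halt":
        break
    if reason == "mmio_write" and addr == UART_BASE:
        uart.write(addr - UART_BASE, value)

print("".join(uart.output))
```

The store to ordinary RAM never shows up in the userspace loop at all; only the two stores to the unmapped range come back as exits for the device model.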

It's all pretty clever. The really astounding thing is that most of the basic concepts for all this stuff were developed/discovered/invented like 50+ years ago.

replies(2): >>45045173 #>>45052828 #
12. dysoco ◴[] No.45039162{3}[source]
I would assume sooner or later you're going to end up in the Intel Developer manuals or the equivalent for whatever architecture you are interested in. The Intel ones are very complete at least.
replies(2): >>45039342 #>>45042538 #
13. penguin_booze ◴[] No.45039323[source]
Excellent. I haven't gone through them yet, but if you've any similar pointers for QEMU, please share.

My rough understanding is that it's the user-space emulation part of a virtualization solution. I.e., when the kernel traps the virtualized process, saying 'nope, you can't do that here', control falls back to a user-space handler in QEMU saying, 'hey, the kernel said I can't do that there; can you sort this out?'. And this back-and-forth game keeps happening during the lifetime of the virtualized process.

14. znpy ◴[] No.45039342{4}[source]
> I would assume sooner or later you're going to end up in the Intel Developer manuals or the equivalent for whatever architecture you are interested in. The Intel ones are very complete at least.

I can vouch for this. I'm no virtualization expert, but I did stumble upon some Intel developer manuals (truthfully, I fell into the rabbit hole) and just skimming them made everything make much more sense.

For example: https://www.intel.com/content/dam/www/public/us/en/documents... - "CHAPTER 23 INTRODUCTION TO VIRTUAL MACHINE EXTENSIONS"

The link above explains how the VMX extensions work on Intel processors. Any software doing hardware-assisted virtualization (so no binary translation, no full-system emulation) will likely be using those instructions.

15. yjftsjthsd-h ◴[] No.45041157{3}[source]
From a different direction, I'd suggest https://www.devever.net/~hl/kvm
16. privatelypublic ◴[] No.45042005[source]
I thought part of VT-d/VT-x made the "virtual tables" actual tables.

E.g., the memory the VM can access is controlled by the MMU of the CPU (below ring 0/kernel), so the only VM escapes are through the shim(s) for talking with the host (network, memory balloon, graphics).

replies(1): >>45043670 #
17. jlokier ◴[] No.45042538{4}[source]
The AMD Processor Programming Reference manuals are also good for this, if you like complete and detailed. They complement the Intel manuals. Much of the material is duplicated because the processors are so similar, but it is written in a different way.
18. accelbred ◴[] No.45042709[source]
How does nested KVM work? Are all the page tables handled by the top level? Do the traps have to propagate up?
replies(1): >>45043681 #
19. billywhizz ◴[] No.45043593{3}[source]
if you want to look at existing implementations on top of kvm then these might be useful - rust-vmm is a core library for AWS' firecracker vmm.

https://github.com/rust-vmm/kvm https://github.com/kvmtool/kvmtool https://github.com/sysprog21/kvm-host

20. bonzini ◴[] No.45043670{3}[source]
Yes, there are virtualization-specific page tables that convert guest physical to host physical addresses. KVM still has to take host userspace's virtual addresses, convert them to host physical addresses, and make sure that the virtualization-specific page tables stay in sync with the kernel's usual page tables (which convert host virtual addresses to host physical).
21. bonzini ◴[] No.45043681{3}[source]
Yes, the top level uses write protection of guest memory to combine the two levels of translation into one.
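
A toy Python sketch of that idea, with dictionaries of page numbers standing in for page tables (all names and numbers here are invented, and a real hypervisor updates entries incrementally rather than rebuilding everything): the top-level hypervisor composes the nested hypervisor's table with its own into a single combined table, and intercepts writes to the nested table so the combined one stays current.

```python
# Toy model: page-number dictionaries stand in for real page tables.
class WriteProtectedTable(dict):
    """A table whose writes 'trap' to a callback, like write-protected guest memory."""

    def __init__(self, entries, on_write):
        super().__init__(entries)
        self.on_write = on_write

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.on_write()                 # the top-level hypervisor sees the change


l0_table = {0x5: 0x50, 0x6: 0x60}       # L1 guest-physical -> host-physical
shadow = {}                             # L2 guest-physical -> host-physical


def rebuild_shadow():
    """Combine the two levels of translation into one table."""
    shadow.clear()
    for l2_page, l1_page in l1_table.items():
        if l1_page in l0_table:         # skip entries the host never mapped
            shadow[l2_page] = l0_table[l1_page]


# Maintained by the nested (L1) hypervisor: L2 guest-physical -> L1 guest-physical.
l1_table = WriteProtectedTable({0x1: 0x5}, rebuild_shadow)
rebuild_shadow()
first = dict(shadow)

l1_table[0x2] = 0x6                     # L1 edits its table; the write is intercepted
second = dict(shadow)
print(first, second)
```

The hardware only ever walks the combined table, so the nested guest still gets a single hardware-assisted translation step.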
22. newlisp ◴[] No.45045173[source]
The only comment that directly answers the original doubt about how QEMU can use and work with KVM. Hats off.
23. xyse53 ◴[] No.45045532[source]
I've found QEMU's microvm to be faster at boot while having nicer tooling and a cleaner upgrade path if you need more features. Aside from hype, I'm actually not sure why anyone would still use Firecracker.
replies(1): >>45052059 #
24. bonzini ◴[] No.45045840[source]
AWS runs Xen guests through a KVM-based compatibility layer. You can try it with QEMU too.
25. mrheosuper ◴[] No.45048261[source]
Doesn't the 'q' in 'qemu' stand for "Quick" ?
replies(1): >>45049400 #
26. teekert ◴[] No.45049400{3}[source]
Looks like it does :) Maybe take it up with Martin Wimpress, who probably meant it as a way to jar us all just a tad (knowing him from podcasts he probably has a witty and funny response to such inquiries).
27. monocasa ◴[] No.45052040[source]
Eh, we just used stuff like bochs and vmware prior.
replies(1): >>45052434 #
28. monocasa ◴[] No.45052059{3}[source]
Mainly because of the much larger attack surface of QEMU.
replies(1): >>45060413 #
29. dizhn ◴[] No.45052434{3}[source]
I didn't mean qemu is the only option.
replies(1): >>45052632 #
30. monocasa ◴[] No.45052632{4}[source]
My point is that the appearance of qemu/kvm didn't really change the space much in practice.
replies(1): >>45052772 #
31. dizhn ◴[] No.45052772{5}[source]
Oh, I understand what you're saying. You're probably right. Collectively, virtualization allows a lot, but qemu might not be as exclusive as I said. (I think there are only a few years between bochs/vmware and qemu/xen.)

EDIT: I didn't mean to sound like ChatGPT. It happened naturally :)

32. egberts1 ◴[] No.45052828[source]
And there are other emulated accelerators that QEMU leverages:

https://wiki.gentoo.org/wiki/QEMU#Introduction

33. xyse53 ◴[] No.45060413{4}[source]
I can't quantify how much of that surface is also reduced with the microvm machine vs other parts of QEMU vs Firecracker... But fair enough point.