QEMU is truly excellent software, from the perspective of a person who very rarely needs to emulate another architecture. It "just works" and has wonderful integrations with basically everything I could want. Sometimes it feels like magic, even if the command-line UX is a bit weird in places.
I've always wondered though how it works with KVM: I know KVM is a virtualisation accelerator that enables passing through native code to the CPU somehow; but it feels like QEMU/KVM basically runs the internet now. Almost the entire modern cloud is built on QEMU and KVM as a hypervisor (right?) but I feel like I'm missing a lot about how it's working.
I also wonder if this steals huge amounts of resources away from emulation, or whether it ends up helping out. To say the modern internet is largely running on QEMU is likely a massive understatement.
KVM is basically three components.
* An abstraction over second level page tables to map some of a host user process as what the guest thinks of as physical memory.
* An abstraction to jump into the context that uses those page tables, and traps back out in the case of anything that the hardware would normally handle, but the hypervisor wants to handle manually instead.
* A collection of mechanisms to handle some of those traps in kernel space, avoiding a context switch back out to the host user process. This is done when the trap is common enough, both in the sense that it happens often enough to show up on perf graphs and in the sense that the abstraction being exercised is relatively standard (think interrupt controllers and timers).
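To make the first bullet concrete, here is a minimal sketch in Python. It is a toy model and not KVM's actual API: the `MemorySlot` class, the `translate` function, and every address below are hypothetical, chosen only to show how a range of a host process's memory can be presented to the guest as "physical" memory, and how an address with nothing behind it becomes the case that has to trap.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MemorySlot:
    """One contiguous guest-physical range backed by host process memory (toy model)."""
    guest_phys_addr: int   # where the guest believes this memory lives
    size: int              # length of the range in bytes
    host_virt_addr: int    # where the range lives in the host user process


def translate(slots: Sequence[MemorySlot], gpa: int) -> Optional[int]:
    """Map a guest-physical address to a host-virtual address.

    Returns None when no slot covers the address. In a real hypervisor
    that is the situation that traps, for example an MMIO access.
    """
    for slot in slots:
        offset = gpa - slot.guest_phys_addr
        if 0 <= offset < slot.size:
            return slot.host_virt_addr + offset
    return None


# Illustrative layout: 1 MiB of guest "RAM" at guest-physical 0, nothing above it.
slots = [MemorySlot(guest_phys_addr=0x0, size=0x100000, host_virt_addr=0x7F0000000000)]
```

With this layout, `translate(slots, 0x1000)` lands inside the host mapping, while `translate(slots, 0x100000)` returns `None` because no slot covers it. Real hardware does this lookup with page tables rather than a linear scan; the list is only there to keep the sketch short.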
Let me know if you have any other questions.
How does nested KVM work? Are all the page tables handled by the top level? Do the traps have to propagate up?
I thought part of VT-d/VT-x made the "virtual tables" actual tables.
E.g., the memory the VM can access is controlled by the CPU's MMU (below ring 0/kernel), so the only VM escapes are through the shim(s) for talking with the host (network, memory balloon, graphics).
Where could someone get started in terms of reading material to learn more about this in depth?
From a different direction, I'd suggest https://www.devever.net/~hl/kvm
I would assume sooner or later you're going to end up in the Intel Developer manuals or the equivalent for whatever architecture you are interested in. The Intel ones are very complete at least.
The AMD Processor Programming Reference manuals are also good for this, if you like complete and detailed. They complement the Intel manuals. Much of the material is duplicated because the processors are so similar, but it is written in a different way.
> I would assume sooner or later you're going to end up in the Intel Developer manuals or the equivalent for whatever architecture you are interested in. The Intel ones are very complete at least.
I can vouch for this. I'm no virtualization expert, but I did stumble upon some Intel developer manuals (truthfully, I fell into the rabbit hole) and just skimming them made everything make much more sense.
For example: https://www.intel.com/content/dam/www/public/us/en/documents... - "CHAPTER 23 INTRODUCTION TO VIRTUAL MACHINE EXTENSIONS"
The link above explains how the VMX extensions work on Intel processors. Any software doing hardware-assisted virtualization (so no binary translation, no full-system emulation) will likely be using those instructions.
Yeah, I also found myself curious as to how KVM actually works. I found these helpful:
https://www.kernel.org/doc/ols/2007/ols2007v1-pages-225-230....
http://www.haifux.org/lectures/312/High-Level%20Introduction...
https://zserge.com/posts/kvm/
Excellent. I haven't gone through them yet, but if you've any similar pointers for QEMU, please share.
My rough understanding is that it's the user-space emulation part of a virtualization solution. I.e., when the kernel traps the virtualized process, saying 'nope, you can't do that here', control falls back to a user-space handler in QEMU saying, 'hey, the kernel said I can't do that there; can you sort this out?'. And this back-and-forth game keeps happening during the lifetime of the virtualized process.
Awesome, thanks for the entrypoint!
If you use it rarely, I can highly recommend the excellent QuickEMU [0]
Any VM is just a `quickget ubuntu 24.04` and `quickemu --vm ubuntu-24.04.conf` away. The conf file is just a short, very readable file of shell-style key=value settings, and you can easily give the VM more cores/RAM/disk. Just run `quickget` to get a list of OSes to download.
[0] https://github.com/quickemu-project/quickemu
The role of QEMU/KVM in enabling the cloud is huge, but that's not the only place it makes a tremendous difference. One example where it's essential is new OS development. New OSes basically all target the QEMU machine with its virtual hardware first. It makes development much faster compared to running on real hardware, while easily enabling debug output without needing cables and the like.
> I've always wondered though how it works with KVM
Other people have given some more comprehensive explanations, but I'll try to put it as simply as possible.
Plain QEMU has a CPU emulation layer called TCG. The machine basically consists of memory (RAM and MMIO devices) and CPUs (CPU registers and state). When QEMU has set up the machine and is ready to run, it calls TCG to say "given this memory and this initial CPU register state, start running instructions". When you use QEMU with KVM, the TCG emulation layer is swapped out for KVM, and QEMU asks KVM to start running instructions. That's it. KVM exposes APIs that let the caller specify guest memory and initial CPU register state, plus a call to run that CPU with that memory.
Going a bit further, the hardware virtualization functions that KVM uses can map that memory through a second level of translation, which lets KVM present it to the guest at the locations it expects and prevents the guest from accessing any memory that it should not. The hardware can also run the CPU in a mode where it has the normal set of registers (which is what QEMU wants) while maintaining some additional hypervisor control registers that are not available to the guest. Those ensure the guest can't take complete control of the CPU. For example, the guest OS can "disable interrupts" with the usual MSR or similar bit, and that does prevent the guest from getting interrupts, but it does not disable hypervisor-directed interrupts, so the hypervisor can always take back control of the CPU with a hypervisor IPI or hypervisor timer interrupt.
Further still: when running in plain QEMU mode, devices are emulated by registering MMIO ranges in the memory address space. Emulated loads and stores have code to detect these regions and, instead of performing a simple load or store, call into device model code which handles the access accordingly. When you plug KVM in, you can still use these emulated devices. They are modeled by using that second-level page table to put "not-valid" mappings in those MMIO ranges. These cause the CPU to fault when the guest tries to access them. KVM sees the fault, looks up the table of memory registered by QEMU, and sees that it is an address which QEMU wants to handle, so it returns from the KVM_RUN ioctl with an exit reason indicating that there was an MMIO read/write that needs to be handled. QEMU then directs this into its emulated device model, and once it has performed that device emulation, it calls back into KVM to continue running the CPU.
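The exit-and-re-enter loop described above can be sketched as a toy Python model. None of this is the real KVM or QEMU interface: `ToyVcpu`, `MmioExit`, `run_guest`, `RAM_SIZE`, and `UART_ADDR` are all made-up names and values, and the "guest program" is just a list of stores. The point is only the shape of the control flow: the vCPU runs until it touches an address with no RAM behind it, hands that access to user space, and is then re-entered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

RAM_SIZE = 0x1000      # guest RAM covers [0, RAM_SIZE); illustrative value
UART_ADDR = 0x10000    # hypothetical MMIO address of a toy serial port


@dataclass
class MmioExit:
    """Why the toy vCPU stopped: a store hit an address with no RAM behind it."""
    addr: int
    value: int


class ToyVcpu:
    """Stand-in for the kernel side: executes stores until one misses RAM."""

    def __init__(self, stores: List[Tuple[int, int]]) -> None:
        self.stores = stores            # (address, byte) pairs the guest writes
        self.pc = 0                     # index of the next store to execute
        self.ram: Dict[int, int] = {}   # sparse model of guest RAM

    def run(self) -> Optional[MmioExit]:
        """Run until an MMIO access (returns MmioExit) or program end (None)."""
        while self.pc < len(self.stores):
            addr, value = self.stores[self.pc]
            self.pc += 1
            if addr < RAM_SIZE:
                self.ram[addr] = value        # plain store, no exit needed
            else:
                return MmioExit(addr, value)  # hand the access to user space
        return None


def run_guest(vcpu: ToyVcpu) -> str:
    """Stand-in for the user-space side: emulate the device, then re-enter."""
    uart_output: List[str] = []
    while (exit_info := vcpu.run()) is not None:
        if exit_info.addr == UART_ADDR:
            uart_output.append(chr(exit_info.value))  # the "device model"
        # Other MMIO addresses are silently ignored in this sketch.
    return "".join(uart_output)


# A guest that mixes ordinary RAM stores with writes to the toy serial port.
program = [(0x10, 1), (UART_ADDR, ord("h")), (0x20, 2), (UART_ADDR, ord("i"))]
vcpu = ToyVcpu(program)
output = run_guest(vcpu)
```

Here the two RAM stores complete inside `run()` without ever leaving it, while each serial-port write causes one round trip out to `run_guest` and back, which is why real hypervisors try to keep frequent device accesses from taking that slower path.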
It's all pretty clever. The really astounding thing is that most of the basic concepts for all this stuff were developed/discovered/invented like 50+ years ago.
Resources-wise there's not really any "stealing" going on. The people/companies who care about KVM and the virtualization use cases work on that, and the people/companies who care about emulation work on those parts. If QEMU didn't support virtualization then it's not like the people currently working on QEMU virtualization would shift over to emulation support: they'd be working on some other project instead to achieve their VM goals.
Not everything uses qemu. Some do. More use KVM. Not everything does.
Example: https://firecracker-microvm.github.io/
Xen is still used massively too.
https://en.wikipedia.org/wiki/Fabrice_Bellard
> Experimental support for compiling to WASM using Emscripten.
Neat. This will unlock various online "playgrounds" for a number of CPU architectures, among other interesting use cases.
Likely this was possible beforehand, but it's nice to see it added as a feature to the project directly.
I'm curious if QEMU will ever support features like x86(_64) hardware paths that with Arm and RISC-V... Since most of the patents are now expired, it makes a lot of sense. Apple seems to be further along here than other competitors, but it seems to be limited to Rosetta, not broadly supported.
Didn't realize there was a MIPS build of Windows NT, which led me to Wikipedia to find there were a lot of other architectures supported in the past.
In my last job, I was on the team that handled the Windows NT build on DEC Alpha. Native Alpha apps were much faster than their equivalents on Intel NT machines. Apropos of this topic, DEC had a subsystem called FX!32 that was sort of like what Rosetta does for Apple Silicon, allowing Intel apps to be run at usable speeds on Alpha.
https://en.wikipedia.org/wiki/FX!32
A great piece of software that makes my and my dev team's life infinitely better and easier. A big thank you to the QEMU developers :)
I couldn't find any way to donate to QEMU directly, only https://sfconservancy.org/donate/
Donations are possible through PayPal: https://www.qemu.org/donations/
QEMU is a Software Freedom Conservancy member project like Git, OpenWRT, and many others. You can donate through the Conservancy link you posted and mention which project you wish to support.
are you by any chance a checker at a grocery store?
Awesome tech!
It's not possible to run an Android VM on QEMU, right? As in, is it officially supported? (I know about Waydroid)
Yes, it's possible and supported. QEMU can emulate an aarch64 system, and Google provides aarch64 Android builds for virtual machines specifically, called "Cuttlefish". Search for keywords "Android Cuttlefish QEMU" for instructions.
The official Android "emulator" supplied by Google is qemu. If you're not satisfied with it for some reason, IIRC I used these images some years ago on top of vanilla qemu:
https://www.fosshub.com/Android-x86.html
They don't seem to be well supported anymore, and there aren't many prebuilt alternatives. One can always compile AOSP from source, though Google does not make this easy.
> The official Android "emulator" supplied by Google is qemu
Nitpick: It's a fork of QEMU. There are quite a few Google-exclusive changes bundled-in.
If you’re on macOS or iOS, UTM is an excellent QEMU front end.
https://getutm.app/
https://mac.getutm.app/
Their website appears to be broken?
>(Cannot access the database)
WFM