KVM: problem starting virtual machines

Do you have virt-manager-meta installed? :eyes:

virt-manager-meta v5.1 is installed

I've been experiencing the exact same issues for about a week or so (I hadn't tried to start any VMs for about 2 weeks before that).

This does coincide with the release of libvirt 7.8.0; I saw something on their mailing lists about improvements to cleaning up cgroups on shutdown. But tbh I don't know enough about cgroups (let alone how libvirt wants/needs to manage them) to say whether that's related.

I tried downgrading libvirt to 7.7.0, as it's the only thing I can see that has had updates in the last few weeks.
But that didn't change anything.

I installed the git versions of the following:

yay -S virtkvm-git libvirt-git qemu-arch-extra-git

Also didn't really fix anything:

virsh # start win10 
error: Failed to start domain 'win10'
error: Unable to read from '/sys/fs/cgroup/machine.slice/machine-qemu\x2d5\x2dwin10.scope/libvirt/cgroup.controllers': No such file or directory
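
For anyone hitting the same error, it may help to check which cgroup hierarchy the system actually booted with. A minimal sketch, assuming GNU coreutils `stat` is available; the `cgroup_mode` helper name is made up here for illustration and is not part of libvirt or systemd:

```shell
#!/bin/sh
# Map the filesystem type mounted at /sys/fs/cgroup to a cgroup mode name.
# cgroup2fs means the unified (v2) hierarchy; tmpfs means legacy or hybrid (v1).
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "unified (cgroup v2)" ;;
        tmpfs)     echo "legacy or hybrid (cgroup v1)" ;;
        *)         echo "unknown" ;;
    esac
}

# stat -fc %T prints the filesystem type of the given mount point.
cgroup_mode "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
```

With `systemd.unified_cgroup_hierarchy=1` on the kernel command line, as in the inxi output below, this would be expected to report the unified hierarchy.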

It should be noted that I usually run the xanmod kernel, but switched to mainline to test and do the git and previous version installs. So no difference as far as I can tell there.

inxi -Fax :point_down:

System:    Kernel: 5.10.72-1-lts x86_64 bits: 64 compiler: gcc v: 11.1.0 
           parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-lts root=UUID=8c7d91ae-0b18-49bd-8870-4486aa0989d2 rw 
           rootflags=subvol=@ quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0 
           systemd.unified_cgroup_hierarchy=1 resume=UUID=536234d6-6610-4a6d-a58d-1c4fdcb7fc90 loglevel=3 
           Desktop: GNOME 40.5 tk: GTK 3.24.30 wm: gnome-shell dm: GDM 40.1 Distro: Garuda Linux base: Arch Linux 
Machine:   Type: Desktop Mobo: ASUSTeK model: SABERTOOTH Z170 S v: Rev 1.xx serial: <filter> UEFI: American Megatrends v: 3801 
           date: 03/14/2018 
Battery:   Device-1: hidpp_battery_0 model: Logitech Wireless Mouse MX Master 2S serial: <filter> 
           charge: 55% (should be ignored) rechargeable: yes status: Discharging 
CPU:       Info: Quad Core model: Intel Core i7-6700K bits: 64 type: MT MCP arch: Skylake-S family: 6 model-id: 5E (94) 
           stepping: 3 microcode: EA cache: L2: 8 MiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 63999 
           Speed: 4109 MHz min/max: 800/4200 MHz Core speeds (MHz): 1: 4109 2: 4086 3: 4001 4: 4056 5: 4110 6: 4125 7: 4001 
           8: 4155 
           Vulnerabilities: Type: itlb_multihit status: KVM: VMX disabled 
           Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable 
           Type: mds mitigation: Clear CPU buffers; SMT vulnerable 
           Type: meltdown mitigation: PTI 
           Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl and seccomp 
           Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization 
           Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling 
           Type: srbds mitigation: Microcode 
           Type: tsx_async_abort mitigation: Clear CPU buffers; SMT vulnerable 
Graphics:  Device-1: NVIDIA GP104 [GeForce GTX 1070] vendor: Micro-Star MSI driver: nvidia v: 470.74 
           alternate: nouveau,nvidia_drm bus-ID: 01:00.0 chip-ID: 10de:1b81 class-ID: 0300 
           Device-2: Microsoft LifeCam Cinema type: USB driver: snd-usb-audio,uvcvideo bus-ID: 1-10:6 chip-ID: 045e:075d 
           class-ID: 0102 
           Display: x11 server: X.Org 1.20.13 compositor: gnome-shell driver: loaded: nvidia display-ID: :1 screens: 1 
           Screen-1: 0 s-res: 3840x2160 s-dpi: 96 s-size: 1016x572mm (40.0x22.5") s-diag: 1166mm (45.9") 
           Monitor-1: DP-0 res: 3840x2160 hz: 60 dpi: 140 size: 697x392mm (27.4x15.4") diag: 800mm (31.5") 
           OpenGL: renderer: NVIDIA GeForce GTX 1070/PCIe/SSE2 v: 4.6.0 NVIDIA 470.74 direct render: Yes 
Audio:     Device-1: Intel 100 Series/C230 Series Family HD Audio vendor: ASUSTeK driver: snd_hda_intel v: kernel 
           bus-ID: 00:1f.3 chip-ID: 8086:a170 class-ID: 0403 
           Device-2: NVIDIA GP104 High Definition Audio vendor: Micro-Star MSI driver: snd_hda_intel v: kernel bus-ID: 01:00.1 
           chip-ID: 10de:10f0 class-ID: 0403 
           Device-3: Microsoft LifeCam Cinema type: USB driver: snd-usb-audio,uvcvideo bus-ID: 1-10:6 chip-ID: 045e:075d 
           class-ID: 0102 
           Device-4: C-Media Im Fulla Schiit type: USB driver: hid-generic,snd-usb-audio,usbhid bus-ID: 1-6:2 
           chip-ID: 0d8c:1066 class-ID: 0300 
           Sound Server-1: ALSA v: k5.10.72-1-lts running: yes 
           Sound Server-2: JACK v: 1.9.19 running: no 
           Sound Server-3: PulseAudio v: 15.0 running: no 
           Sound Server-4: PipeWire v: 0.3.38 running: yes 
Network:   Device-1: Intel Ethernet I219-V vendor: ASUSTeK driver: e1000e v: kernel port: f000 bus-ID: 00:1f.6 
           chip-ID: 8086:15b8 class-ID: 0200 
           IF: enp0s31f6 state: down mac: <filter> 
           Device-2: Broadcom BCM4360 802.11ac Wireless Network Adapter vendor: ASUSTeK driver: wl v: kernel modules: bcma 
           port: e000 bus-ID: 02:00.0 chip-ID: 14e4:43a0 class-ID: 0280 
           IF: wlp2s0 state: up mac: <filter> 
           IF-ID-1: virbr0 state: down mac: <filter> 
Drives:    Local Storage: total: 3.85 TiB used: 186.48 GiB (4.7%) 
           SMART Message: Required tool smartctl not installed. Check --recommends 
           ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Seagate model: BarraCuda 510 SSD ZP512CM30011 size: 476.94 GiB 
           block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter> rev: STCS1024 
           temp: 33.9 C scheme: GPT 
           ID-2: /dev/sda maj-min: 8:0 vendor: Crucial model: CT480BX500SSD1 size: 447.13 GiB block-size: physical: 512 B 
           logical: 512 B speed: 6.0 Gb/s type: SSD serial: <filter> rev: R013 scheme: GPT 
           ID-3: /dev/sdb maj-min: 8:16 vendor: Western Digital model: WD30EFRX-68EUZN0 size: 2.73 TiB block-size: 
           physical: 4096 B logical: 512 B speed: 6.0 Gb/s type: HDD rpm: 5400 serial: <filter> rev: 0A82 scheme: GPT 
           ID-4: /dev/sdc maj-min: 8:32 vendor: Crucial model: CT240M500SSD1 size: 223.57 GiB block-size: physical: 4096 B 
           logical: 512 B speed: 6.0 Gb/s type: SSD serial: <filter> rev: MU05 
Partition: ID-1: / raw-size: 459.56 GiB size: 459.56 GiB (100.00%) used: 186.47 GiB (40.6%) fs: btrfs dev: /dev/nvme0n1p2 
           maj-min: 259:2 
           ID-2: /boot/efi raw-size: 260 MiB size: 256 MiB (98.46%) used: 562 KiB (0.2%) fs: vfat dev: /dev/nvme0n1p1 
           maj-min: 259:1 
           ID-3: /home raw-size: 459.56 GiB size: 459.56 GiB (100.00%) used: 186.47 GiB (40.6%) fs: btrfs dev: /dev/nvme0n1p2 
           maj-min: 259:2 
           ID-4: /var/log raw-size: 459.56 GiB size: 459.56 GiB (100.00%) used: 186.47 GiB (40.6%) fs: btrfs 
           dev: /dev/nvme0n1p2 maj-min: 259:2 
           ID-5: /var/tmp raw-size: 459.56 GiB size: 459.56 GiB (100.00%) used: 186.47 GiB (40.6%) fs: btrfs 
           dev: /dev/nvme0n1p2 maj-min: 259:2 
Swap:      Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) 
           ID-1: swap-1 type: partition size: 17.12 GiB used: 0 KiB (0.0%) priority: -2 dev: /dev/nvme0n1p3 maj-min: 259:3 
           ID-2: swap-2 type: zram size: 15.57 GiB used: 0 KiB (0.0%) priority: 100 dev: /dev/zram0 
Sensors:   System Temperatures: cpu: 31.0 C mobo: N/A gpu: nvidia temp: 51 C 
           Fan Speeds (RPM): N/A gpu: nvidia fan: 0% 
Info:      Processes: 304 Uptime: 36m wakeups: 2 Memory: 15.57 GiB used: 4.12 GiB (26.5%) Init: systemd v: 249 tool: systemctl 
           Compilers: gcc: 11.1.0 Packages: pacman: 1577 lib: 383 Shell: Zsh v: 5.8 running-in: tilix inxi: 3.3.06

Very useful information @stuartc .

Thank you for posting your findings and welcome to the forum.

This thread is 6 months old and is probably totally unrelated:

https://www.reddit.com/r/VFIO/comments/mihb5j/systemd248_breaks_vm_boot_libvirt/

However, it does have some ideas not proposed before. One is to add this line to the kernel boot parameters:

systemd.unified_cgroup_hierarchy=0

Then:

grub-mkconfig -o /boot/grub/grub.cfg

or:

update-grub

Then reboot.
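
After the reboot, it may be worth confirming that the parameter actually reached the kernel command line. A rough sketch, assuming a POSIX shell such as sh or bash (zsh does not split `$1` on spaces by default); `cgroup_param` is a hypothetical helper name:

```shell
#!/bin/sh
# Print the value of systemd.unified_cgroup_hierarchy found in a
# kernel command line string passed as $1 (one line per occurrence).
cgroup_param() {
    for word in $1; do
        case "$word" in
            systemd.unified_cgroup_hierarchy=*) echo "${word#*=}" ;;
        esac
    done
}

# /proc/cmdline holds the command line of the running kernel.
cgroup_param "$(cat /proc/cmdline 2>/dev/null)"
```

If this prints nothing, the parameter is not set at all and systemd falls back to its compiled-in default.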

The Garuda default is:

systemd.unified_cgroup_hierarchy=1

Just thought I’d mention this post, as the issue it describes was preventing VMs from starting. VMs are not my thing, so I’m just throwing this out there on the remote chance it might help. I guess it's possible that something has recently become incompatible with the change to cgroup v2.

https://wiki.archlinux.org/title/cgroups

Edit:

My suggestion to change the cgroups setting on the GRUB_CMDLINE_LINUX_DEFAULT= line in /etc/default/grub just solved a freezing issue for another user running VMs. So it may very well be worth a try.
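
For those who prefer to script the change, here is a hedged sketch of the edit. It assumes the parameter is already present with the value 1 (the Garuda default mentioned above); `set_legacy_cgroups` is a made-up helper name, and the sample line is illustrative rather than a real config:

```shell
#!/bin/sh
# Switch systemd.unified_cgroup_hierarchy from 1 to 0 in a GRUB defaults file.
# $1 is the file to edit; sed keeps a backup next to it with a .bak suffix.
set_legacy_cgroups() {
    sed -i.bak 's/systemd\.unified_cgroup_hierarchy=1/systemd.unified_cgroup_hierarchy=0/' "$1"
}

# Try it on a scratch file first rather than on /etc/default/grub itself.
scratch="$(mktemp)"
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash systemd.unified_cgroup_hierarchy=1 loglevel=3"' > "$scratch"
set_legacy_cgroups "$scratch"
cat "$scratch"
```

Running it against the real /etc/default/grub needs root, and the grub-mkconfig step and reboot from the post above are still required afterwards.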

That solved it! I also saw this same recommendation elsewhere, but I assumed that Garuda and many others default to the new unified cgroups stuff - perhaps it's less common than I thought?

There is also a compile flag mentioned on the Gentoo bug tracker: 691310 – sys-apps/systemd-243_rc1: libvirtd-lxc (and others) breaks because legacy cgroupv1 hierarchy is unavailable.

I don't fully understand the difference between hybrid and unified, and given that these tidbits of info are mostly 3-4 years old, I assumed (again :sweat_smile:) that it wouldn't be an issue today.

Do you know when systemd.unified_cgroup_hierarchy=1 was introduced?

Oh btw, I rolled back to the standard/current libvirt-7.8.0-1 and also tested on xanmod 5.14.10 (and 5.10 mainline); it all works. Thanks again.

Damn @tbg, great find :partying_face:

Ubuntu is/was one of the last holdouts to switch to cgroups version 2. I read a post from this August saying that they were just about to switch. They were apparently waiting until they had full compatibility with snaps before switching. Most distros are using version 2 now, I believe.

Garuda has been using it for quite some time; a dev would know the exact date of implementation.

Glad to hear that fixed the issue for some of you.

Confirmed, this solves the problem with VMs not booting. For those who want or need the help, the setting can be found in the Garuda assistant under boot options. There you can edit the value of systemd.unified_cgroup_hierarchy=1 (i.e. change it to 0 and save).

I would then uncheck the cgroup v2 compatibility option on the same page. Save all changes and reboot.

There is no need to use the UI, but given this is one of the best-looking distros out there, those who use the GUI have the option to address this quickly.

Thanks to all... always good to find a solution that helps everyone.
