KVM problem to start virtual machines

marcopgordillo · 8 October 2021 00:47

KVM can't start virtual machines

Hello, I have the same problem as the closed post "KVM/QEMU cgroup.controllers - No such file or directory", which was closed without a solution.

I updated the system too, but I still can't get KVM virtual machines working. When I try to start a virtual machine, I get this error message:

➜ sudo virsh start deb01
[sudo] password for jpiau:
error: Failed to start domain 'deb01'
error: Unable to read from '/sys/fs/cgroup/machine.slice/machine-qemu\x2d1\x2ddeb01.scope/libvirt/cgroup.controllers': No such file or directory

I'm using kernel-lts, but I tried kernel-zen too and nothing changes; the error is the same.
I think this is new, because until about a month ago I could start the virtual machines without problems.
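
In case it helps anyone reproduce this: the missing file in the error, cgroup.controllers, is a cgroup v2 control file, so it may be worth confirming which cgroup layout the host is using. A minimal sketch for POSIX sh or bash (not fish), assuming the usual /sys/fs/cgroup mount point; the function name cgroup_mode is made up for illustration:

```shell
# Report which cgroup layout is mounted under the given root.
# cgroup_mode is a hypothetical helper name, not part of libvirt or systemd.
cgroup_mode() {
    root=${1:-/sys/fs/cgroup}   # assumed default mount point
    if [ -f "$root/cgroup.controllers" ]; then
        echo unified            # pure cgroup v2
    elif [ -f "$root/unified/cgroup.controllers" ]; then
        echo hybrid             # v1 controllers plus a v2 tree under unified/
    else
        echo legacy             # cgroup v1 only, or nothing recognizable here
    fi
}
```

Calling cgroup_mode with no arguments inspects the live system. On a unified host, `cat /sys/fs/cgroup/cgroup.controllers` then lists the controllers available at the root.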

I've installed the virtual machines with virt-manager and the configuration files are like this:

<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh edit deb01
or other application using the libvirt API.
-->

<domain type='kvm'>
  <name>deb01</name>
  <uuid>0fdffc1b-3c9c-429f-be4c-06c56adbbc41</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://debian.org/debian/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-5.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/edk2-ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/deb01_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state='off'/>
  </features>
  <cpu mode='host-model' check='partial'/>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='unsafe'/>
      <source file='/home/mgordillo/VirtMachines/deb01.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sda' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x16'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:cf:52:73'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
      <image compression='off'/>
    </graphics>
    <sound model='ich9'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
    </sound>
    <video>
      <model type='virtio' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='3'/>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </memballoon>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </rng>
  </devices>
</domain>

My CPU is an Intel Core i7, and the machine has 32 GB of RAM.

Please, if anyone has a solution or recommendation, let me know.

Thanks.

Best Regards,

Marco

SGS · 8 October 2021 00:50

Please start by searching the KVM/QEMU forums.

marcopgordillo · 8 October 2021 00:55

Thanks for your response, SGS, but I have my data backed up; the problem is with the KVM functionality, and I'm reporting the issue too.

Best Regards

Marco

SGS · 8 October 2021 01:02

By the way, we cannot fix every software bug in the Linux world.
Also, if it was a Garuda Linux problem, we would miss your inxi -Faz, it’s quite tiring to have to repeat it over and over again.

marcopgordillo · 8 October 2021 01:10

Thanks for your great help, SGS. I hope there are collaborative people using Garuda who really want to help others.

I think the problem I’m reporting is specific to Garuda, because with CentOS, Debian and Ubuntu I haven’t seen any problem like this.

➜ uname -a
Linux thebeast 5.10.71-1-lts #1 SMP Wed, 06 Oct 2021 14:19:52 +0000 x86_64 GNU/Linux

Name            : libvirt
Version         : 1:7.8.0-1
Name            : virt-manager
Version         : 3.2.0-1

SGS · 8 October 2021 01:59

we would miss your inxi -Faz

Naman · 8 October 2021 05:10

Have you seen and tried

https://wiki.archlinux.org/title/KVM

tbg · 8 October 2021 05:19

I also think you must realize there are massive differences in kernel and package versions between the static distros you listed and a rolling distro.

atkatana · 8 October 2021 06:49

I am having the same problem, and I also have multiple VMs running on current Ubuntu and Debian installs. This is limited to Garuda and the update process on or about Sept 26th. No fix found to date, and both old (existing) and new VMs fail to run. You can create a VM, but nothing will run from the GUI or CLI.

While I 100% understand that working on a rolling distro has its challenges, I have not seen something break like this via an update and take this long to work its way into the support threads and get addressed. This is an issue with Garuda and needs to be worked out within the community, even if someone who starts the process bails out.

tbg · 8 October 2021 10:04

How can you possibly make such a statement with any degree of certainty? This is pure speculation on your part. Always remember, whenever you point the finger at someone else, there are three fingers pointing back at you. Have you even checked the bug trackers on the related upstream projects (and Arch itself)? Have you tested numerous other kernels including linux-mainline and linux-next-git? Have you reported fully on your attempted fixes (as detailed in the help request template)? We are simply left guessing as to which fixes you have attempted. The information you have provided falls woefully short of what is required. In addition, your expectations of entitlement are way out of line for a free distro with a small dev team and an equally small group of forum support volunteers.

Neither of you has provided your system specs with an inxi -Faz output as requested multiple times. This is also explicitly spelled out as a requirement for help requests in the help request template. The help request template also explains explicitly that:

Without it, you will not receive any help from the Garuda team or your topic is likely to be closed without notice.

You ignore all the expectations on our forum, yet you expect the Garuda devs to come running to your rescue for software that has nothing to do with Garuda. Virtualization technologies are not our responsibility to maintain or bugfix. You should determine whether this is a problem with your virtualization packages by downgrading their components and seeing if that corrects your issue. If the QEMU or other related updates are at fault, then it is your responsibility to report this on the relevant project's upstream bug tracker.

Garuda expects users to perform their due diligence, and your performance has fallen far short in that department. Learn to do for yourselves before you expect others to do for you.

Sorry to be so blunt, but you need to put on your big boy boots and start doing more digging. There are posts out there on other forums similar to the errors you have received. Perhaps with a little more effort you can turn up something that may help.

atkatana · 8 October 2021 10:56

The point I am making is that the update broke a functional component of the distro you are maintaining. I am not assigning any blame; it's an observation of the painfully obvious. The issue very much appears to be reproducible across multiple systems all running Garuda, all failing with the same error after an update.

I am not asking for anyone to come running, nor for you to fix it. I am simply saying that closing the ticket seems like the exact opposite of what is needed.

It's broken on your distro on more than one PC running in more than one environment. If you would like to engage to address it, let me know. If you simply want to stand on your soapbox, have at it.

tbg · 8 October 2021 11:08

I already did address finding a solution:

Check your pacman log for the list of upgraded packages when the breakage occurred.

Downgrade all packages affecting virtualization selectively one at a time.
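
The first step can be sketched roughly like this for POSIX sh or bash. It assumes the default log location /var/log/pacman.log and that each transaction line has the form "[timestamp] [ALPM] upgraded package (old -> new)"; the function name pacman_changes_on is just an illustration:

```shell
# Print package transactions logged by pacman on one day.
# Usage: pacman_changes_on YYYY-MM-DD [LOGFILE]
# pacman_changes_on is a hypothetical helper name, not a pacman command.
pacman_changes_on() {
    day=$1
    log_file=${2:-/var/log/pacman.log}   # assumed default log location
    awk -v prefix="[$day" '
        # keep lines whose timestamp starts with the requested day and that
        # record an install, upgrade, downgrade or removal
        index($0, prefix) == 1 &&
        $0 ~ /\[ALPM\] (installed|upgraded|downgraded|removed) /
    ' "$log_file"
}
```

For example, `pacman_changes_on 2021-09-26` prints that day's transactions, and repeating it for the neighbouring days covers the window in which the breakage appeared.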

BTW one more post without an inxi -Faz and the thread will be locked.

The forum is not a one way street. You expect assistance, yet you do not provide requested outputs or requested information, and refuse to answer any questions put to you.

Have you performed the step I suggested to identify the cause?

So to sum things up @atkatana:

No requested outputs provided.
No requested information supplied.
No answers to any questions put to you.
No feedback to suggested solutions provided.

By demonstrating this type of behavior you come off as simply trolling our forum. If you actually want assistance you have a very funny way of showing it. Keep this lack of cooperation up with forum assistants and you will be burning your bridges here pretty soon with no one to blame but yourself.

Bro · 8 October 2021 13:32

If you chose to dual/multi-boot, that’s all on you. Garuda does not support it, as stated in multiple places.

atkatana · 8 October 2021 14:28

Not a troll; I just found the thread after you had closed it.

Not a dual boot or multi-boot system.

Output pasted below as this is the first request I have seen so it is the first time I can respond.

System:
  Kernel: 5.14.9-zen2-1-zen x86_64 bits: 64 compiler: gcc v: 11.1.0
  parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
  root=UUID=78efd0c8-0b1e-4f14-aea1-c5b26ccc73e8 rw rootflags=subvol=@ quiet
  splash rd.udev.log_priority=3 vt.global_cursor_default=0
  systemd.unified_cgroup_hierarchy=1 loglevel=3
  Console: tty pts/1 DM: SDDM Distro: Garuda Linux base: Arch Linux
Machine:
  Type: Desktop System: Hewlett-Packard product: HP Z640 Workstation v: N/A
  serial: <filter> Chassis: type: 6 serial: <filter>
  Mobo: Hewlett-Packard model: 212A v: 1.01 serial: <filter>
  UEFI: Hewlett-Packard v: M60 v02.56 date: 11/04/2020
Battery:
  Device-1: hidpp_battery_0 model: Logitech K350 serial: <filter>
  charge: 50% (should be ignored) rechargeable: yes status: N/A
CPU:
  Info: 2x 12-Core model: Intel Xeon E5-2690 v3 bits: 64 type: MT MCP SMP
  arch: Haswell family: 6 model-id: 3F (63) stepping: 2 microcode: 46 cache:
  L2: 60 MiB
  flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  bogomips: 249210
  Speed: 1198 MHz min/max: 1200/3500 MHz Core speeds (MHz): 1: 1198 2: 1640
  3: 2910 4: 2047 5: 2120 6: 3394 7: 2194 8: 3375 9: 2242 10: 1555 11: 1198
  12: 2805 13: 1378 14: 2233 15: 2191 16: 1197 17: 1738 18: 1602 19: 2348
  20: 2154 21: 2275 22: 2264 23: 2515 24: 1474 25: 2358 26: 1568 27: 3023
  28: 1854 29: 1761 30: 2020 31: 3418 32: 3176 33: 3189 34: 1343 35: 1757
  36: 3496 37: 2270 38: 1576 39: 2567 40: 1505 41: 3490 42: 2292 43: 1852
  44: 1198 45: 1198 46: 2576 47: 1196 48: 2227
  Vulnerabilities: Type: itlb_multihit status: KVM: VMX disabled
  Type: l1tf
  mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
  Type: mds mitigation: Clear CPU buffers; SMT vulnerable
  Type: meltdown mitigation: PTI
  Type: spec_store_bypass
  mitigation: Speculative Store Bypass disabled via prctl and seccomp
  Type: spectre_v1
  mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional,
  IBRS_FW, STIBP: conditional, RSB filling
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: NVIDIA GK110GL [Quadro K5200] driver: nvidia v: 470.74
  alternate: nouveau,nvidia_drm bus-ID: 04:00.0 chip-ID: 10de:103c
  class-ID: 0300
  Display: server: X.org 1.20.13 compositor: kwin_x11 driver: loaded: nvidia
  tty: 80x24
  Message: Advanced graphics data unavailable in console. Try -G --display
Audio:
  Device-1: Intel C610/X99 series HD Audio vendor: Hewlett-Packard
  driver: snd_hda_intel v: kernel bus-ID: 00:1b.0 chip-ID: 8086:8d20
  class-ID: 0403
  Device-2: NVIDIA GK110 High Definition Audio driver: snd_hda_intel
  v: kernel bus-ID: 04:00.1 chip-ID: 10de:0e1a class-ID: 0403
  Device-3: JMTek LLC. TKGOU PnP USB Microphone type: USB
  driver: hid-generic,snd-usb-audio,usbhid bus-ID: 3-5.4:6
  chip-ID: 0c76:1467 class-ID: 0300 serial: <filter>
  Device-4: Texas Instruments PCM2704C stereo audio DAC type: USB
  driver: hid-generic,snd-usb-audio,usbhid bus-ID: 3-6:3 chip-ID: 08bb:27c4
  class-ID: 0300
  Sound Server-1: ALSA v: k5.14.9-zen2-1-zen running: yes
  Sound Server-2: JACK v: 1.9.19 running: no
  Sound Server-3: PulseAudio v: 15.0 running: no
  Sound Server-4: PipeWire v: 0.3.38 running: yes
Network:
  Device-1: Intel Ethernet I218-LM vendor: Hewlett-Packard driver: e1000e
  v: kernel port: 3040 bus-ID: 00:19.0 chip-ID: 8086:15a0 class-ID: 0200
  IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
  Device-2: Intel Ethernet 10-Gigabit X540-AT2
  vendor: Hewlett-Packard 10Gb 2-port 561T driver: ixgbe v: kernel
  port: 3000 bus-ID: 01:00.0 chip-ID: 8086:1528 class-ID: 0200
  IF: ens4f0 state: up speed: 10000 Mbps duplex: full mac: <filter>
  Device-3: Intel Ethernet 10-Gigabit X540-AT2
  vendor: Hewlett-Packard 10Gb 2-port 561T driver: ixgbe v: kernel
  port: 3000 bus-ID: 01:00.1 chip-ID: 8086:1528 class-ID: 0200
  IF: ens4f1 state: down mac: <filter>
Bluetooth:
  Device-1: ASUSTek ASUS USB-BT500 type: USB driver: btusb v: 0.8
  bus-ID: 3-11.4:11 chip-ID: 0b05:190e class-ID: e001 serial: <filter>
  Report: bt-adapter ID: hci0 rfk-id: 0 state: up address: <filter>
RAID:
  Hardware-1: Intel C610/X99 series sSATA Controller [RAID mode]
  driver: ahci v: 3.0 port: 3060 bus-ID: 00:11.4 chip-ID: 8086.2827 rev: 05
  class-ID: 0104
  Hardware-2: Intel C600/X79 series SATA RAID Controller driver: ahci v: 3.0
  port: 3020 bus-ID: 00:1f.2 chip-ID: 8086.2826 rev: 05 class-ID: 0104
Drives:
  Local Storage: total: 3.87 TiB used: 1.21 TiB (31.4%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Samsung
  model: SSD 970 EVO Plus 500GB size: 465.76 GiB block-size: physical: 512 B
  logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter>
  rev: 2B2QEXM7 temp: 47.9 C scheme: GPT
  ID-2: /dev/sda maj-min: 8:0 vendor: Hitachi model: HUA723030ALA641
  size: 2.73 TiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
  type: HDD rpm: 7200 serial: <filter> rev: A840 scheme: GPT
  SMART Message: Unknown smartctl error. Unable to generate data.
  ID-3: /dev/sdb maj-min: 8:16 vendor: Samsung model: SSD 840 EVO 750GB
  size: 698.64 GiB block-size: physical: 512 B logical: 512 B
  speed: 6.0 Gb/s type: SSD serial: <filter> rev: BB6Q scheme: GPT
  SMART Message: Unknown smartctl error. Unable to generate data.
Partition:
  ID-1: / raw-size: 465.51 GiB size: 465.51 GiB (100.00%)
  used: 100.15 GiB (21.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme0n1p2
  maj-min: 259:2
  ID-2: /boot/efi raw-size: 256 MiB size: 252 MiB (98.46%)
  used: 554 KiB (0.2%) fs: vfat block-size: 512 B dev: /dev/nvme0n1p1
  maj-min: 259:1
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
  ID-1: swap-1 type: zram size: 62.81 GiB used: 30.8 MiB (0.0%)
  priority: 100 dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 35.0 C mobo: N/A gpu: nvidia temp: 50 C
  Fan Speeds (RPM): N/A gpu: nvidia fan: 31%
Info:
  Processes: 705 Uptime: 1d 2h 55m wakeups: 13 Memory: 62.81 GiB
  used: 10.48 GiB (16.7%) Init: systemd v: 249 tool: systemctl Compilers:
  gcc: 11.1.0 clang: 12.0.1 Packages: pacman: 1680 lib: 386 Shell: fish
  v: 3.3.1 running-in: tty pts/1 (SSH) inxi: 3.3.06

If you actually ask I am happy to assist.

tbg · 8 October 2021 20:58

Thank you for posting your inxi output @atkatana.

Perhaps to you it seems I’m going out of my way to be a dick, but you are making this an exercise in frustration for forum assistants. You still have not answered any of the questions I put to you.

If you really want to make progress on this issue you need to sift your pacman log for the list of which updates broke your virtualized environments. As you more or less know the date this happened, it hopefully won’t be too hard to narrow down the package(s) responsible for the breakage.

Please post the list of package updates that took place when your breakage occurred.

You can then start selectively downgrading any packages related to virtualization. You are the one that needs to put in the work if you wish to see a resolution to this issue. I only ever install systems to bare metal, so I can not troubleshoot this problem for you. You will likely need to do the detective work required if you expect progress to be made with this issue.
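
As a rough sketch of the downgrade step (POSIX sh or bash, not fish), assuming the older package files are still in pacman's default cache at /var/cache/pacman/pkg. The helper name cached_versions is invented for this example, and the name match is only a heuristic:

```shell
# List cached package files for one package name so an older build can be
# picked for reinstalling. cached_versions is a hypothetical helper name.
# Usage: cached_versions PACKAGE [CACHE_DIR]
cached_versions() {
    pkg=$1
    cache_dir=${2:-/var/cache/pacman/pkg}   # assumed default cache location
    # "name-<digit>..." keeps libvirt-1:7.8.0-1 but skips libvirt-glib-...
    for f in "$cache_dir"/"$pkg"-[0-9]*.pkg.tar.*; do
        case $f in
            *.sig) ;;                       # skip detached signature files
            *) if [ -e "$f" ]; then printf '%s\n' "$f"; fi ;;
        esac
    done
}
```

Running `cached_versions libvirt` lists the candidates, and one of them can then be reinstalled with `sudo pacman -U` followed by the chosen file path. If the cache has been cleaned, the older file will not be there.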

To the OP:

You have still not provided an inxi -Faz output. Perhaps there is a commonality between your systems that can be identified as a factor with this information. I have read posts on the Arch forum where specific hardware was the cause of virtualization breakages in the past. We can’t possibly determine if hardware could be a factor without your hardware specs.

I have asked several times now:

Neither of you has seen fit to answer this query. You seriously can’t expect us to help find a solution if we need to keep guessing at everything. Getting answers from both of you is akin to pulling teeth.

I ask again:

Have you tested numerous other kernels including linux-mainline and linux-next-git?

Please start responding to questions put to you if you wish to receive assistance.

atkatana · 8 October 2021 21:42

I have not to date tried any kernel updates, rollbacks, etc. In my experience so far, if something I have not touched breaks, it will generally resolve itself via the patch process. I had a GRUB issue that had to be fixed and a couple of file system issues that were self-inflicted, so in general I try to support the patch/update process. My VMs are not generally critical day to day, but they are needed at times, and there is data and work effort on those VMs, so I need to get this resolved to support my work.

I do know the date range ~25-9-21 to 27-9-21 and will go get the logs and post them. I do understand the process and want to support getting this fixed for all.

tbg · 8 October 2021 23:11

Some other options for you to consider/test:

Go through your BIOS settings to see if there are any settings that affect virtualization that can be changed.

Check if your BIOS has an update available.

There is another option rather than downgrading all packages related to virtualization that were upgraded at the breakage. Look to see if the affected packages have a newer developmental git version that can be installed in preference to downgrading.

atkatana · 9 October 2021 08:02

OK, so the most likely transactions that I can find are the linux kernel and the linux firmware. Both updated on 27-9-21. The only other transactions that day that would be in the mix are the linux-zen kernel and the linux-zen headers. All the kernels go from 5.14.7 to 5.14.8.

Nothing else in the mix looks like it would have anything to do with KVM, virtualization, etc. The update to openssh, for instance, is not, IMHO, likely to be a problem here.

On the BIOS question, I am up to date with the HP BIOS for the box. The VT-x settings are more or less on/off and are set to on. No other updates or changes have been made to the system or BIOS and, as noted, to date I have not changed the kernel or any other external settings.