So... it looks like it has to do with the zen kernel.
With the LTS kernel, there are no problems so far.
Does anyone know how to set the LTS kernel as the default one?
You should edit /etc/default/grub and change GRUB_DEFAULT=0 (the first menu entry) to GRUB_DEFAULT=1 (the second entry) or GRUB_DEFAULT="1>2" (the third entry inside the submenu on the second line), etc. Counting starts at 0.
Then run update-grub.
Is entry '1' definitely the entry for the LTS kernel, or are you guessing?
Can I see the order somewhere, or do I have to restart and check it?
Just guessing!
You need to check the GRUB menu.
If you want to cut it short, uninstall zen and leave only LTS.
Reinstalling it in the future would only take a few minutes anyway...
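On the question of seeing the entry order without rebooting: the menu is generated into grub.cfg, so the order can be read from that file. Below is a minimal sketch, assuming the file lives at /boot/grub/grub.cfg and that titles are single-quoted, as grub-mkconfig normally writes them. The function name list_grub_entries is made up for this example. It is a simple heuristic: unindented entries count as top-level, indented ones as children of the preceding submenu, so double-check anything that looks odd.

```shell
# Print each GRUB menu entry next to the value GRUB_DEFAULT would need.
# $1: path to grub.cfg (assumed default: /boot/grub/grub.cfg)
list_grub_entries() {
    awk -F"'" '
        /^menuentry /       { printf "%d  %s\n", top, $2; top++; next }
        /^submenu /         { parent = top; child = 0
                              printf "%d  %s (submenu)\n", top, $2; top++; next }
        /^[ \t]+menuentry / { printf "%d>%d  %s\n", parent, child, $2; child++ }
    ' "${1:-/boot/grub/grub.cfg}"
}
```

Run list_grub_entries (root may be needed to read the file), find the line that names the LTS kernel, and put the value from the left column into GRUB_DEFAULT, in quotes if it contains a ">".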
I experience the exact same problem you describe, slowness for a few seconds then complete freeze. Haven't noticed any errors in dmesg, journal or systemd at boot (although I don't know how to see them after a crash). Errors began after a reboot which I believe included a new kernel (login screen background changed to the blue-purpleish one). Might have been a week since I rebooted last time though, I usually suspend, so I can't know for sure what update might have caused it.
I also use docker, but the crashes occur despite not using it at the time (apart from running the service). Downgrading kernel (I'd rather not ditch zen if I can help it tbh), changing storage driver for docker, or even switching to podman are things I'll try next.
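About not knowing how to see errors after a crash: if the journal is stored persistently (assumption: /var/log/journal exists, which is the usual case on Arch-based systems), journalctl can show messages from the boot that froze. A small sketch follows; the function names are invented here, the journalctl options are standard ones. Keep in mind that a hard freeze may prevent the very last messages from reaching the disk.

```shell
# Errors (priority "err" and worse) from an earlier boot.
# $1: boot offset, -1 = previous boot (default), -2 = the one before, ...
prev_boot_errors() {
    journalctl -b "${1:--1}" -p err --no-pager
}

# Kernel messages only (similar to dmesg) from an earlier boot, last 200 lines.
prev_boot_kernel_log() {
    journalctl -k -b "${1:--1}" -n 200 --no-pager
}
```

journalctl --list-boots shows which boots are available, so you can pick the offset of the session that crashed.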
Do some searching on past forum posts regarding docker and btrfs.
Just to follow up on my issue, which seems to have gone away.
- I downgraded to kernel 5.14.6-zen1-1 (from 5.14.7)
- I did a btrfs balance
- I now use the docker driver instead of kvm2/libvirt in minikube
- I changed the Docker storage driver to overlay2
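For the storage driver step in the list above, here is a hedged sketch of how the change can be checked and made. docker info --format '{{.Driver}}' and the "storage-driver" key in /etc/docker/daemon.json are standard Docker features; the two function names are invented for this example. Note that images and containers created under the old driver are not visible after switching, so save anything you need first.

```shell
# Print the storage driver the running Docker daemon is using.
docker_storage_driver() {
    docker info --format '{{.Driver}}'
}

# Write a minimal daemon.json that selects overlay2.
# $1: target file (assumed default: /etc/docker/daemon.json)
# Refuses to overwrite an existing file; merge the key by hand in that case.
write_overlay2_config() {
    target="${1:-/etc/docker/daemon.json}"
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target; add the key by hand" >&2
        return 1
    fi
    printf '{\n  "storage-driver": "overlay2"\n}\n' > "$target"
}
```

After writing the file (as root), restart the daemon with sudo systemctl restart docker and run docker_storage_driver again to confirm the change.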
Hopefully I can hop on 5.14.8 again.
Edit: Nope. Crashed about an hour into 5.14.8. At least that pinpoints the problem, I guess. It would be great to know what in >5.14.6 causes the crash. Any troubleshooting tips would be appreciated.
Edit #2: 5.14.9-zen2 seems stable again!
Here's my inxi for reference:
inxi
System: Kernel: 5.14.8-zen1-1-zen x86_64 bits: 64 compiler: gcc v: 11.1.0
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
root=UUID=69c29b79-da5a-4df5-a53b-917ce7295884 rw rootflags=subvol=@ quiet splash
rd.udev.log_priority=3 vt.global_cursor_default=0 systemd.unified_cgroup_hierarchy=1
loglevel=3 mem_sleep_default=deep
Desktop: i3 4.19.1 info: i3bar vt: 7 dm: LightDM 1.30.0 Distro: Garuda Linux
base: Arch Linux
Machine: Type: Desktop System: Gigabyte product: X570 AORUS ULTRA v: -CF serial: <filter>
Mobo: Gigabyte model: X570 AORUS ULTRA serial: <filter> UEFI: American Megatrends LLC.
v: F33i date: 04/23/2021
CPU: Info: 12-Core model: AMD Ryzen 9 5900X bits: 64 type: MT MCP arch: Zen 3
family: 19 (25) model-id: 21 (33) stepping: 0 microcode: A201009 cache: L2: 6 MiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 177607
Speed: 3598 MHz min/max: 2200/3700 MHz boost: enabled Core speeds (MHz): 1: 3598
2: 3901 3: 3593 4: 3595 5: 3595 6: 3593 7: 3593 8: 3596 9: 3597 10: 3598 11: 3599
12: 3595 13: 3590 14: 3601 15: 3593 16: 3589 17: 3594 18: 3592 19: 3605 20: 3697
21: 3599 22: 3597 23: 3599 24: 3602
Vulnerabilities: Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: spec_store_bypass
mitigation: Speculative Store Bypass disabled via prctl and seccomp
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization
Type: spectre_v2 mitigation: Full AMD retpoline, IBPB: conditional, IBRS_FW, STIBP:
always-on, RSB filling
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics: Device-1: NVIDIA GA104 [GeForce RTX 3070] vendor: Micro-Star MSI driver: nvidia
v: 470.74 alternate: nouveau,nvidia_drm bus-ID: 08:00.0 chip-ID: 10de:2484
class-ID: 0300
Device-2: Logitech Webcam C930e type: USB driver: snd-usb-audio,uvcvideo
bus-ID: 3-6.3:4 chip-ID: 046d:0843 class-ID: 0102 serial: <filter>
Display: x11 server: X.Org 1.20.13 compositor: picom v: git-dac85 driver:
loaded: nvidia display-ID: :0 screens: 1
Screen-1: 0 s-res: 2560x1440 s-dpi: 108 s-size: 602x342mm (23.7x13.5")
s-diag: 692mm (27.3")
Monitor-1: DP-0 res: 2560x1440 dpi: 109 size: 598x336mm (23.5x13.2") diag: 686mm (27")
OpenGL: renderer: NVIDIA GeForce RTX 3070/PCIe/SSE2 v: 4.6.0 NVIDIA 470.74
direct render: Yes
Audio: Device-1: NVIDIA GA104 High Definition Audio vendor: Micro-Star MSI
driver: snd_hda_intel v: kernel bus-ID: 08:00.1 chip-ID: 10de:228b class-ID: 0403
Device-2: AMD Starship/Matisse HD Audio vendor: Gigabyte driver: snd_hda_intel
v: kernel bus-ID: 0a:00.4 chip-ID: 1022:1487 class-ID: 0403
Device-3: Logitech Webcam C930e type: USB driver: snd-usb-audio,uvcvideo
bus-ID: 3-6.3:4 chip-ID: 046d:0843 class-ID: 0102 serial: <filter>
Sound Server-1: ALSA v: k5.14.8-zen1-1-zen running: yes
Sound Server-2: sndio v: N/A running: no
Sound Server-3: JACK v: 1.9.19 running: no
Sound Server-4: PulseAudio v: 15.0 running: yes
Sound Server-5: PipeWire v: 0.3.37 running: yes
Network: Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel bus-ID: 03:00.0
chip-ID: 8086:2723 class-ID: 0280
IF: wlp3s0 state: down mac: <filter>
Device-2: Intel I211 Gigabit Network vendor: Gigabyte driver: igb v: kernel port: f000
bus-ID: 04:00.0 chip-ID: 8086:1539 class-ID: 0200
IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
IF-ID-1: br-a75c55ff1e63 state: down mac: <filter>
IF-ID-2: docker0 state: down mac: <filter>
IF-ID-3: virbr0 state: down mac: <filter>
Bluetooth: Device-1: Intel AX200 Bluetooth type: USB driver: btusb v: 0.8 bus-ID: 3-5:2
chip-ID: 8087:0029 class-ID: e001
Report: bt-adapter ID: hci0 rfk-id: 0 state: down bt-service: enabled,running
rfk-block: hardware: no software: yes address: <filter>
Drives: Local Storage: total: 1.48 TiB used: 268.68 GiB (17.7%)
SMART Message: Required tool smartctl not installed. Check --recommends
ID-1: /dev/sda maj-min: 8:0 vendor: Intel model: SSDSC2CT120A3 size: 111.79 GiB
block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s type: SSD serial: <filter>
rev: 300i scheme: GPT
ID-2: /dev/sdb maj-min: 8:16 vendor: Western Digital model: WD10EARS-00Y5B1
size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 3.0 Gb/s type: N/A
serial: <filter> rev: 0A80 scheme: MBR
ID-3: /dev/sdc maj-min: 8:32 vendor: Samsung model: SSD 850 PRO 512GB size: 476.94 GiB
block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s type: SSD serial: <filter>
rev: 2B6Q scheme: MBR
Partition: ID-1: / raw-size: 78.12 GiB size: 78.12 GiB (100.00%) used: 30.86 GiB (39.5%) fs: btrfs
dev: /dev/sda2 maj-min: 8:2
ID-2: /boot/efi raw-size: 1000 MiB size: 998 MiB (99.80%) used: 560 KiB (0.1%) fs: vfat
dev: /dev/sda1 maj-min: 8:1
ID-3: /home raw-size: 476.93 GiB size: 476.93 GiB (100.00%) used: 237.82 GiB (49.9%)
fs: btrfs dev: /dev/sdc1 maj-min: 8:33
ID-4: /var/log raw-size: 78.12 GiB size: 78.12 GiB (100.00%) used: 30.86 GiB (39.5%)
fs: btrfs dev: /dev/sda2 maj-min: 8:2
ID-5: /var/tmp raw-size: 78.12 GiB size: 78.12 GiB (100.00%) used: 30.86 GiB (39.5%)
fs: btrfs dev: /dev/sda2 maj-min: 8:2
Swap: Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
ID-1: swap-1 type: zram size: 31.29 GiB used: 0 KiB (0.0%) priority: 100
dev: /dev/zram0
Sensors: System Temperatures: cpu: 39.8 C mobo: 16.8 C gpu: nvidia temp: 33 C
Fan Speeds (RPM): N/A gpu: nvidia fan: 0%
Info: Processes: 505 Uptime: 8m wakeups: 0 Memory: 31.29 GiB used: 2.74 GiB (8.8%)
Init: systemd v: 249 tool: systemctl Compilers: gcc: 11.1.0 Packages: pacman: 1598
lib: 461 Client: Unknown Client: garuda-assistant inxi: 3.3.06
Just to update this topic: I also have this same freezing issue on GNOME, but I don't have docker installed. I'm using the latest zen kernel (also the latest tkg-bmq), the NVIDIA proprietary drivers, and btrfs. It might have started with the latest Garuda version.
I can't run inxi right now because I'm on Windows to test my system, but so far there have been no freezes for 3+ hours.
Edit: People have downgraded to the linux-lts kernel to fix the issue. I will try that later and post an update tonight (EST).
I'm not sure why people refer to using the LTS kernel as a downgrade. IMO everyone should have the LTS kernel installed in case of a system breaking update to the Zen (or other) kernels. Unless you have brand new hardware technology, you likely do not require all the most recent kernel developments.
There's absolutely nothing wrong with running the LTS kernel if your hardware is a little older. Only those with the newest of hardware truly require the newest kernels.
Perpetuating this attitude is not very helpful. Often the easiest solution to many issues is simply switching kernels. Unfortunately, because of attitudes like yours, many will not accept this as a temporary solution. Some people seem to think running an LTS kernel is akin to being forced to live in a ghetto. This is utter nonsense; there's nothing wrong with the LTS kernel.
It becomes more than a little frustrating as a forum assistant when people will just not accept running an LTS kernel temporarily because of this attitude amongst kernel snobs.
If you're referring to my post, then I apologize. By downgrade, I was simply referring to the lower version number (e.g. 5.10 vs 5.14). I will change my wording in the future.
Also, I would love to use 5.10 LTS but my system refuses to boot with it after selecting it in grub. Only a black screen pops up with a flashing cursor that looks like an underscore. Do you know why this is the case?
inxi -Fxxxi
System: Host: rk-garuda Kernel: 5.14.8-zen1-1-zen x86_64 bits: 64 compiler: gcc v: 11.1.0 Desktop: GNOME 40.5
tk: GTK 3.24.30 wm: gnome-shell dm: GDM 40.1 Distro: Garuda Linux base: Arch Linux
Machine: Type: Desktop Mobo: ASRock model: AB350M Pro4 serial: <superuser required> UEFI: American Megatrends v: P5.50
date: 12/20/2018
CPU: Info: 6-Core model: AMD Ryzen 5 2600 bits: 64 type: MT MCP arch: Zen+ rev: 2 cache: L2: 3 MiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 81436
Speed: 3793 MHz min/max: 1550/3400 MHz boost: enabled Core speeds (MHz): 1: 3793 2: 2848 3: 3889 4: 3861 5: 2795
6: 3256 7: 2744 8: 3063 9: 3676 10: 2242 11: 2803 12: 2396
Graphics: Device-1: NVIDIA GM200 [GeForce GTX 980 Ti] vendor: Gigabyte driver: nvidia v: 470.74 bus-ID: 23:00.0
chip-ID: 10de:17c8 class-ID: 0300
Display: x11 server: X.Org 1.20.13 compositor: gnome-shell driver: loaded: nvidia resolution: 1: 1920x1080
2: 2560x1440 s-dpi: 96
OpenGL: renderer: NVIDIA GeForce GTX 980 Ti/PCIe/SSE2 v: 4.6.0 NVIDIA 470.74 direct render: Yes
Audio: Device-1: NVIDIA GM200 High Definition Audio vendor: Gigabyte driver: snd_hda_intel v: kernel bus-ID: 23:00.1
chip-ID: 10de:0fb0 class-ID: 0403
Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio vendor: ASRock driver: snd_hda_intel v: kernel
bus-ID: 25:00.3 chip-ID: 1022:1457 class-ID: 0403
Device-3: GYROCOM C&C Fiio E10 type: USB driver: hid-generic,snd-usb-audio,usbhid bus-ID: 3-1:2 chip-ID: 1852:7022
class-ID: 0102
Sound Server-1: ALSA v: k5.14.8-zen1-1-zen running: yes
Sound Server-2: JACK v: 1.9.19 running: no
Sound Server-3: PulseAudio v: 15.0 running: yes
Sound Server-4: PipeWire v: 0.3.38 running: yes
Network: Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: ASRock driver: r8169 v: kernel port: f000
bus-ID: 1f:00.0 chip-ID: 10ec:8168 class-ID: 0200
IF: enp31s0 state: up speed: 1000 Mbps duplex: full mac: 70:85:c2:4c:77:13
IP v4: 192.168.0.228/24 type: dynamic noprefixroute scope: global broadcast: 192.168.0.255
IP v4: 192.168.0.2/24 type: secondary noprefixroute scope: global broadcast: 192.168.0.255
WAN IP: 72.53.237.7
Bluetooth: Device-1: Cambridge Silicon Radio Bluetooth Dongle (HCI mode) type: USB driver: btusb v: 0.8 bus-ID: 1-6:2
chip-ID: 0a12:0001 class-ID: e001
Report: bt-adapter ID: hci0 rfk-id: 0 state: up address: 00:15:83:F9:E1:F1
Drives: Local Storage: total: 3.44 TiB used: 1.38 TiB (40.0%)
ID-1: /dev/sda vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB speed: 6.0 Gb/s type: HDD rpm: 7200
serial: ZFL2EQH4 rev: 0001
ID-2: /dev/sdb vendor: Crucial model: CT275MX300SSD1 size: 256.17 GiB speed: 6.0 Gb/s type: SSD
serial: 16441481D460 rev: R031 scheme: GPT
ID-3: /dev/sdc vendor: Crucial model: CT500MX500SSD4 size: 465.76 GiB speed: 6.0 Gb/s type: SSD
serial: 1902E1E201F5 rev: 023 scheme: GPT
ID-4: /dev/sdd vendor: Seagate model: ST1000DM010-2EP102 size: 931.51 GiB speed: 6.0 Gb/s type: HDD rpm: 7200
serial: Z9AGLJH0 rev: CC43 scheme: GPT
ID-5: /dev/sde type: USB vendor: SanDisk model: Ultra size: 7.45 GiB type: N/A serial: 20052845530A4800EF8D
rev: 1.20 scheme: GPT
Partition: ID-1: / size: 48.83 GiB used: 37.52 GiB (76.8%) fs: btrfs dev: /dev/sdc1
ID-2: /boot/efi size: 599.8 MiB used: 580 KiB (0.1%) fs: vfat dev: /dev/sdc3
ID-3: /home size: 416.35 GiB used: 99.79 GiB (24.0%) fs: btrfs dev: /dev/sdc2
ID-4: /var/log size: 48.83 GiB used: 37.52 GiB (76.8%) fs: btrfs dev: /dev/sdc1
ID-5: /var/tmp size: 48.83 GiB used: 37.52 GiB (76.8%) fs: btrfs dev: /dev/sdc1
Swap: ID-1: swap-1 type: zram size: 31.28 GiB used: 2 MiB (0.0%) priority: 100 dev: /dev/zram0
Sensors: System Temperatures: cpu: 43.4 C mobo: N/A gpu: nvidia temp: 34 C
Fan Speeds (RPM): N/A gpu: nvidia fan: 36%
Info: Processes: 410 Uptime: 13m wakeups: 0 Memory: 31.28 GiB used: 4.38 GiB (14.0%) Init: systemd v: 249 Compilers:
gcc: 11.1.0 Packages: pacman: 1625 Shell: fish v: 3.3.1 running-in: gjs inxi: 3.3.06
Do you have linux-lts-headers installed?
Yes, but only linux-zen and tkg-bmq work. I've even run grubup.
Edit: Even though it was already installed, reinstalling linux-lts-headers worked.
Now it's just a waiting game to see if any more freezes occur...
╭─rk@rk in ~ via v16.10.0
╰─λ pamac search headers | grepi 'installed'
xorgproto [Installed] 2021.5-1 extra
libcups [Installed] 1:2.3.3op2-3 extra
boost [Installed] 1.76.0-1 extra
acl [Installed] 2.3.1-1 core
vulkan-headers [Installed] 1:1.2.191-1 extra
linux-zen-headers [Installed] 5.14.8.zen1-1 extra
linux-zen-g14-headers [Installed] 5.13.6.zen1-1 chaotic-aur
linux-tkg-bmq-headers [Installed] 5.14.8-203 chaotic-aur
linux-lts-headers [Installed] 5.10.69-1 core
linux-api-headers [Installed] 5.12.3-1 core
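A guess at why reinstalling linux-lts-headers helped: if the proprietary NVIDIA driver is installed as a DKMS package, its module is rebuilt for a kernel only when that kernel's headers are present, and reinstalling the headers triggers the rebuild. That is an assumption about this setup, not something confirmed in the thread. Here is a short sketch for checking the pieces; check_kernel_files and dkms_modules are made-up helper names, and the file names follow the standard Arch naming for kernel packages.

```shell
# Check that the kernel image and initramfs for a kernel package exist.
# $1: kernel package name, e.g. linux-lts
# $2: boot directory (assumed default: /boot)
check_kernel_files() {
    pkg="$1"
    boot="${2:-/boot}"
    status=0
    for f in "$boot/vmlinuz-$pkg" "$boot/initramfs-$pkg.img"; do
        if [ -e "$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            status=1
        fi
    done
    return "$status"
}

# Show which kernels each DKMS module (for example nvidia) has been built for.
dkms_modules() {
    dkms status
}
```

If check_kernel_files linux-lts reports both files and dkms_modules lists the nvidia module for the LTS kernel version, the black screen is less likely to be a missing driver module.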
If you’re referring to MY post, there’s nothing snobby about it, just nooby. I’ve imagined zen being top of the line, and I want my system top of the line. Not negligibly due to Garuda’s front-page selling point for zen:
"A faster, more-responsive Linux kernel optimized for desktop, multimedia and gaming.
Result of a collaborative effort of kernel hackers to provide the best Linux kernel possible for everyday systems."
I mean who doesn’t want that? =)
Noobyness aside, would you advise staying up to date on linux-lts rather than downgrading versions on zen, like I’m doing now, when facing crashes like this?
Each user's hardware is different; test different kernels to find which works best for you.
Grrr. After the last update to my LTS kernel, I now have the same problem with my virtualization.
I really love bleeding edge, but sometimes I miss my good old Debian stable.
Does anyone have a clue?
There are updates coming to linux and linux-zen that fix an issue with BFQ. linux-zen hit this issue (which caused kernel panics); linux was less susceptible.
Update to 5.14.9-arch2 or 5.14.9-zen2 and see if the issue persists.
https://lore.kernel.org/lkml/1624640454.149631.1632987871186@office.mailbox.org/T/
You can thank our @anon34128669 for this detective work.
Sorry for the late reply, I was on holiday for a week.
I really would like to thank him for this work, but to be honest, I do not know how to proceed.
All my new kernels ( linux, linux-zen, linux-lts ) have the same problem.
Could you tell me how to install the kernel you mentioned by hand?
My current list ( pacman -Ss linux | grep 5.14 ) shows only the zen1 and arch1 kernels, which I already have installed.
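To see which version the repositories currently offer compared with what is installed, something like the following may help. It is only a sketch: extract_version, repo_version and installed_version are names invented here, and the parsing assumes the English "Version" label that pacman prints (hence LC_ALL=C). If the fixed release does not show up yet, the mirror may simply not have synced.

```shell
# Pull the value of the "Version" field out of pacman -Si / -Qi output on stdin.
extract_version() {
    sed -n 's/^Version *: *//p'
}

# Version of a package as offered by the sync repositories.
# Prints one line per repository if several repositories carry the package.
repo_version() {
    LC_ALL=C pacman -Si "$1" | extract_version
}

# Version of a package as currently installed.
installed_version() {
    LC_ALL=C pacman -Qi "$1" | extract_version
}
```

For example, compare repo_version linux-zen with installed_version linux-zen. A full update (sudo pacman -Syu) refreshes the package databases and installs the newer kernel once it is available.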
Sad to see others have the same problem, but they could not solve it either.
from another garuda user
I just restored my system to its state from 29/09/2021, a day on which it had made an update.
My kernel is now on 5.10.68~ and everything works again; the broken system was on 5.14.10~.
So it is like you said and I thank @anon34128669 for his detective work, BUT what should I do now with this knowledge? Never update :=) ?
I'm not sure if you're being facetious, but apparently the issue you're having is different to what I thought it was.
If linux-lts (5.10) works, then there's some sort of regression in 5.14. If 5.10 releases newer than 5.10.68 also don't work correctly, then that regression was backported.
If there's nothing in your log files that points in any direction then the only way forwards is to perform a kernel bisection to find the commit that introduces the issue, then report that to the kernel developers.
If 5.10.68 works but 5.10.69 does not then that also narrows the range of commits significantly and will make the bisection much quicker.
https://wiki.archlinux.org/index.php/Bisecting_bugs_with_Git
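For readers who have not seen a bisection before, the core of it is just a handful of git commands; the time goes into building and booting each candidate kernel. This is a rough sketch, assuming (as in the hypothetical case above) that v5.10.68 is good and v5.10.69 is bad, and that the stable kernel tree has been cloned from git.kernel.org. start_bisect is an invented helper name. The wiki page linked above covers building the kernel package at each step.

```shell
# One-time setup (large download):
#   git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
#   cd linux

# Begin a bisection between a known-good and a known-bad revision.
# $1: good revision, $2: bad revision (git expects the bad one first)
start_bisect() {
    git bisect start "$2" "$1"
}

# After "start_bisect v5.10.68 v5.10.69", git checks out a commit in between.
# Build and boot that kernel, test the virtual machines, then tell git:
#   git bisect good    # the problem does not occur
#   git bisect bad     # the problem occurs
# Repeat until git names the first bad commit, then clean up with:
#   git bisect reset
```

The narrower the starting range, the fewer kernels have to be built, which is why knowing the last working and first broken point release matters.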
Hello Jonathan,
thank you for your answer.
I am not sure if I understand you correctly.
First of all, I don't mean this as a joke, of course.
What I am stating is quite simple.
At some point my virtual machines stopped running, with the error message described above.
I then switched to various other kernels and whenever they got an update, my machines stopped running again, always with the same error messages.
So I looked at the update dates in Timeshift, chose a specific date, and restored the system to that point.
Now I see it running again and share the corresponding kernel state, nothing more, nothing less.
I have no idea what 'kernel bisecting' is and I don't think I want to do that either.
It seems like @atkatana here from the Garuda forum has the same problem as me. Unlike him, though, I'm not going to change distributions because of it.
https://forum.garudalinux.org/t/kvm-problem-to-start-virtual-machines/13192/20
Unfortunately, I have never reported a 'bug', and I fear the back-and-forth by mail, for which I have neither the time nor the inclination.
Is it not possible to 'freeze' the state of my kernel until the problem is fixed and still get updates?
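One way to hold the kernel back while updating everything else is pacman's --ignore option (or, more permanently, an IgnorePkg line in the [options] section of /etc/pacman.conf). A hedged sketch: held_update_cmd is a made-up helper that only prints the command so it can be reviewed before running, and the package names are examples. Be aware that Arch treats partial upgrades as unsupported, so packages built against a specific kernel version can get out of step with a held-back kernel; treat this as a stopgap until the fix arrives.

```shell
# Print (not run) an update command that skips the given packages.
# Arguments: package names to hold back, e.g. linux-lts linux-lts-headers
held_update_cmd() {
    list=$(printf '%s,' "$@")
    echo "sudo pacman -Syu --ignore ${list%,}"
}
```

For example, held_update_cmd linux-lts linux-lts-headers prints the command that updates the system while leaving those two packages at their current version.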