Cannot resume from sleep, likely due to a kernel bug. Try downgrading the kernel?

HW
Dell Inspiron 5405, CPU: AMD Ryzen 5 4500U with Radeon Graphics
SW
OS: Garuda KDE Dr460nized
Kernel: 5.10.10-114-tkg-bmq

Issue
The system doesn't wake up from sleep: blank screen, no response. I don't really know how to fix it or where best to report the problem. Judging by the logs, it looks like some kind of bug in the amdgpu module. Does anyone know the best/easiest way to downgrade the kernel in Garuda, please? I can see only one kernel in the Kernel app, and that's the current one. I can't think of anything else to try. Thanks.

Journal log

    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: #PF: supervisor read access in kernel mode
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: #PF: error_code(0x0000) - not-present page
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: PGD 0 P4D 0
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: CPU: 0 PID: 13097 Comm: kworker/0:2 Tainted: G        W         5.10.10-114-tkg-bmq #1
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: Hardware name: Dell Inc. Inspiron 5405/0MR83C, BIOS 1.4.0 10/26/2020
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: RIP: 0010:free_mqd_hiq_sdma+0x5/0x20 [amdgpu]
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: Code: 00 48 01 d1 48 89 48 08 48 03 96 10 02 00 00 48 89 50 10 5b 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 <48> 83 7a 18 0>
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: RSP: 0018:ffffb28dd5c03d48 EFLAGS: 00010293
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: RAX: ffffffffc028c8b0 RBX: ffff8e380cb6d400 RCX: 000000008080007e
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8e380cc4ec00
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: RBP: ffff8e380cc6a3c0 R08: 0000000000000001 R09: 0000000000000000
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8e380cb6d4d0
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: R13: ffff8e380db60000 R14: ffff8e38c38e0c00 R15: 0000000000000001
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: FS:  0000000000000000(0000) GS:ffff8e390ec00000(0000) knlGS:0000000000000000
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: CR2: 0000000000000018 CR3: 000000014f526000 CR4: 0000000000350ef0
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: Call Trace:
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  kernel_queue_uninit+0x36/0xf0 [amdgpu]
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  stop_cpsch+0xa2/0xc0 [amdgpu]
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  kgd2kfd_suspend.part.0+0x2f/0x40 [amdgpu]
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  kgd2kfd_pre_reset+0x3f/0x50 [amdgpu]
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  amdgpu_device_gpu_recover.cold+0x36e/0x95d [amdgpu]
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  amdgpu_job_timedout+0x121/0x140 [amdgpu]
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  drm_sched_job_timedout+0x64/0xe0 [gpu_sched]
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  process_one_work+0x1d6/0x3a0
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  worker_thread+0x4d/0x3d0
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  ? rescuer_thread+0x410/0x410
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  kthread+0x133/0x150
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  ? __kthread_bind_mask+0x60/0x60
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  ret_from_fork+0x22/0x30
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: Modules linked in: ufs hfsplus hfs cdrom minix msdos jfs xfs ext4 mbcache jbd2 dm_mod ccm rfcomm cmac algif_hash algif_skcipher af_alg zram bnep bt>
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel:  snd_rn_pci_acp3x snd_pci_acp3x ccp ucsi_acpi soundcore typec_ucsi libarc4 typec mac_hid tpm_crb i2c_hid tpm_tis tpm_tis_core tpm rng_core dell_rbt>
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: CR2: 0000000000000018
    ม.ค. 28 19:58:47 ning-inspiron-5405 kernel: ---[ end trace 440a3b31e5699b08 ]---
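
For anyone who wants to pull a trace like the one above out of their own journal, here is a minimal sketch. The helper name `filter_gpu_errors` and its pattern list are my own assumptions based on this trace; it is not an exhaustive filter. Since the machine has to be force-rebooted after the hang, the interesting messages usually sit in the previous boot.

```shell
# Keep only kernel log lines that mention the GPU driver, the GPU scheduler,
# or a kernel BUG/Oops. The pattern list is an assumption based on the
# trace in this thread, not a complete list of failure markers.
filter_gpu_errors() {
    grep -E 'amdgpu|gpu_sched|BUG:|Oops:'
}

# Usage (kernel messages from the previous boot, i.e. the one that hung):
#   journalctl -k -b -1 --no-pager | filter_gpu_errors
```

Here `-k` restricts the output to kernel messages and `-b -1` selects the boot before the current one.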

A quick search of the forum should turn up plenty of help; however, you can simply do this:

    sudo pacman -Syu linux-lts linux-lts-headers

Then restart and choose the LTS kernel from the GRUB advanced boot menu.

Do not uninstall your current kernel; always keep two kernels installed for safety.
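
To confirm that a second kernel really is installed before rebooting, something like the following may help. This is only a sketch: `installed_kernels` is a hypothetical helper, and the package names it matches are an assumption limited to the kernels mentioned in this thread (custom builds such as the tkg-bmq kernel use other package names).

```shell
# Read installed package names on stdin and print only the kernel packages.
# Matches linux, linux-lts, linux-zen and linux-mainline; headers, firmware
# and other linux-* packages are deliberately excluded.
installed_kernels() {
    grep -E '^linux(-lts|-zen|-mainline)?$'
}

# Usage:
#   pacman -Qq | installed_kernels
```

If at least two names are printed, there is a fallback kernel to pick from the boot menu.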

Thanks, I couldn’t find the answer using the forum search. I’m a bit afraid that 5.4 may be too old for this HW, but I’d like to try. However, I encountered this issue when running your command:

 grub-btrfs-4.8.1...    23.5 KiB  23.0 MiB/s 00:00 [----------------------]  65%
error: failed retrieving file 'paleofetch-garuda-r3.6471340-1-x86_64.pkg.tar.zst' from builds.garudalinux.org : Maximum file size exceeded
error: failed retrieving file 'paleofetch-garuda-r3.6471340-1-x86_64.pkg.tar.zst' from repo.kitsuna.net : Operation too slow. Less than 1 bytes/sec transferred the last 10 seconds
error: failed retrieving file 'paleofetch-garuda-r3.6471340-1-x86_64.pkg.tar.zst' from chaotic.tn.dedyn.io : Maximum file size exceeded
error: failed retrieving file 'paleofetch-garuda-r3.6471340-1-x86_64.pkg.tar.zst' from repo.jkanetwork.com : Maximum file size exceeded
error: failed retrieving file 'paleofetch-garuda-r3.6471340-1-x86_64.pkg.tar.zst' from chaotic.dr460nf1r3.me : Maximum file size exceeded
error: failed retrieving file 'paleofetch-garuda-r3.6471340-1-x86_64.pkg.tar.zst' from chaotic.bangl.de : Maximum file size exceeded
error: failed retrieving file 'paleofetch-garuda-r3.6471340-1-x86_64.pkg.tar.zst' from mirror.maakpain.kro.kr : Maximum file size exceeded
warning: failed to retrieve some files
error: failed to commit transaction (download library error)
Errors occurred, no packages were upgraded.

I thought it might be a problem with my mirror, so I edited /etc/pacman.d/mirrorlist and added Server = https://mirrors.kernel.org/archlinux/$repo/os/$arch, but it didn’t change anything. Maybe this issue is related, but I don’t understand what they mean by using Y/y.

Update your mirrorlist by ranking world servers:

    sudo reflector --sort age --save /etc/pacman.d/mirrorlist

Or, by country (if your country has multiple fast servers):

    sudo reflector --country US --latest 15 --age 2 --fastest 20 --protocol https --sort rate --save /etc/pacman.d/mirrorlist

You can also test other kernels and headers if you feel older is a poor choice. Some alternate choices might be linux-mainline, linux-zen, or simply linux.

You should probably also post your full specs if you can’t resolve this on your own:

    inxi -Fxxxza

Sorry, I’m currently only on my cell, so that really limits my support capabilities.

Also see this post regarding the paleofetch error:

What kind of sleep?
If it’s hibernation, check for available swap space.
Also try adding amdgpu in MODULES array in /etc/mkinitcpio.conf (and rebuild kernel images).
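
A rough way to answer the "what kind of sleep" question from a terminal is sketched below. The helper name `active_sleep_mode` is made up for illustration; it relies on the kernel marking the selected suspend mode in /sys/power/mem_sleep with square brackets.

```shell
# Print the active suspend mode from the contents of /sys/power/mem_sleep.
# The kernel brackets the selected mode, e.g. "s2idle [deep]".
active_sleep_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage:
#   active_sleep_mode "$(cat /sys/power/mem_sleep)"
#   swapon --show    # lists the active swap areas and their sizes
```

For hibernation, the swap listed by `swapon --show` has to be a partition or file on disk; zram swap lives in RAM and cannot hold a hibernation image.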

Thanks for helping me with this! Your recommended solution using the reflector command works well, and I'm able to install any kernel now.
Unfortunately, I get the same error with every kernel I tried (zen, mainline). I noticed that the laptop is not completely dead after resume, as I can see some Firefox log entries in the journal.
I'm not quite sure what I can do now. I already switched to Garuda from Elementary, where I couldn't adjust the brightness and sleep didn't work at all. So frustrating...
I noticed someone reported the same bug here, but there doesn't seem to be any progress on it.
Should I perhaps report the bug in this bug tracker? I have no experience with this kind of problem. Any other ideas for what I can try, please?

Inxi output:

System:
  Kernel: 5.10.11-zen2-1-zen x86_64 bits: 64 compiler: gcc v: 10.2.1
  parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
  root=UUID=c9c30644-0f96-4a00-a6fd-5daa94d9e43b rw rootflags=subvol=@ quiet
  splash rd.udev.log_priority=3 vt.global_cursor_default=0
  systemd.unified_cgroup_hierarchy=1 loglevel=3
  Desktop: KDE Plasma 5.20.5 tk: Qt 5.15.2 info: latte-dock wm: kwin_x11
  dm: SDDM Distro: Garuda Linux
Machine:
  Type: Laptop System: Dell product: Inspiron 5405 v: 1.4.0 serial: <filter>
  Chassis: type: 10 v: 1.4.0 serial: <filter>
  Mobo: Dell model: 0MR83C v: A00 serial: <filter> UEFI: Dell v: 1.4.0
  date: 10/26/2020
Battery:
  ID-1: BAT0 charge: 23.6 Wh condition: 40.1/39.8 Wh (101%) volts: 11.3/11.2
  model: BYD DELL CF5RH08 type: Unknown serial: <filter> status: Discharging
CPU:
  Info: 6-Core model: AMD Ryzen 5 4500U with Radeon Graphics bits: 64
  type: MCP arch: Zen 2 family: 17 (23) model-id: 60 (96) stepping: 1
  microcode: 8600104 L2 cache: 3 MiB
  flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  bogomips: 28446
  Speed: 1397 MHz min/max: 1400/2375 MHz boost: disabled Core speeds (MHz):
  1: 1397 2: 1397 3: 1397 4: 1397 5: 1397 6: 1397
  Vulnerabilities: Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: spec_store_bypass
  mitigation: Speculative Store Bypass disabled via prctl and seccomp
  Type: spectre_v1
  mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 mitigation: Full AMD retpoline, IBPB: conditional,
  IBRS_FW, STIBP: disabled, RSB filling
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: AMD Renoir vendor: Dell driver: amdgpu v: kernel bus ID: 03:00.0
  chip ID: 1002:1636
  Device-2: Realtek Integrated_Webcam_HD type: USB driver: uvcvideo
  bus ID: 3-2:2 chip ID: 0bda:565a serial: <filter>
  Display: x11 server: X.Org 1.20.10 compositor: kwin_x11 driver:
  loaded: amdgpu,ati unloaded: modesetting alternate: fbdev,vesa
  display ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.0x11.2")
  s-diag: 582mm (22.9")
  Monitor-1: eDP res: 1920x1080 hz: 60 dpi: 158 size: 309x173mm (12.2x6.8")
  diag: 354mm (13.9")
  OpenGL: renderer: AMD RENOIR (DRM 3.40.0 5.10.11-zen2-1-zen LLVM 11.0.1)
  v: 4.6 Mesa 20.3.3 direct render: Yes
Audio:
  Device-1: AMD vendor: Dell driver: snd_hda_intel v: kernel bus ID: 03:00.1
  chip ID: 1002:1637
  Device-2: AMD Raven/Raven2/FireFlight/Renoir Audio Processor vendor: Dell
  driver: snd_rn_pci_acp3x v: kernel alternate: snd_pci_acp3x
  bus ID: 03:00.5 chip ID: 1022:15e2
  Device-3: AMD Family 17h HD Audio vendor: Dell driver: snd_hda_intel
  v: kernel bus ID: 03:00.6 chip ID: 1022:15e3
  Sound Server: ALSA v: k5.10.11-zen2-1-zen
Network:
  Device-1: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
  vendor: Dell driver: ath10k_pci v: kernel bus ID: 02:00.0
  chip ID: 168c:003e
  IF: wlp2s0 state: up mac: <filter>
  Device-2: Qualcomm Atheros type: USB driver: btusb bus ID: 3-3:3
  chip ID: 0cf3:e007
Drives:
  Local Storage: total: 238.47 GiB used: 13.62 GiB (5.7%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Western Digital
  model: PC SN530 NVMe WDC 256GB size: 238.47 GiB block size:
  physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 serial: <filter>
  rev: 21106012 temp: 50.9 C
Partition:
  ID-1: / raw size: 136.72 GiB size: 136.72 GiB (100.00%)
  used: 13.51 GiB (9.9%) fs: btrfs dev: /dev/nvme0n1p7 maj-min: 259:7
  ID-2: /boot/efi raw size: 150 MiB size: 146 MiB (97.33%)
  used: 110.6 MiB (75.8%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
  ID-3: /home raw size: 136.72 GiB size: 136.72 GiB (100.00%)
  used: 13.51 GiB (9.9%) fs: btrfs dev: /dev/nvme0n1p7 maj-min: 259:7
  ID-4: /var/log raw size: 136.72 GiB size: 136.72 GiB (100.00%)
  used: 13.51 GiB (9.9%) fs: btrfs dev: /dev/nvme0n1p7 maj-min: 259:7
  ID-5: /var/tmp raw size: 136.72 GiB size: 136.72 GiB (100.00%)
  used: 13.51 GiB (9.9%) fs: btrfs dev: /dev/nvme0n1p7 maj-min: 259:7
Swap:
  Kernel: swappiness: 10 (default 60) cache pressure: 75 (default 100)
  ID-1: swap-1 type: zram size: 1.2 GiB used: 0 KiB (0.0%) priority: 32767
  dev: /dev/zram0
  ID-2: swap-2 type: zram size: 1.2 GiB used: 0 KiB (0.0%) priority: 32767
  dev: /dev/zram1
  ID-3: swap-3 type: zram size: 1.2 GiB used: 0 KiB (0.0%) priority: 32767
  dev: /dev/zram2
  ID-4: swap-4 type: zram size: 1.2 GiB used: 0 KiB (0.0%) priority: 32767
  dev: /dev/zram3
  ID-5: swap-5 type: zram size: 1.2 GiB used: 0 KiB (0.0%) priority: 32767
  dev: /dev/zram4
  ID-6: swap-6 type: zram size: 1.2 GiB used: 0 KiB (0.0%) priority: 32767
  dev: /dev/zram5
Sensors:
  System Temperatures: cpu: 47.2 C mobo: 39.0 C gpu: amdgpu temp: 46.0 C
  Fan Speeds (RPM): cpu: 0
Info:
  Processes: 232 Uptime: 8m wakeups: 2430 Memory: 7.2 GiB
  used: 2.5 GiB (34.8%) Init: systemd v: 247 Compilers: gcc: 10.2.0
  clang: 11.0.1 Packages: pacman: 1241 lib: 334 Shell: fish v: 3.1.2
  running in: alacritty inxi: 3.2.02

Thanks for checking my problem! I looked into that file and it already contains the entry:
# vim:set ft=sh
# MODULES
# The following modules are loaded before any boot hooks are
# run. Advanced users may wish to specify all system modules
# in this array. For instance:
# MODULES=(intel_agp i915 amdgpu radeon nouveau)
MODULES=(intel_agp i915 amdgpu radeon nouveau)
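
Note that editing the file is only half of the suggestion; the images also have to be rebuilt afterwards. A minimal sketch of both steps follows. `has_amdgpu_module` is a hypothetical helper, and it assumes the parenthesised array syntax shown above.

```shell
# Succeed if the active (uncommented) MODULES line of the given
# mkinitcpio.conf lists amdgpu as a separate, space-delimited entry.
has_amdgpu_module() {
    grep -Eq '^MODULES=\(([^)]* )?amdgpu( [^)]*)?\)' "$1"
}

# Usage: rebuild the images for all installed kernel presets only if
# the module is actually listed:
#   has_amdgpu_module /etc/mkinitcpio.conf && sudo mkinitcpio -P
```

The commented example line (`# MODULES=...`) is ignored on purpose, since mkinitcpio does not read it either.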

From a quick check, it appears your BIOS is out of date. Changing kernels will not improve things if this is a BIOS issue.

https://www.dell.com/support/kbdoc/en-ca/000131486/update-the-dell-bios-in-a-linux-or-ubuntu-environment

https://www.dell.com/support/home/en-ca/drivers/driversdetails?driverid=frxcr
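
To compare what is installed against what Dell offers, the version can be read without root from sysfs. A small sketch, where `bios_info` is a made-up helper name and the optional directory argument is only there so it can be tried against a copy of the files:

```shell
# Print the firmware version and release date that the kernel read from DMI.
# Defaults to the standard sysfs location when no directory is given.
bios_info() {
    dir="${1:-/sys/class/dmi/id}"
    printf 'BIOS %s (%s)\n' "$(cat "$dir/bios_version")" "$(cat "$dir/bios_date")"
}

# Usage:
#   bios_info
```

The same two values also appear in the "Hardware name" line of the kernel trace and in the Machine section of inxi.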

To be honest, I don’t know which one is supposed to be the latest BIOS. Their website offers 1.3.0, but under the other versions there is 1.4.0 with the latest date, so I used that one, assuming it was the newest. I tried switching to 1.3.0 and it didn’t help much. The previous error disappeared, but there is a new one. The result is sadly the same: a black screen on wake-up.

The last 3 log entries:

ม.ค. 29 17:45:23 ning-inspiron-5405 kernel: [drm:amdgpu_job_timedout [amdgpu]] ERROR ring sdma0 timeout, signaled seq=1981, emitted seq=1983
ม.ค. 29 17:45:23 ning-inspiron-5405 kernel: [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process pid 0 thread pid 0
ม.ค. 29 17:45:23 ning-inspiron-5405 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!

Some googling suggests that the new error may be related more to the graphics driver than to the kernel. Perhaps I could try installing a proprietary driver? One of these?

The amdgpu-pro-installer contains proprietary components for AMDGPU (a.k.a. Radeon Software for Linux).

For proprietary OpenGL implementation, install amdgpu-pro-libgl (AUR). See AMDGPU#AMDGPU PRO for more details.

For proprietary OpenCL implementation, install opencl-amd (AUR). See GPGPU#AMD/ATI for more details.

For proprietary Vulkan implementation, install vulkan-amdgpu-pro (AUR). See Vulkan for more details.

But you've read only part of it…

Your power save settings may conflict with HW capabilities.
Read more.
