System freezes when shutting down/logging out

After using the system for some time with the dedicated Nvidia GPU active, there is a very high chance that it will freeze when I try to shut down or log out. Sometimes it freezes as soon as I click shut down or log out; other times it appears to be shutting down but never powers off completely, just like in this post:

This does not happen with X11. It also does not happen if I boot with the Nvidia GPU completely off (drivers unloaded?) using supergfxctl 4.0.5. I have tried the linux-amd-znver3, linux-zen, and linux-lts kernels.

I have spent weeks trying to find a workaround/understand the cause, but I can’t.

Help.

system info:

Kernel: 6.9.4-AMD-znver3 arch: x86_64 bits: 64 compiler: gcc v: 14.1.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-amd-znver3
root=UUID=a3ed4574-408f-4802-8fc7-dded1b46b1ac rw rootflags=subvol=@
resume=UUID=21ea485a-d6f1-4646-b304-bca6967a48e0 loglevel=3
mitigations=off ibt=off
Desktop: KDE Plasma v: 6.0.5 tk: Qt v: N/A info: frameworks v: 6.3.0
wm: kwin_wayland vt: 1 dm: SDDM Distro: Garuda base: Arch Linux
Machine:
Type: Laptop System: ASUSTeK product: ROG Strix G713QM_G713QM v: 1.0
serial: <superuser required>
Mobo: ASUSTeK model: G713QM v: 1.0 serial: <superuser required>
uuid: <superuser required> UEFI: American Megatrends LLC. v: G713QM.331
date: 02/24/2023
Battery:
ID-1: BAT0 charge: 40.6 Wh (48.0%) condition: 84.5/90.0 Wh (93.9%)
power: 25.4 W volts: 14.7 min: 15.9 model: AS3GWAF3KC GA50358 type: Li-ion
serial: <filter> status: discharging
Device-1: hidpp_battery_0 model: Logitech Wireless Mouse MX Master 2S
serial: <filter> charge: 55% (should be ignored) rechargeable: yes
status: discharging
CPU:
Info: model: AMD Ryzen 7 5800H with Radeon Graphics bits: 64 type: MT MCP
arch: Zen 3 gen: 4 level: v3 note: check built: 2021-22
process: TSMC n7 (7nm) family: 0x19 (25) model-id: 0x50 (80) stepping: 0
microcode: 0xA50000B
Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache:
L1: 512 KiB desc: d-8x32 KiB; i-8x32 KiB L2: 4 MiB desc: 8x512 KiB
L3: 16 MiB desc: 1x16 MiB
Speed (MHz): avg: 836 high: 1397 min/max: 400/4463 scaling:
driver: amd-pstate-epp governor: powersave cores: 1: 1397 2: 400 3: 400
4: 400 5: 400 6: 400 7: 400 8: 1397 9: 1397 10: 400 11: 1397 12: 400
13: 1397 14: 1397 15: 1396 16: 400 bogomips: 102242
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities: <filter>
Graphics:
Device-1: NVIDIA GA106M [GeForce RTX 3060 Mobile / Max-Q] vendor: ASUSTeK
driver: nvidia v: 550.90.07 alternate: nvidia_drm non-free: 550.xx+
status: current (as of 2024-04; EOL~2026-12-xx) arch: Ampere code: GAxxx
process: TSMC n7 (7nm) built: 2020-2023 pcie: gen: 3 speed: 8 GT/s
lanes: 8 link-max: gen: 4 speed: 16 GT/s lanes: 16 ports: active: none
empty: DP-1 bus-ID: 01:00.0 chip-ID: 10de:2520 class-ID: 0300
Device-2: AMD Cezanne [Radeon Vega Series / Radeon Mobile Series]
vendor: ASUSTeK driver: amdgpu v: kernel arch: GCN-5 code: Vega
process: GF 14nm built: 2017-20 pcie: gen: 3 speed: 8 GT/s lanes: 16
link-max: gen: 4 speed: 16 GT/s ports: active: HDMI-A-1 off: eDP-1
empty: none bus-ID: 06:00.0 chip-ID: 1002:1638 class-ID: 0300 temp: 49.0 C
Display: wayland server: X.org v: 1.21.1.13 with: Xwayland v: 24.1.0
compositor: kwin_wayland driver: X: loaded: amdgpu unloaded: modesetting
alternate: fbdev,vesa dri: radeonsi gpu: nvidia,amdgpu display-ID: 0
Monitor-1: HDMI-A-1 res: 1920x1200 size: N/A modes: N/A
API: EGL v: 1.5 hw: drv: nvidia drv: amd radeonsi platforms: device: 0
drv: nvidia device: 1 drv: radeonsi device: 3 drv: swrast gbm:
drv: kms_swrast surfaceless: drv: nvidia wayland: drv: radeonsi x11:
drv: radeonsi inactive: device-2
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: amd mesa v: 24.1.1-arch1.1
glx-v: 1.4 direct-render: yes renderer: AMD Radeon Graphics (radeonsi
renoir LLVM 17.0.6 DRM 3.57 6.9.4-AMD-znver3) device-ID: 1002:1638
memory: 500 MiB unified: no display-ID: :1.0
API: Vulkan v: 1.3.279 layers: 14 device: 0 type: integrated-gpu name: AMD
Radeon Graphics (RADV RENOIR) driver: mesa radv v: 24.1.1-arch1.1
device-ID: 1002:1638 surfaces: xcb,xlib,wayland device: 1
type: discrete-gpu name: NVIDIA GeForce RTX 3060 Laptop GPU driver: nvidia
v: 550.90.07 device-ID: 10de:2520 surfaces: xcb,xlib,wayland device: 2
type: cpu name: llvmpipe (LLVM 17.0.6 256 bits) driver: mesa llvmpipe
v: 24.1.1-arch1.1 (LLVM 17.0.6) device-ID: 10005:0000
surfaces: xcb,xlib,wayland
Audio:
Device-1: NVIDIA GA106 High Definition Audio vendor: ASUSTeK
driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 8
link-max: gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 01:00.1
chip-ID: 10de:228e class-ID: 0403
Device-2: AMD Renoir Radeon High Definition Audio vendor: ASUSTeK
driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16
link-max: gen: 4 speed: 16 GT/s bus-ID: 06:00.1 chip-ID: 1002:1637
class-ID: 0403
Device-3: AMD ACP/ACP3X/ACP6x Audio Coprocessor vendor: ASUSTeK
driver: N/A alternate: snd_pci_acp3x, snd_pci_acp5x, snd_pci_acp6x,
snd_acp_pci pcie: gen: 3 speed: 8 GT/s lanes: 16 link-max: gen: 4
speed: 16 GT/s bus-ID: 06:00.5 chip-ID: 1022:15e2 class-ID: 0480
Device-4: AMD Family 17h/19h HD Audio vendor: ASUSTeK
driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16
link-max: gen: 4 speed: 16 GT/s bus-ID: 06:00.6 chip-ID: 1022:15e3
class-ID: 0403
API: ALSA v: k6.9.4-AMD-znver3 status: kernel-api with: aoss
type: oss-emulator tools: N/A
Server-1: PipeWire v: 1.0.7 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet
vendor: ASUSTeK driver: r8169 v: kernel pcie: gen: 1 speed: 2.5 GT/s
lanes: 1 port: e000 bus-ID: 02:00.0 chip-ID: 10ec:8168 class-ID: 0200
IF: enp2s0 state: up speed: 100 Mbps duplex: full mac: <filter>
Device-2: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel pcie: gen: 2
speed: 5 GT/s lanes: 1 bus-ID: 03:00.0 chip-ID: 8086:2723 class-ID: 0280
IF: wlp3s0 state: down mac: <filter>
Info: services: NetworkManager, smbd, systemd-timesyncd
Bluetooth:
Device-1: Intel AX200 Bluetooth driver: btusb v: 0.8 type: USB rev: 2.0
speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 3-4:3 chip-ID: 8087:0029
class-ID: e001
Report: btmgmt ID: hci0 rfk-id: 1 state: up address: <filter> bt-v: 5.2
lmp-v: 11 status: discoverable: no pairing: no class-ID: 6c010c
Drives:
Local Storage: total: 2.77 TiB used: 1.28 TiB (46.2%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Seagate
model: XPG GAMMIX S70 BLADE size: 1.86 TiB block-size: physical: 512 B
logical: 512 B speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter>
fw-rev: 3.2.F.83 temp: 50.9 C scheme: GPT
ID-2: /dev/nvme1n1 maj-min: 259:3 vendor: Samsung model: SSD 980 1TB
size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
lanes: 4 tech: SSD serial: <filter> fw-rev: 3B4QFXO7 temp: 46.9 C
scheme: GPT
Partition:
ID-1: / raw-size: 500 GiB size: 500 GiB (100.00%) used: 307.27 GiB (61.5%)
fs: btrfs dev: /dev/nvme1n1p2 maj-min: 259:5
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 588 KiB (0.2%) fs: vfat dev: /dev/nvme1n1p1 maj-min: 259:4
ID-3: /home raw-size: 500 GiB size: 500 GiB (100.00%)
used: 307.27 GiB (61.5%) fs: btrfs dev: /dev/nvme1n1p2 maj-min: 259:5
ID-4: /var/log raw-size: 500 GiB size: 500 GiB (100.00%)
used: 307.27 GiB (61.5%) fs: btrfs dev: /dev/nvme1n1p2 maj-min: 259:5
ID-5: /var/tmp raw-size: 500 GiB size: 500 GiB (100.00%)
used: 307.27 GiB (61.5%) fs: btrfs dev: /dev/nvme1n1p2 maj-min: 259:5
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
ID-1: swap-1 type: zram size: 30.79 GiB used: 0 KiB (0.0%) priority: 100
comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 16 dev: /dev/zram0
ID-2: swap-2 type: partition size: 33.85 GiB used: 0 KiB (0.0%)
priority: -2 dev: /dev/nvme1n1p4 maj-min: 259:7
Sensors:
System Temperatures: cpu: N/A mobo: N/A gpu: amdgpu temp: 49.0 C
Fan Speeds (rpm): N/A
Info:
Memory: total: 32 GiB note: est. available: 30.79 GiB used: 4.08 GiB (13.2%)
Processes: 395 Power: uptime: 29m states: freeze,mem,disk suspend: s2idle
wakeups: 0 hibernate: platform avail: shutdown, reboot, suspend, test_resume
image: 12.27 GiB services: org_kde_powerdevil, power-profiles-daemon,
upowerd Init: systemd v: 255 default: graphical tool: systemctl
Packages: 2344 pm: pacman pkgs: 2330 libs: 599 tools: octopi,paru
pm: flatpak pkgs: 14 Compilers: clang: 17.0.6 gcc: 14.1.1 alt: 13
Shell: garuda-inxi default: fish v: 3.7.1 running-in: konsole inxi: 3.3.34
Garuda (2.6.26-1):
System install date:     2024-03-16
Last full system update: 2024-06-18
Is partially upgraded:   No
Relevant software:       snapper NetworkManager dracut nvidia-dkms
Windows dual boot:       Probably (Run as root to verify)
Failed units:

I had the same issue once after upgrading the kernel to 6.7 and the nvidia driver to 550.**. In my case it was a kernel panic. On the Arch Linux forum, someone mentioned that the proprietary nvidia blob is to blame and suggested using the open kernel module of the nvidia drivers. That fixed it for me.

Then use X11 :slight_smile:

and wait for the fix from Nvidia :grin:

Somehow, games stutter when I am using X11, and some of them won’t launch.
Basic browsing around the desktop is also clunky, and for the past 2 days I have had trouble even logging into an X11 session.

How do I do that? Is there a guide that you can point me to? Also, is there any significant performance impact?
Do I just uninstall nvidia-dkms and install nvidia-open-dkms?
Or is it better to just install nvidia-525-dkms and never update it?

Also, I had linux-lts (which was 6.6) and I still had this issue.

Yeah, of course. The best place to start is the good old Arch Wiki, along with some videos I found very helpful when I had just started using Linux.

Basically, yeah, just that. But it will probably return an error, as garuda-nvidia-config and garuda-nvidia-prime-config depend on nvidia-dkms.

So you have to do

sudo pacman -R nvidia-dkms garuda-nvidia-prime-config garuda-nvidia-config

followed by

sudo pacman -Syu nvidia-open-dkms

and reboot. That said, I don’t think it is a good idea, because the Garuda devs put those configs there for optimisation purposes. With that in mind, if you want to use Wayland, you can follow the steps above.
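
As a rough sketch of the two steps above (not a Garuda-endorsed procedure), something like the following only prints the commands unless APPLY=1 is set, and skips any package that is not installed, since pacman -R reports an error for targets it cannot find. The helper names run, installed_only and swap_to_open_module, as well as the APPLY and PKG_QUERY variables, are made up for this illustration; only the package names and the two pacman commands come from this thread.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set APPLY=1 to execute them.
# PKG_QUERY is a hypothetical override for the installed-package check
# (defaults to "pacman -Q", which exits non-zero for packages that are absent).

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf 'would run: %s\n' "$*"
    fi
}

# Print the subset of the given packages that is currently installed.
installed_only() {
    for pkg in "$@"; do
        if ${PKG_QUERY:-pacman -Q} "$pkg" >/dev/null 2>&1; then
            printf '%s\n' "$pkg"
        fi
    done
}

swap_to_open_module() {
    # pacman -R reports an error for targets that are not installed,
    # so filter the list down to what is actually present first.
    to_remove=$(installed_only nvidia-dkms garuda-nvidia-prime-config garuda-nvidia-config)
    if [ -n "$to_remove" ]; then
        # Word splitting on $to_remove is intentional: one package per word.
        run sudo pacman -R $to_remove
    fi
    run sudo pacman -Syu nvidia-open-dkms
}

swap_to_open_module
```

Running it once as-is shows what would be executed; running it again with APPLY=1 performs the swap, after which a reboot is still needed.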


When I ran sudo pacman -R nvidia-dkms garuda-nvidia-prime-config garuda-nvidia-config, pacman notified me that garuda-nvidia-prime-config was not installed. I decided not to install nvidia-open-dkms and installed garuda-nvidia-prime-config instead, just in case its absence was causing the issue.

I have since logged off and shut down multiple times and the system didn’t crash. I’ll test it out for a few more days and see how it goes.

Thank you, your suggestion might have indirectly fixed the issue I was facing lol.

Update 1:
Had one crash while shutting down after a long time. I checked and garuda-nvidia-config was also missing, so I installed it too. Will update again.
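
For anyone wanting to check the same thing, here is a small sketch that reports which of the packages mentioned in this thread are installed. The function name report_packages and the PKG_QUERY variable are hypothetical, added only so the check can be tried without pacman; by default it relies on pacman -Q, which exits non-zero when a package is not installed.

```shell
#!/bin/sh
# Report, for each package, whether the package manager lists it as installed.
# PKG_QUERY is a hypothetical override; the default is "pacman -Q".
report_packages() {
    for pkg in "$@"; do
        if ${PKG_QUERY:-pacman -Q} "$pkg" >/dev/null 2>&1; then
            printf '%s: installed\n' "$pkg"
        else
            printf '%s: missing\n' "$pkg"
        fi
    done
}

report_packages nvidia-dkms nvidia-open-dkms garuda-nvidia-config garuda-nvidia-prime-config
```

Any line ending in "missing" points at a package that would need to be installed (or that pacman -R would complain about).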

Update 2:
Still crashing. I am now using nvidia-open-dkms.

Update 3:
Haven’t experienced a crash yet… But I have to switch to X11 when using an external monitor (which is connected directly to the dGPU).
