Random system crashes when AMD GPU utilized (WOW, Ollama, LM-Studio)

I am seeing random full system crashes when playing World of Warcraft or when using the GPU for LLM or SD work. There is no known trigger, such as peak load or power draw; it just happens at random intervals. The system could last 5 min or 10 hours. It can sit at maximum load for long periods during GPU stress tests without issue, then crash while I am standing still in game.
I have tried the standard kernel and the zen3 kernel; both have the same issue. DX11 and DX12 both crash (WOW). When it crashes, the entire system freezes: no keyboard commands work, audio stops, and a hard reboot is required. The last crash was ~5 min after a clean reboot, running nothing except Battle.net/WOW after the desktop loaded. Last night it crashed in WOW 4 times in about an hour, then ran fine for 6-7 hours.
Everything is up to date. Neither the GPU nor the CPU is overclocked. Temps are well within norms. Memtests and full CPU load tests show no errors.
I believe this is a GPU issue rather than a wine/game issue, as it crashes the same way (full system lock) when using SD or LLMs, yet it will also run fine for days with LLMs loaded in GPU VRAM. I have also had it crash similarly while testing some Steam games.
I have had -no- crashes under any other normal operation, even with multiple browsers and many apps open and running. Under Windows, on the same hardware, I cannot remember having any crashes simply playing Warcraft.
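
Since the crashes show no obvious link to load or temperature, one way to check that claim is to log GPU sensor readings continuously, so the last values before a freeze survive on disk. The following is only a sketch: it assumes the amdgpu driver exposes `temp1_input` (millidegrees C) and `power1_average` (microwatts) in an hwmon directory under `/sys/class/drm/card*/device/hwmon/`, which can differ between kernels and cards. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print one timestamped line of amdgpu sensor readings per call.
# Assumption: the hwmon directory contains temp1_input and power1_average;
# some kernels or cards expose different file names.

# Convert an hwmon millidegree value (e.g. 52000) to whole degrees C.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Convert an hwmon microwatt value (e.g. 95000000) to whole watts.
microwatt_to_w() {
    echo $(( $1 / 1000000 ))
}

# Print "timestamp temp=..C power=..W" for the hwmon directory given as $1.
# A missing or unreadable sensor file is reported as NA.
log_gpu_sample() {
    local dir=$1 temp=NA power=NA
    [ -r "$dir/temp1_input" ] && temp=$(millideg_to_c "$(cat "$dir/temp1_input")")
    [ -r "$dir/power1_average" ] && power=$(microwatt_to_w "$(cat "$dir/power1_average")")
    echo "$(date '+%F %T') temp=${temp}C power=${power}W"
}
```

To use it, call `log_gpu_sample` in a loop with the card's hwmon directory, append the output to a file, and run `sync` after each sample so the line is written out before a hard freeze. The last few lines after a crash then show whether temperature or power draw was unusual.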

wine --version
wine-9.10 (Staging)

System:
Kernel: 6.10.2-AMD-znver3 arch: x86_64 bits: 64 compiler: gcc v: 14.1.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/garuda/boot/vmlinuz-linux-amd-znver3
root=UUID=9f604328-bea1-4d99-93c4-d9a53987c7cd rw rootflags=subvol=garuda
quiet loglevel=3 ibt=off
Desktop: KDE Plasma v: 6.1.4 tk: Qt v: N/A wm: kwin_wayland dm: SDDM
Distro: Garuda base: Arch Linux
Machine:
Type: Desktop System: Gigabyte product: X470 AORUS GAMING 7 WIFI v: N/A
serial: N/A
Mobo: Gigabyte model: X470 AORUS GAMING 7 WIFI-CF v: x.x serial: N/A
uuid: 03d502e0-045e-056d-9506-f80700080009 UEFI: American Megatrends LLC.
v: F63d date: 02/09/2023
CPU:
Info: model: AMD Ryzen 9 5900X socket: AM4 bits: 64 type: MT MCP
arch: Zen 3+ gen: 4 level: v3 note: check built: 2022 process: TSMC n6 (7nm)
family: 0x19 (25) model-id: 0x21 (33) stepping: 2 microcode: 0xA20120A
Topology: cpus: 1x cores: 12 tpc: 2 threads: 24 smt: enabled cache:
L1: 768 KiB desc: d-12x32 KiB; i-12x32 KiB L2: 6 MiB desc: 12x512 KiB
L3: 64 MiB desc: 2x32 MiB
Speed (MHz): avg: 3700 min/max: 2200/4950 boost: enabled
base/boost: 3700/4950 scaling: driver: acpi-cpufreq governor: performance
volts: 1.1 V ext-clock: 100 MHz cores: 1: 3700 2: 3700 3: 3700 4: 3700
5: 3700 6: 3700 7: 3700 8: 3700 9: 3700 10: 3700 11: 3700 12: 3700
13: 3700 14: 3700 15: 3700 16: 3700 17: 3700 18: 3700 19: 3700 20: 3700
21: 3700 22: 3700 23: 3700 24: 3700 bogomips: 177685
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities: <filter>
Graphics:
Device-1: AMD Navi 31 [Radeon RX 7900 XT/7900 XTX/7900M] vendor: ASRock
driver: amdgpu v: kernel arch: RDNA-3 code: Navi-3x process: TSMC n5 (5nm)
built: 2022+ pcie: gen: 4 speed: 16 GT/s lanes: 16 ports:
active: DP-2,DP-3 off: HDMI-A-1 empty: DP-1,Writeback-1 bus-ID: 0c:00.0
chip-ID: 1002:744c class-ID: 0300
Device-2: Logitech C922 Pro Stream Webcam driver: snd-usb-audio,uvcvideo
type: USB rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-3:3
chip-ID: 046d:085c class-ID: 0102 serial: <filter>
Display: unspecified server: X.Org v: 24.1.2 with: Xwayland v: 24.1.2
compositor: kwin_wayland driver: X: loaded: amdgpu
unloaded: modesetting,radeon alternate: fbdev,vesa dri: radeonsi
gpu: amdgpu display-ID: :1 screens: 1
Screen-1: 0 s-res: 8140x2160 s-dpi: 96 s-size: 2154x572mm (84.80x22.52")
s-diag: 2229mm (87.74")
Monitor-1: DP-2 pos: primary,left res: 4300x1800 hz: 120 dpi: 137
size: 797x334mm (31.38x13.15") diag: 864mm (34.02") modes: N/A
Monitor-2: DP-3 pos: right res: 3840x2160 hz: 60 dpi: 161
size: 607x345mm (23.9x13.58") diag: 698mm (27.49") modes: N/A
API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
device: 1 drv: swrast gbm: drv: radeonsi surfaceless: drv: radeonsi x11:
drv: radeonsi inactive: wayland
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.2.2-arch1.1
glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 7900 XTX (radeonsi
navi31 LLVM 18.1.8 DRM 3.57 6.10.2-AMD-znver3) device-ID: 1002:744c
memory: 23.44 GiB unified: no
API: Vulkan v: 1.3.295 layers: 9 device: 0 type: discrete-gpu name: AMD
Radeon RX 7900 XTX (RADV NAVI31) driver: mesa radv v: 24.2.2-arch1.1
device-ID: 1002:744c surfaces: xcb,xlib device: 1 type: cpu name: llvmpipe
(LLVM 18.1.8 256 bits) driver: mesa llvmpipe v: 24.2.2-arch1.1 (LLVM
18.1.8) device-ID: 10005:0000 surfaces: xcb,xlib
Audio:
Device-1: AMD Navi 31 HDMI/DP Audio driver: snd_hda_intel v: kernel pcie:
gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 0c:00.1 chip-ID: 1002:ab30
class-ID: 0403
Device-2: AMD Starship/Matisse HD Audio vendor: Gigabyte
driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
bus-ID: 0e:00.4 chip-ID: 1022:1487 class-ID: 0403
Device-3: Logitech C922 Pro Stream Webcam driver: snd-usb-audio,uvcvideo
type: USB rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-3:3
chip-ID: 046d:085c class-ID: 0102 serial: <filter>
Device-4: PD200X Podcast Microphone
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 1.1 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 5-1:2 chip-ID: 352f:0104 class-ID: 0300
serial: <filter>
Device-5: SteelSeries ApS Arctis 7
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 1.1 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 5-2:3 chip-ID: 1038:12ad class-ID: 0300
Device-6: C-Media CM106 Like Sound Device
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 1.1 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 5-4:5 chip-ID: 0d8c:0102 class-ID: 0300
API: ALSA v: k6.10.2-AMD-znver3 status: kernel-api with: aoss
type: oss-emulator tools: alsactl,alsamixer,amixer
Server-1: PipeWire v: 1.2.3 status: n/a (root, process) with:
1: pipewire-pulse status: active 2: wireplumber status: active
3: pipewire-alsa type: plugin 4: pw-jack type: plugin
tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Realtek RTL8125 2.5GbE driver: r8169 v: kernel pcie: gen: 2
speed: 5 GT/s lanes: 1 port: e000 bus-ID: 04:00.0 chip-ID: 10ec:8125
class-ID: 0200
IF: eno1 state: up speed: 2500 Mbps duplex: full mac: <filter>
Device-2: Intel I211 Gigabit Network vendor: Gigabyte driver: igb
v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1 port: d000 bus-ID: 06:00.0
chip-ID: 8086:1539 class-ID: 0200
IF: enp6s0 state: down mac: <filter>
Device-3: Intel Wi-Fi 5 Wireless-AC 9x6x [Thunder Peak] driver: iwlwifi
v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 bus-ID: 07:00.0
chip-ID: 8086:2526 class-ID: 0280
IF: wlp7s0 state: down mac: <filter>
Info: services: NetworkManager, smbd, systemd-timesyncd, wpa_supplicant
Bluetooth:
Device-1: Intel Wireless-AC 9260 Bluetooth Adapter driver: btusb v: 0.8
type: USB rev: 2.0 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-2:2
chip-ID: 8087:0025 class-ID: e001
Report: btmgmt ID: hci0 rfk-id: 1 state: up address: <filter> bt-v: 5.1
lmp-v: 10 status: discoverable: no pairing: no class-ID: 6c0104
Drives:
Local Storage: total: 11.84 TiB used: 431.59 GiB (3.6%)
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Smart Modular Tech.
model: SHPP41-2000GM size: 1.82 TiB block-size: physical: 512 B
logical: 512 B speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter>
fw-rev: 51060A20 temp: 43.9 C scheme: GPT
SMART: yes health: PASSED on: 1y 23d 15h cycles: 424
read-units: 47,425,635 [24.2 TB] written-units: 51,795,098 [26.5 TB]
ID-2: /dev/nvme1n1 maj-min: 259:5 vendor: Samsung model: SSD 970 PRO 512GB
size: 476.94 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
lanes: 4 tech: SSD serial: <filter> fw-rev: 1B2QEXP7 temp: 37.9 C
scheme: GPT
SMART: yes health: PASSED on: 4y 319d 14h cycles: 612
read-units: 154,178,975 [78.9 TB] written-units: 198,969,222 [101 TB]
ID-3: /dev/sda maj-min: 8:0 vendor: Seagate model: ST2000DX002-2DV164
family: FireCuda 3.5 size: 1.82 TiB block-size: physical: 4096 B
logical: 512 B sata: 3.1 speed: 6.0 Gb/s tech: HDD rpm: 7200
serial: <filter> fw-rev: CC41 temp: 40 C scheme: GPT
SMART: yes state: enabled health: PASSED on: 6y 154d 12h cycles: 707
read: 65.04 TiB written: 13.89 TiB Pre-Fail: attribute: Spin_Retry_Count
value: 100 worst: 100 threshold: 97
ID-4: /dev/sdb maj-min: 8:16 vendor: Seagate model: ST2000LX001-1RG174
family: FireCuda 2.5 size: 1.82 TiB block-size: physical: 4096 B
logical: 512 B sata: 3.1 speed: 6.0 Gb/s tech: HDD rpm: 5400
serial: <filter> fw-rev: SDM1 temp: 33 C scheme: GPT
SMART: yes state: enabled health: PASSED on: 6y 60d 3h cycles: 707
read: 44.76 TiB written: 33.12 TiB Pre-Fail: attribute: Spin_Retry_Count
value: 100 worst: 100 threshold: 97
ID-5: /dev/sdc maj-min: 8:32 vendor: Toshiba model: HDWE160 family: X300
size: 5.46 TiB block-size: physical: 4096 B logical: 512 B sata: 3.0
speed: 6.0 Gb/s tech: HDD rpm: 7200 serial: <filter> fw-rev: FS2A
temp: 50 C scheme: GPT
SMART: yes state: enabled health: PASSED on: 5y 197d 23h cycles: 484
Pre-Fail: reallocated sector: 100 threshold: 50
ID-6: /dev/sdd maj-min: 8:48 vendor: Western Digital
model: WDS500G2X0C-00L350 size: 465.76 GiB block-size: physical: 512 B
logical: 512 B type: USB rev: 3.2 spd: 5 Gb/s lanes: 1 mode: 3.2 gen-1x1
tech: N/A serial: <filter> fw-rev: 1.00 drive-rev: 101110WD
temp: 28 Celsius C scheme: GPT
SMART: yes health: PASSED on: 4y 253d 13h cycles: 11,216
read-units: 135,812,982 [69.5 TB] written-units: 130,353,048 [66.7 TB]
Partition:
ID-1: / raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 431.58 GiB (90.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme1n1p2
maj-min: 259:7
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 2.8 MiB (0.9%) fs: vfat block-size: 512 B dev: /dev/nvme1n1p1
maj-min: 259:6
ID-3: /home raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 431.58 GiB (90.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme1n1p2
maj-min: 259:7
ID-4: /var/log raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 431.58 GiB (90.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme1n1p2
maj-min: 259:7
ID-5: /var/tmp raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 431.58 GiB (90.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme1n1p2
maj-min: 259:7
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
ID-1: swap-1 type: zram size: 62.73 GiB used: 16.2 MiB (0.0%)
priority: 100 comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 24
dev: /dev/zram0
Sensors:
System Temperatures: cpu: 51.6 C mobo: N/A gpu: amdgpu temp: 52.0 C
mem: 72.0 C
Fan Speeds (rpm): N/A gpu: amdgpu fan: 804
Info:
Memory: total: 64 GiB available: 62.73 GiB used: 9.58 GiB (15.3%)
Processes: 508 Power: uptime: 37m states: freeze,mem,disk suspend: deep
avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
suspend, test_resume image: 25.08 GiB services: org_kde_powerdevil,
power-profiles-daemon, upowerd Init: systemd v: 256 default: graphical
tool: systemctl
Packages: 2425 pm: dpkg pkgs: 0 pm: pacman pkgs: 2327 libs: 636
tools: octopi,pamac,paru pm: flatpak pkgs: 86 pm: snap pkgs: 12 Compilers:
clang: 18.1.8 gcc: 14.2.1 Shell: garuda-inxi (sudo) default: Bash
v: 5.2.32 running-in: konsole inxi: 3.3.35
Garuda (2.6.26-1):
System install date:     2024-06-05
Last full system update: 2024-09-09
Is partially upgraded:   No
Relevant software:       snapper NetworkManager dracut
Windows dual boot:       Yes
Failed units:

I also presume it is an issue with the translation layer (Wine). I have version 9.16 installed, though, from the Garuda Gamer app. I also have an AMD system, but I do not use Battle.net (I even deleted my account, as the one and only thing I used it for was the Diablo 3 beta).

Might not be the issue, but you have a beta BIOS version installed on your mainboard; you might want to update it.

You might also try different kernels from the ones you have tried. I am personally using the default Zen kernel (nothing to do with the Zen processor). Some people have a better experience with the LTS version, as far as I have heard (never tried it myself).
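
For anyone wanting to try the kernel suggestion, here is a rough sketch of how to see which kernels are already installed. It assumes the Arch convention that each kernel package installs its image as `/boot/vmlinuz-<package>` (for example `vmlinuz-linux-zen`). The helper names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list kernel packages that have an image in a boot directory.
# Assumption: images are named vmlinuz-<package>, as on Arch-based systems.

# Map a kernel image path to the package name it usually comes from.
kernel_pkg_from_image() {
    local name=${1##*/}        # strip the directory part
    echo "${name#vmlinuz-}"    # strip the "vmlinuz-" prefix
}

# Print one package name per kernel image found in $1 (default: /boot).
list_installed_kernels() {
    local dir=${1:-/boot} img
    for img in "$dir"/vmlinuz-*; do
        [ -e "$img" ] && kernel_pkg_from_image "$img"
    done
    return 0
}

# To add the LTS kernel alongside the current one on Arch:
#   sudo pacman -S linux-lts linux-lts-headers
```

Running `list_installed_kernels` before and after installing makes it easy to confirm that the new kernel is present. Whether the boot menu picks it up automatically depends on the distro's bootloader hooks.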


Holy shit, what a night I just had. I spent the last hour thinking I had a hardware issue with the new media center PC I built 2 weeks ago. I'm currently running Garuda GNOME on it.

So I get home from work and wake my PC from sleep, and about 30 seconds later: random shutdown. I turn it back on, start a YouTube video, and it shuts down again. I had about 20 random shutdowns in the last hour. It got so bad I could not even boot my PC: I would press the power button on the case and it would shut down within 2 seconds. At this point I'm freaking out.

Well, I turned off the PC, killed the switch on the back of the PSU, pressed the power button a few times to clear the board, and gave it a few minutes.

It turned on, whew, I can see GRUB! At this point I restored from a snapshot a few days old and it works. No hardware issue in sight. I can boot up and play a game or watch YouTube just fine now, no random crash to be found. So… something is up with a recent update.

God damn, I love Garuda; the snapshot feature has saved my ass so many times. Hope this helps.

 ╰─λ garuda-inxi
System:
  Kernel: 6.10.8-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 14.2.1
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
    root=UUID=6fcfe4b4-11e2-452d-8da4-f22fd0d16a1e rw rootflags=subvol=@
    quiet resume=UUID=c85795bb-65f7-4a94-9167-a00a779958bd loglevel=3 ibt=off
  Desktop: GNOME v: 46.4 tk: GTK v: 3.24.43 wm: gnome-shell
    tools: gsd-screensaver-proxy dm: GDM v: 46.2 Distro: Garuda base: Arch Linux
Machine:
  Type: Desktop Mobo: ASRock model: X570 Taichi Razer Edition
    serial: <superuser required> uuid: <superuser required>
    UEFI: American Megatrends v: P5.63 date: 08/21/2024
Battery:
  Device-1: hidpp_battery_0 model: Logitech Wireless Mouse M215 2nd Gen
    serial: <filter> charge: 100% (should be ignored) rechargeable: yes
    status: discharging
CPU:
  Info: model: AMD Ryzen 9 5950X bits: 64 type: MT MCP arch: Zen 3+ gen: 4
    level: v3 note: check built: 2022 process: TSMC n6 (7nm) family: 0x19 (25)
    model-id: 0x21 (33) stepping: 2 microcode: 0xA201210
  Topology: cpus: 1x cores: 16 tpc: 2 threads: 32 smt: enabled cache:
    L1: 1024 KiB desc: d-16x32 KiB; i-16x32 KiB L2: 8 MiB desc: 16x512 KiB
    L3: 64 MiB desc: 2x32 MiB
  Speed (MHz): avg: 3471 high: 3611 min/max: 2200/5083 boost: enabled
    scaling: driver: acpi-cpufreq governor: performance cores: 1: 3585 2: 3400
    3: 3400 4: 3592 5: 3400 6: 3400 7: 3400 8: 3611 9: 3400 10: 3400 11: 3575
    12: 3400 13: 3400 14: 3576 15: 3400 16: 3400 17: 3592 18: 3593 19: 3400
    20: 3400 21: 3595 22: 3400 23: 3400 24: 3400 25: 3400 26: 3400 27: 3592
    28: 3400 29: 3591 30: 3591 31: 3609 32: 3400 bogomips: 217187
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities: <filter>
Graphics:
  Device-1: AMD Navi 21 [Radeon RX 6800/6800 XT / 6900 XT]
    vendor: Sapphire Pulse driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x
    process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4 speed: 16 GT/s
    lanes: 16 ports: active: DP-3,HDMI-A-1 empty: DP-1,DP-2,Writeback-1
    bus-ID: 0c:00.0 chip-ID: 1002:73bf class-ID: 0300
  Display: wayland server: X.org v: 1.21.1.13 with: Xwayland v: 24.1.2
    compositor: gnome-shell driver: X: loaded: amdgpu
    unloaded: modesetting,radeon alternate: fbdev,vesa dri: radeonsi
    gpu: amdgpu display-ID: 0
  Monitor-1: DP-3 model: Asus VW246 serial: <filter> built: 2011
    res: 1920x1080 dpi: 92 gamma: 1.2 size: 531x299mm (20.91x11.77")
    diag: 609mm (24") ratio: 16:9 modes: max: 1920x1080 min: 720x400
  Monitor-2: HDMI-A-1 model: 55S20 built: 2020 res: 3840x2160 dpi: 80
    gamma: 1.2 size: 800x450mm (31.5x17.72") diag: 1397mm (55") ratio: 16:9
    modes: max: 3840x2160 min: 720x400
  API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
    device: 1 drv: swrast gbm: drv: kms_swrast surfaceless: drv: radeonsi
    wayland: drv: radeonsi x11: drv: radeonsi
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.2.2-arch1.1
    glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 6800 (radeonsi navi21
    LLVM 18.1.8 DRM 3.57 6.10.8-zen1-1-zen) device-ID: 1002:73bf
    memory: 15.62 GiB unified: no display-ID: :0.0
  API: Vulkan v: 1.3.279 layers: 9 device: 0 type: discrete-gpu name: AMD
    Radeon RX 6800 (RADV NAVI21) driver: mesa radv v: 24.2.2-arch1.1
    device-ID: 1002:73bf surfaces: xcb,xlib,wayland device: 1 type: cpu
    name: llvmpipe (LLVM 18.1.8 256 bits) driver: mesa llvmpipe
    v: 24.2.2-arch1.1 (LLVM 18.1.8) device-ID: 10005:0000
    surfaces: xcb,xlib,wayland
Audio:
  Device-1: AMD Navi 21/23 HDMI/DP Audio driver: snd_hda_intel v: kernel pcie:
    gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 0c:00.1 chip-ID: 1002:ab28
    class-ID: 0403
  Device-2: AMD Starship/Matisse HD Audio vendor: ASRock
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 0e:00.4 chip-ID: 1022:1487 class-ID: 0403
  API: ALSA v: k6.10.8-zen1-1-zen status: kernel-api with: aoss
    type: oss-emulator tools: N/A
  Server-1: sndiod v: N/A status: off tools: aucat,midicat,sndioctl
  Server-2: PipeWire v: 1.2.3 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Intel Wi-Fi 6 AX200 vendor: Rivet Networks Killer driver: iwlwifi
    v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 bus-ID: 04:00.0
    chip-ID: 8086:2723 class-ID: 0280
  IF: wlp4s0 state: down mac: <filter>
  Device-2: Realtek Killer E3000 2.5GbE vendor: ASRock driver: r8169
    v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 port: f000 bus-ID: 05:00.0
    chip-ID: 10ec:3000 class-ID: 0200
  IF: enp5s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  Device-3: Microsoft Xbox Wireless Adapter for Windows driver: N/A
    type: USB rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 5-1:2
    chip-ID: 045e:02fe class-ID: 0000 serial: <filter>
  IF-ID-1: wgpia0 state: unknown speed: N/A duplex: N/A mac: N/A
  Info: services: NetworkManager, systemd-timesyncd, wpa_supplicant
Bluetooth:
  Device-1: Intel AX200 Bluetooth driver: btusb v: 0.8 type: USB rev: 2.0
    speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 3-2:2 chip-ID: 8087:0029
    class-ID: e001
  Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.2
    lmp-v: 11 status: discoverable: no pairing: no class-ID: 6c0104
Drives:
  Local Storage: total: 6.41 TiB used: 152.81 GiB (2.3%)
  SMART Message: Required tool smartctl not installed. Check --recommends
  ID-1: /dev/nvme0n1 maj-min: 259:1 model: PCIe SSD size: 1.86 TiB
    block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4
    tech: SSD serial: <filter> fw-rev: ECFM13.3 temp: 28.9 C scheme: GPT
  ID-2: /dev/nvme1n1 maj-min: 259:0 vendor: Sabrent model: Rocket 4.0 2TB
    size: 1.82 TiB block-size: physical: 512 B logical: 512 B speed: 63.2 Gb/s
    lanes: 4 tech: SSD serial: <filter> fw-rev: RKT401.3 temp: 38.9 C
    scheme: GPT
  ID-3: /dev/sda maj-min: 8:0 vendor: HGST (Hitachi) model: HTS721010A9E630
    size: 931.51 GiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
    tech: HDD rpm: 7200 serial: <filter> fw-rev: A3U0 scheme: GPT
  ID-4: /dev/sdb maj-min: 8:16 vendor: Seagate model: ST2000DM008-2FR102
    size: 1.82 TiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
    tech: HDD rpm: 7200 serial: <filter> fw-rev: 0001
Partition:
  ID-1: / raw-size: 1.75 TiB size: 1.75 TiB (100.00%) used: 152.8 GiB (8.5%)
    fs: btrfs dev: /dev/nvme1n1p2 maj-min: 259:3
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 584 KiB (0.2%) fs: vfat dev: /dev/nvme1n1p1 maj-min: 259:2
  ID-3: /home raw-size: 1.75 TiB size: 1.75 TiB (100.00%)
    used: 152.8 GiB (8.5%) fs: btrfs dev: /dev/nvme1n1p2 maj-min: 259:3
  ID-4: /var/log raw-size: 1.75 TiB size: 1.75 TiB (100.00%)
    used: 152.8 GiB (8.5%) fs: btrfs dev: /dev/nvme1n1p2 maj-min: 259:3
  ID-5: /var/tmp raw-size: 1.75 TiB size: 1.75 TiB (100.00%)
    used: 152.8 GiB (8.5%) fs: btrfs dev: /dev/nvme1n1p2 maj-min: 259:3
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
  ID-1: swap-1 type: zram size: 62.71 GiB used: 0 KiB (0.0%) priority: 100
    comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 32 dev: /dev/zram0
  ID-2: swap-2 type: partition size: 68.99 GiB used: 0 KiB (0.0%)
    priority: -2 dev: /dev/nvme1n1p3 maj-min: 259:4
Sensors:
  System Temperatures: cpu: 45.1 C mobo: N/A gpu: amdgpu temp: 43.0 C
    mem: 38.0 C
  Fan Speeds (rpm): N/A gpu: amdgpu fan: 0
Info:
  Memory: total: 64 GiB note: est. available: 62.71 GiB used: 6.89 GiB (11.0%)
  Processes: 581 Power: uptime: 52m states: freeze,mem,disk suspend: deep
    avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
    suspend, test_resume image: 25.07 GiB services: gsd-power,
    power-profiles-daemon, upowerd Init: systemd v: 256 default: graphical
    tool: systemctl
  Packages: 1486 pm: pacman pkgs: 1480 libs: 529 tools: gnome-software,
    octopi, pamac, paru pm: flatpak pkgs: 6 Compilers: clang: 18.1.8
    gcc: 14.2.1 Shell: garuda-inxi default: fish v: 3.7.1
    running-in: gnome-terminal inxi: 3.3.35
Garuda (2.6.26-1):
  System install date:     2024-09-02
  Last full system update: 2024-09-06
  Is partially upgraded:   No
  Relevant software:       snapper NetworkManager dracut
  Windows dual boot:       No/Undetected
  Failed units:            

Edit: ■■■■ it just happened again, I just jinxed myself

I think my power supply is bad

I’m not ruling out wine, but I get the same kind of hard crash when wine isn’t running and the GPU is simply being used for Stable Diffusion or LLM models… again, randomly. It just happens far more often in wine/WOW, but I haven’t played much else to test wine or games for long periods.
Hoping someone can guide me towards a definitive answer, because there are many things I could “try” and few may have any effect, and it’s not like I can test each one in <5min. I just played WOW for about 2 hours without an issue, then just now… blap.
I was able to run it for hours on end without any issue a few weeks ago, and it’s been too long to start zipping around with snapshots now. My guess is something core: AMD/Vulkan drivers or something similar. As it’s crashing so hard I can’t get to a terminal, I’m not sure what’s blowing up.
I’ve been very happy since moving to full-time Linux about 4 months ago, and I do not want to have to run Windows, even dual boot, just to play one of the simplest/easiest/most popular games. I might try booting a fresh Mint or Fedora, I suppose, as those appear to have better full driver pack/app support for Radeon and ROCm packages. Hoping the Arch folks have better options.
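
When a freeze leaves no chance to reach a terminal, the kernel log from the previous boot is sometimes still readable after the hard reboot. A minimal sketch, assuming the journal is persistent and the kernel got a message out before locking up (a hard hang may leave nothing behind). The pattern list and the function name are illustrative guesses at what GPU hangs commonly log, not an exhaustive catalogue.

```shell
#!/usr/bin/env bash
# Sketch: keep only log lines that commonly accompany GPU hangs or
# hardware faults. The pattern list is an assumption, adjust as needed.
filter_gpu_errors() {
    grep -iE 'amdgpu.*(timeout|reset|fault|error)|gpu (hang|reset)|mce:|hardware error'
}

# Usage (kernel messages from the previous boot only):
#   journalctl -k -b -1 --no-pager | filter_gpu_errors
```

If this prints nothing for a boot that ended in a freeze, that is still a data point: the system probably died before the kernel could write anything out.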

Just curious if you have any I/O errors in your logs.

Please post the output of the following command:

journalctl | grep 'I/O error'

I have had some random reboots on my Ryzen 5 laptop since the latest linux-firmware (240909) got installed, just watching YouTube or browsing the web. My microphone mute switch/F4 check light was always on. I downgraded to the 240809 firmware last night; the mic light is off now, and time will tell if I still get these crashes with a reboot.

Nope, downgrading linux-firmware didn’t fix my crashes!

So, it -appears- that moving to the LTS kernel fixed the issue. I changed to LTS and turned resizable bar off in the BIOS at the same time, and had no more crashes while gaming.
I tried keeping resizable BAR off and going back to the zen3 kernel, and had a crash in WOW within 10-15 min. Back to LTS and clear again. I have not yet tried turning resizable BAR back on, however, or updating my BIOS.

I will do some more tests and post my results after re-enabling Re-Bar. Unless there is a good reason, though, I am not messing with the BIOS.
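
When testing kernel and Re-Bar combinations like this, it helps to confirm from inside Linux which state the BIOS setting actually left the card in. A rough sketch: the amdgpu driver logs a `Detected VRAM RAM=...M, BAR=...M` line at startup. The helper below (a made-up name) treats a BAR at least as large as VRAM as "Re-Bar on". That rule is a heuristic assumption, not an official check.

```shell
#!/usr/bin/env bash
# Sketch: read kernel log lines on stdin and report whether the PCI BAR
# covers all of VRAM, based on amdgpu's "Detected VRAM RAM=..M, BAR=..M" line.
# Heuristic assumption: BAR >= RAM means resizable BAR is in effect.
rebar_status() {
    local line ram bar
    # Take the first matching fragment; report "unknown" if there is none.
    line=$(grep -m1 -oE 'RAM=[0-9]+M, BAR=[0-9]+M') || { echo unknown; return 0; }
    ram=${line#RAM=};    ram=${ram%%M*}   # digits after "RAM="
    bar=${line##*BAR=};  bar=${bar%M}     # digits after "BAR="
    if [ "$bar" -ge "$ram" ]; then
        echo "rebar-on (RAM=${ram}M BAR=${bar}M)"
    else
        echo "rebar-off (RAM=${ram}M BAR=${bar}M)"
    fi
}

# Usage:
#   sudo dmesg | rebar_status
#   uname -r    # note which kernel this boot is running
```

Recording this output together with `uname -r` for each test run keeps the kernel and Re-Bar variables separate when comparing results.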

Now, if I can get Stable Diffusion to run at some reasonable speed on the 7900 XTX, without weirdness, I would be much happier ;p That is entirely a ROCm/driver/PyTorch issue, though.
