I am seeing random full system crashes when playing World of Warcraft or when using the GPU for LLM or SD work. There is no known trigger, such as maximum load or power draw; it just happens at random intervals. The system could last 5 minutes or 10 hours. It can run at maximum load for long periods without issue while stress-testing the GPU, then crash while I am standing still in game.
I have tried the standard kernel and the zen3 (linux-amd-znver3) kernel; both have the same issue. WoW crashes under both DX11 and DX12. When it crashes, the entire system freezes: no keyboard commands work, audio stops, and a hard reboot is required. The last crash was ~5 minutes after a clean reboot, with nothing running except Battle.net/WoW after the desktop loaded. Last night it crashed in WoW 4 times in about an hour, then ran fine for 6-7 hours.
Everything is up to date. Neither the GPU nor the CPU is overclocked. Temperatures are well within normal ranges. Memtests and full CPU load tests show no errors.
I believe this is a GPU issue rather than a Wine/game issue, as it crashes the same way (full system lock) when using SD or LLMs, yet it will also run fine for days with LLMs loaded and sitting in GPU VRAM. I have also had it crash similarly while testing some Steam games.
I have had -no- crashes during any other normal use, even with multiple browsers and many apps open and running. Under Windows, on the same hardware, I cannot remember having any crashes while simply playing Warcraft.
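To help tell a GPU/driver fault apart from a Wine or game problem, one option is to look at the kernel messages from the boot that ended in the freeze. The following is only a minimal sketch, not a definitive diagnostic: it assumes the journal is persistent and that something was actually written before the lockup (after a hard freeze the last messages may never reach disk). The keyword list in `SUSPECT` and the helper names are my own assumptions for illustration, not messages taken from this system.

```python
import re
import subprocess

# Assumed keywords (hypothetical): any line mentioning amdgpu together with
# one of these words is treated as worth a closer look.
SUSPECT = re.compile(r"amdgpu.*(timeout|reset|fault|error)", re.IGNORECASE)


def suspicious_lines(lines):
    """Return the log lines that mention amdgpu plus a suspect keyword."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]


def previous_boot_kernel_log():
    """Read the kernel messages of the previous boot via journalctl."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,  # an empty or missing journal should not raise here
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in suspicious_lines(previous_boot_kernel_log()):
        print(line)
```

Run it right after the hard reboot that follows a freeze, so that `-b -1` refers to the boot that crashed. If nothing is printed, that does not rule out the GPU; it may only mean the freeze happened before anything could be logged.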
wine --version
wine-9.10 (Staging)
System:
Kernel: 6.10.2-AMD-znver3 arch: x86_64 bits: 64 compiler: gcc v: 14.1.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/garuda/boot/vmlinuz-linux-amd-znver3
root=UUID=9f604328-bea1-4d99-93c4-d9a53987c7cd rw rootflags=subvol=garuda
quiet loglevel=3 ibt=off
Desktop: KDE Plasma v: 6.1.4 tk: Qt v: N/A wm: kwin_wayland dm: SDDM
Distro: Garuda base: Arch Linux
Machine:
Type: Desktop System: Gigabyte product: X470 AORUS GAMING 7 WIFI v: N/A
serial: N/A
Mobo: Gigabyte model: X470 AORUS GAMING 7 WIFI-CF v: x.x serial: N/A
uuid: 03d502e0-045e-056d-9506-f80700080009 UEFI: American Megatrends LLC.
v: F63d date: 02/09/2023
CPU:
Info: model: AMD Ryzen 9 5900X socket: AM4 bits: 64 type: MT MCP
arch: Zen 3+ gen: 4 level: v3 note: check built: 2022 process: TSMC n6 (7nm)
family: 0x19 (25) model-id: 0x21 (33) stepping: 2 microcode: 0xA20120A
Topology: cpus: 1x cores: 12 tpc: 2 threads: 24 smt: enabled cache:
L1: 768 KiB desc: d-12x32 KiB; i-12x32 KiB L2: 6 MiB desc: 12x512 KiB
L3: 64 MiB desc: 2x32 MiB
Speed (MHz): avg: 3700 min/max: 2200/4950 boost: enabled
base/boost: 3700/4950 scaling: driver: acpi-cpufreq governor: performance
volts: 1.1 V ext-clock: 100 MHz cores: 1: 3700 2: 3700 3: 3700 4: 3700
5: 3700 6: 3700 7: 3700 8: 3700 9: 3700 10: 3700 11: 3700 12: 3700
13: 3700 14: 3700 15: 3700 16: 3700 17: 3700 18: 3700 19: 3700 20: 3700
21: 3700 22: 3700 23: 3700 24: 3700 bogomips: 177685
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities: <filter>
Graphics:
Device-1: AMD Navi 31 [Radeon RX 7900 XT/7900 XTX/7900M] vendor: ASRock
driver: amdgpu v: kernel arch: RDNA-3 code: Navi-3x process: TSMC n5 (5nm)
built: 2022+ pcie: gen: 4 speed: 16 GT/s lanes: 16 ports:
active: DP-2,DP-3 off: HDMI-A-1 empty: DP-1,Writeback-1 bus-ID: 0c:00.0
chip-ID: 1002:744c class-ID: 0300
Device-2: Logitech C922 Pro Stream Webcam driver: snd-usb-audio,uvcvideo
type: USB rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-3:3
chip-ID: 046d:085c class-ID: 0102 serial: <filter>
Display: unspecified server: X.Org v: 24.1.2 with: Xwayland v: 24.1.2
compositor: kwin_wayland driver: X: loaded: amdgpu
unloaded: modesetting,radeon alternate: fbdev,vesa dri: radeonsi
gpu: amdgpu display-ID: :1 screens: 1
Screen-1: 0 s-res: 8140x2160 s-dpi: 96 s-size: 2154x572mm (84.80x22.52")
s-diag: 2229mm (87.74")
Monitor-1: DP-2 pos: primary,left res: 4300x1800 hz: 120 dpi: 137
size: 797x334mm (31.38x13.15") diag: 864mm (34.02") modes: N/A
Monitor-2: DP-3 pos: right res: 3840x2160 hz: 60 dpi: 161
size: 607x345mm (23.9x13.58") diag: 698mm (27.49") modes: N/A
API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
device: 1 drv: swrast gbm: drv: radeonsi surfaceless: drv: radeonsi x11:
drv: radeonsi inactive: wayland
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.2.2-arch1.1
glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 7900 XTX (radeonsi
navi31 LLVM 18.1.8 DRM 3.57 6.10.2-AMD-znver3) device-ID: 1002:744c
memory: 23.44 GiB unified: no
API: Vulkan v: 1.3.295 layers: 9 device: 0 type: discrete-gpu name: AMD
Radeon RX 7900 XTX (RADV NAVI31) driver: mesa radv v: 24.2.2-arch1.1
device-ID: 1002:744c surfaces: xcb,xlib device: 1 type: cpu name: llvmpipe
(LLVM 18.1.8 256 bits) driver: mesa llvmpipe v: 24.2.2-arch1.1 (LLVM
18.1.8) device-ID: 10005:0000 surfaces: xcb,xlib
Audio:
Device-1: AMD Navi 31 HDMI/DP Audio driver: snd_hda_intel v: kernel pcie:
gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 0c:00.1 chip-ID: 1002:ab30
class-ID: 0403
Device-2: AMD Starship/Matisse HD Audio vendor: Gigabyte
driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
bus-ID: 0e:00.4 chip-ID: 1022:1487 class-ID: 0403
Device-3: Logitech C922 Pro Stream Webcam driver: snd-usb-audio,uvcvideo
type: USB rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-3:3
chip-ID: 046d:085c class-ID: 0102 serial: <filter>
Device-4: PD200X Podcast Microphone
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 1.1 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 5-1:2 chip-ID: 352f:0104 class-ID: 0300
serial: <filter>
Device-5: SteelSeries ApS Arctis 7
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 1.1 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 5-2:3 chip-ID: 1038:12ad class-ID: 0300
Device-6: C-Media CM106 Like Sound Device
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 1.1 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 5-4:5 chip-ID: 0d8c:0102 class-ID: 0300
API: ALSA v: k6.10.2-AMD-znver3 status: kernel-api with: aoss
type: oss-emulator tools: alsactl,alsamixer,amixer
Server-1: PipeWire v: 1.2.3 status: n/a (root, process) with:
1: pipewire-pulse status: active 2: wireplumber status: active
3: pipewire-alsa type: plugin 4: pw-jack type: plugin
tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Realtek RTL8125 2.5GbE driver: r8169 v: kernel pcie: gen: 2
speed: 5 GT/s lanes: 1 port: e000 bus-ID: 04:00.0 chip-ID: 10ec:8125
class-ID: 0200
IF: eno1 state: up speed: 2500 Mbps duplex: full mac: <filter>
Device-2: Intel I211 Gigabit Network vendor: Gigabyte driver: igb
v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1 port: d000 bus-ID: 06:00.0
chip-ID: 8086:1539 class-ID: 0200
IF: enp6s0 state: down mac: <filter>
Device-3: Intel Wi-Fi 5 Wireless-AC 9x6x [Thunder Peak] driver: iwlwifi
v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 bus-ID: 07:00.0
chip-ID: 8086:2526 class-ID: 0280
IF: wlp7s0 state: down mac: <filter>
Info: services: NetworkManager, smbd, systemd-timesyncd, wpa_supplicant
Bluetooth:
Device-1: Intel Wireless-AC 9260 Bluetooth Adapter driver: btusb v: 0.8
type: USB rev: 2.0 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-2:2
chip-ID: 8087:0025 class-ID: e001
Report: btmgmt ID: hci0 rfk-id: 1 state: up address: <filter> bt-v: 5.1
lmp-v: 10 status: discoverable: no pairing: no class-ID: 6c0104
Drives:
Local Storage: total: 11.84 TiB used: 431.59 GiB (3.6%)
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Smart Modular Tech.
model: SHPP41-2000GM size: 1.82 TiB block-size: physical: 512 B
logical: 512 B speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter>
fw-rev: 51060A20 temp: 43.9 C scheme: GPT
SMART: yes health: PASSED on: 1y 23d 15h cycles: 424
read-units: 47,425,635 [24.2 TB] written-units: 51,795,098 [26.5 TB]
ID-2: /dev/nvme1n1 maj-min: 259:5 vendor: Samsung model: SSD 970 PRO 512GB
size: 476.94 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
lanes: 4 tech: SSD serial: <filter> fw-rev: 1B2QEXP7 temp: 37.9 C
scheme: GPT
SMART: yes health: PASSED on: 4y 319d 14h cycles: 612
read-units: 154,178,975 [78.9 TB] written-units: 198,969,222 [101 TB]
ID-3: /dev/sda maj-min: 8:0 vendor: Seagate model: ST2000DX002-2DV164
family: FireCuda 3.5 size: 1.82 TiB block-size: physical: 4096 B
logical: 512 B sata: 3.1 speed: 6.0 Gb/s tech: HDD rpm: 7200
serial: <filter> fw-rev: CC41 temp: 40 C scheme: GPT
SMART: yes state: enabled health: PASSED on: 6y 154d 12h cycles: 707
read: 65.04 TiB written: 13.89 TiB Pre-Fail: attribute: Spin_Retry_Count
value: 100 worst: 100 threshold: 97
ID-4: /dev/sdb maj-min: 8:16 vendor: Seagate model: ST2000LX001-1RG174
family: FireCuda 2.5 size: 1.82 TiB block-size: physical: 4096 B
logical: 512 B sata: 3.1 speed: 6.0 Gb/s tech: HDD rpm: 5400
serial: <filter> fw-rev: SDM1 temp: 33 C scheme: GPT
SMART: yes state: enabled health: PASSED on: 6y 60d 3h cycles: 707
read: 44.76 TiB written: 33.12 TiB Pre-Fail: attribute: Spin_Retry_Count
value: 100 worst: 100 threshold: 97
ID-5: /dev/sdc maj-min: 8:32 vendor: Toshiba model: HDWE160 family: X300
size: 5.46 TiB block-size: physical: 4096 B logical: 512 B sata: 3.0
speed: 6.0 Gb/s tech: HDD rpm: 7200 serial: <filter> fw-rev: FS2A
temp: 50 C scheme: GPT
SMART: yes state: enabled health: PASSED on: 5y 197d 23h cycles: 484
Pre-Fail: reallocated sector: 100 threshold: 50
ID-6: /dev/sdd maj-min: 8:48 vendor: Western Digital
model: WDS500G2X0C-00L350 size: 465.76 GiB block-size: physical: 512 B
logical: 512 B type: USB rev: 3.2 spd: 5 Gb/s lanes: 1 mode: 3.2 gen-1x1
tech: N/A serial: <filter> fw-rev: 1.00 drive-rev: 101110WD
temp: 28 Celsius C scheme: GPT
SMART: yes health: PASSED on: 4y 253d 13h cycles: 11,216
read-units: 135,812,982 [69.5 TB] written-units: 130,353,048 [66.7 TB]
Partition:
ID-1: / raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 431.58 GiB (90.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme1n1p2
maj-min: 259:7
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 2.8 MiB (0.9%) fs: vfat block-size: 512 B dev: /dev/nvme1n1p1
maj-min: 259:6
ID-3: /home raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 431.58 GiB (90.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme1n1p2
maj-min: 259:7
ID-4: /var/log raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 431.58 GiB (90.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme1n1p2
maj-min: 259:7
ID-5: /var/tmp raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 431.58 GiB (90.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme1n1p2
maj-min: 259:7
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
ID-1: swap-1 type: zram size: 62.73 GiB used: 16.2 MiB (0.0%)
priority: 100 comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 24
dev: /dev/zram0
Sensors:
System Temperatures: cpu: 51.6 C mobo: N/A gpu: amdgpu temp: 52.0 C
mem: 72.0 C
Fan Speeds (rpm): N/A gpu: amdgpu fan: 804
Info:
Memory: total: 64 GiB available: 62.73 GiB used: 9.58 GiB (15.3%)
Processes: 508 Power: uptime: 37m states: freeze,mem,disk suspend: deep
avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
suspend, test_resume image: 25.08 GiB services: org_kde_powerdevil,
power-profiles-daemon, upowerd Init: systemd v: 256 default: graphical
tool: systemctl
Packages: 2425 pm: dpkg pkgs: 0 pm: pacman pkgs: 2327 libs: 636
tools: octopi,pamac,paru pm: flatpak pkgs: 86 pm: snap pkgs: 12 Compilers:
clang: 18.1.8 gcc: 14.2.1 Shell: garuda-inxi (sudo) default: Bash
v: 5.2.32 running-in: konsole inxi: 3.3.35
Garuda (2.6.26-1):
System install date: 2024-06-05
Last full system update: 2024-09-09
Is partially upgraded: No
Relevant software: snapper NetworkManager dracut
Windows dual boot: Yes
Failed units: