Some games cause hard system crash (AMD, KDE, Wayland)



Intro

Hello! I have a very weird problem with Linux and I need some help with it. I've had this problem on multiple distros (I'm currently running Garuda) and with multiple kernels (zen, cachyos, lts). I've tried debugging it and I'm now completely lost: I didn't find anything relevant online, and ChatGPT was no help either.

Description

The problem is that when I try to launch some games (e.g. Satisfactory and Frostpunk 2; I'm sure I could find more), the game starts and shows a loading screen for a couple of seconds, then MY ENTIRE SYSTEM CRASHES. I mean a hard crash, to the point where I have to unplug the PC, let the remaining power drain completely (I hold the power button for a bit), and then plug it back in and start it. My PC is more than powerful enough to handle all of these games, and I have everything they require (software, drivers) to run them.

Here's where it gets weirder: I can run the problematic games inside the Live USB environment of any distro! I just installed Garuda, thinking there was a problem with my previous distro, and before installing I ran Satisfactory successfully inside the Live USB environment. Now, after installation, MY PC CRASHES when I try to run it! I haven't uninstalled or changed anything; it just doesn't work, even though it worked from the Live USB about 30 minutes earlier on this exact same machine. I'm honestly quite lost. I also tested with both Wayland and Xorg.

I switched from CachyOS (Arch-based) to Garuda (also Arch-based) yesterday, and only installed it after checking that the games worked in the Live USB. I don't think this problem depends on the distro, though if I don't get anywhere soon I'll just hop to Nobara and then to openSUSE Tumbleweed. I just don't want to reconfigure everything again and still have the problem.

garuda-inxi:

System:
Kernel: 6.6.59-1-lts arch: x86_64 bits: 64 compiler: gcc v: 14.2.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-lts
root=UUID=b271e9fd-5350-487d-ad72-eed4b152ed5b rw rootflags=subvol=@
resume=UUID=0beb666f-37e5-44bc-bd56-213c595acd69 loglevel=3
amdgpu.ppfeaturemask=0xffffffff amdgpu.vm_fragment_size=9
Desktop: KDE Plasma v: 6.2.2 tk: Qt v: N/A info: frameworks v: 6.7.0
wm: kwin_wayland vt: 1 dm: SDDM Distro: Garuda base: Arch Linux
Machine:
Type: Desktop Mobo: Micro-Star model: PRO B650-S WIFI (MS-7E26) v: 1.0
serial: <superuser required> uuid: <superuser required> UEFI: American
Megatrends LLC. v: 1.20 date: 07/12/2023
CPU:
Info: model: AMD Ryzen 5 7600X bits: 64 type: MT MCP arch: Zen 4 gen: 4
level: v4 note: check built: 2022+ process: TSMC n5 (5nm) family: 0x19 (25)
model-id: 0x61 (97) stepping: 2 microcode: 0xA601203
Topology: cpus: 1x dies: 1 clusters: 1 cores: 6 threads: 12 tpc: 2
smt: enabled cache: L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 6 MiB
desc: 6x1024 KiB L3: 32 MiB desc: 1x32 MiB
Speed (MHz): avg: 4228 min/max: 400/5453 scaling: driver: amd-pstate-epp
governor: powersave cores: 1: 4228 2: 4228 3: 4228 4: 4228 5: 4228 6: 4228
7: 4228 8: 4228 9: 4228 10: 4228 11: 4228 12: 4228 bogomips: 112846
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities: <filter>
Graphics:
Device-1: Advanced Micro Devices [AMD/ATI] Navi 22 [Radeon RX 6700/6700
XT/6750 XT / 6800M/6850M XT] vendor: XFX driver: amdgpu v: kernel
arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm) built: 2020-22 pcie:
gen: 4 speed: 16 GT/s lanes: 16 ports: active: DP-2,HDMI-A-1
empty: DP-1,DP-3 bus-ID: 03:00.0 chip-ID: 1002:73df class-ID: 0300
Device-2: Advanced Micro Devices [AMD/ATI] Raphael vendor: Micro-Star MSI
driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm)
built: 2020-22 pcie: gen: 4 speed: 16 GT/s lanes: 16 ports: active: none
empty: DP-4, DP-5, DP-6, HDMI-A-2 bus-ID: 12:00.0 chip-ID: 1002:164e
class-ID: 0300 temp: 45.0 C
Display: wayland server: X.org v: 1.21.1.14 with: Xwayland v: 24.1.4
compositor: kwin_wayland driver: X: loaded: amdgpu dri: radeonsi
gpu: amdgpu,amdgpu d-rect: 3456x1944 display-ID: 0
Monitor-1: DP-2 pos: top-right res: 1920x1080 size: N/A modes: N/A
Monitor-2: HDMI-A-1 pos: bottom-l res: 1536x864 size: N/A modes: N/A
API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
device: 1 drv: radeonsi device: 2 drv: swrast gbm: drv: kms_swrast
surfaceless: drv: radeonsi wayland: drv: radeonsi x11: drv: radeonsi
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.2.6-arch1.1
glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 6750 XT (radeonsi
navi22 LLVM 18.1.8 DRM 3.54 6.6.59-1-lts) device-ID: 1002:73df
memory: 11.72 GiB unified: no display-ID: :1.0
API: Vulkan v: 1.3.295 layers: 13 device: 0 type: discrete-gpu name: AMD
Radeon RX 6750 XT (RADV NAVI22) driver: mesa radv v: 24.2.6-arch1.1
device-ID: 1002:73df surfaces: xcb,xlib,wayland device: 1
type: integrated-gpu name: AMD Radeon Graphics (RADV RAPHAEL_MENDOCINO)
driver: mesa radv v: 24.2.6-arch1.1 device-ID: 1002:164e
surfaces: xcb,xlib,wayland device: 2 type: cpu name: llvmpipe (LLVM
18.1.8 256 bits) driver: mesa llvmpipe v: 24.2.6-arch1.1 (LLVM 18.1.8)
device-ID: 10005:0000 surfaces: xcb,xlib,wayland
Audio:
Device-1: Advanced Micro Devices [AMD/ATI] Navi 21/23 HDMI/DP Audio
driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
bus-ID: 03:00.1 chip-ID: 1002:ab28 class-ID: 0403
Device-2: Advanced Micro Devices [AMD/ATI] Rembrandt Radeon High
Definition Audio vendor: Micro-Star MSI driver: snd_hda_intel v: kernel
pcie: gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 12:00.1 chip-ID: 1002:1640
class-ID: 0403
Device-3: Advanced Micro Devices [AMD] Family 17h/19h HD Audio
vendor: Micro-Star MSI driver: snd_hda_intel v: kernel pcie: gen: 4
speed: 16 GT/s lanes: 16 bus-ID: 12:00.6 chip-ID: 1022:15e3 class-ID: 0403
Device-4: Razer USA BlackShark V2 Pro
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 1.1 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 5-1:2 chip-ID: 1532:0528 class-ID: 0300
API: ALSA v: k6.6.59-1-lts status: kernel-api with: aoss
type: oss-emulator tools: N/A
Server-1: PipeWire v: 1.2.6 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Realtek RTL8125 2.5GbE vendor: Micro-Star MSI driver: r8169
v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 port: e000 bus-ID: 0d:00.0
chip-ID: 10ec:8125 class-ID: 0200
IF: enp13s0 state: down mac: <filter>
Device-2: MEDIATEK MT7921K Wi-Fi 6E 80MHz driver: mt7921e v: kernel pcie:
gen: 2 speed: 5 GT/s lanes: 1 bus-ID: 0e:00.0 chip-ID: 14c3:0608
class-ID: 0280
IF: wlp14s0 state: up mac: <filter>
Info: services: NetworkManager, smbd, systemd-timesyncd, wpa_supplicant
Bluetooth:
Device-1: MediaTek Wireless_Device driver: btusb v: 0.8 type: USB rev: 2.1
speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-7:4 chip-ID: 0e8d:0608
class-ID: e001 serial: <filter>
Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.2
lmp-v: 11 status: discoverable: no pairing: no class-ID: 6c0104
Drives:
Local Storage: total: 2.98 TiB used: 2.19 TiB (73.4%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Intel model: SSDPEKNU010TZ
size: 953.87 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
lanes: 4 tech: SSD serial: <filter> fw-rev: 002C temp: 35.9 C scheme: GPT
ID-2: /dev/sda maj-min: 8:0 vendor: LITE-ON model: LCH-256V2S
size: 238.47 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
tech: SSD serial: <filter> fw-rev: 201 scheme: GPT
ID-3: /dev/sdb maj-min: 8:16 vendor: Seagate model: ST1000DM010-2EP102
size: 931.51 GiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
tech: HDD rpm: 7200 serial: <filter> fw-rev: CC43 scheme: GPT
ID-4: /dev/sdc maj-min: 8:32 vendor: Western Digital
model: WD10EZEX-08WN4A0 size: 931.51 GiB block-size: physical: 4096 B
logical: 512 B speed: 6.0 Gb/s tech: HDD rpm: 7200 serial: <filter>
fw-rev: 1A02 scheme: GPT
Partition:
ID-1: / raw-size: 80.88 GiB size: 80.88 GiB (100.00%)
used: 20.68 GiB (25.6%) fs: btrfs dev: /dev/sda1 maj-min: 8:1
ID-2: /boot/efi raw-size: 1000 MiB size: 998 MiB (99.80%)
used: 584 KiB (0.1%) fs: vfat dev: /dev/sda4 maj-min: 8:4
ID-3: /home raw-size: 122.61 GiB size: 122.61 GiB (100.00%)
used: 21.35 GiB (17.4%) fs: btrfs dev: /dev/dm-0 maj-min: 253:0
mapped: luks-f819c104-fc7b-46b3-b5be-b36fe654e7ab
ID-4: /var/log raw-size: 80.88 GiB size: 80.88 GiB (100.00%)
used: 20.68 GiB (25.6%) fs: btrfs dev: /dev/sda1 maj-min: 8:1
ID-5: /var/tmp raw-size: 80.88 GiB size: 80.88 GiB (100.00%)
used: 20.68 GiB (25.6%) fs: btrfs dev: /dev/sda1 maj-min: 8:1
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
ID-1: swap-1 type: zram size: 14.79 GiB used: 1.76 GiB (11.9%)
priority: 100 comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 12
dev: /dev/zram0
ID-2: swap-2 type: partition size: 34 GiB used: 0 KiB (0.0%) priority: -2
dev: /dev/sda3 maj-min: 8:3
Sensors:
System Temperatures: cpu: 52.0 C mobo: 34.0 C
Fan Speeds (rpm): N/A
GPU: device: amdgpu temp: 44.0 C device: amdgpu temp: 50.0 C mem: 48.0 C
fan: 611 watts: 36.00
Info:
Memory: total: 16 GiB note: est. available: 14.79 GiB used: 5.37 GiB (36.3%)
Processes: 470 Power: uptime: 2h 13m states: freeze,mem,disk suspend: deep
avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
suspend, test_resume image: 5.9 GiB services: org_kde_powerdevil,
power-profiles-daemon, upowerd Init: systemd v: 256 default: graphical
tool: systemctl
Packages: pm: pacman pkgs: 2007 libs: 565 tools: octopi,paru Compilers:
clang: 18.1.8 gcc: 14.2.1 Shell: garuda-inxi default: fish v: 3.7.1
running-in: konsole inxi: 3.3.36
Garuda (2.6.26-1):
System install date:     2024-11-02
Last full system update: 2024-11-03
Is partially upgraded:   No
Relevant software:       snapper NetworkManager dracut
Windows dual boot:       No/Undetected
Failed units:

System Information and logs


Here's where I have to mention that the Proton log (PROTON_LOG=1 %command%) kept deleting itself, so I had to write a script to capture it. I think the repeating errors at the end of the file are very important.
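
For anyone hitting the same self-deleting log, here is a minimal sketch of that kind of capture script (not the author's actual script, which isn't shown in the thread). It assumes the log is the steam-<appid>.log file that, as far as I know, Proton writes to $PROTON_LOG_DIR (the home directory by default) when PROTON_LOG=1 is set; both paths are passed on the command line. It polls the file, appends any new bytes to a backup, and calls os.fsync after each write so the copy has a better chance of surviving a hard crash.

```python
import os
import sys
import time
from pathlib import Path


def copy_new_bytes(src: Path, dst: Path, offset: int) -> int:
    """Append the bytes of src past `offset` to dst and return the new offset.

    If src does not exist (not created yet, or just deleted), nothing is copied.
    If src is shorter than `offset`, it was recreated, so copying restarts at 0.
    """
    try:
        with open(src, "rb") as fin:
            if os.fstat(fin.fileno()).st_size < offset:
                offset = 0  # file was truncated or replaced by a new one
            fin.seek(offset)
            chunk = fin.read()  # everything from offset to the current end
    except FileNotFoundError:
        return offset
    if chunk:
        with open(dst, "ab") as fout:
            fout.write(chunk)
            fout.flush()
            os.fsync(fout.fileno())  # force the data to disk before a possible crash
    return offset + len(chunk)


def watch(src: Path, dst: Path, interval: float = 0.5) -> None:
    """Mirror everything appended to src into dst until interrupted (Ctrl+C)."""
    offset = 0
    while True:
        offset = copy_new_bytes(src, dst, offset)
        time.sleep(interval)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"usage: {sys.argv[0]} SOURCE_LOG BACKUP_LOG", file=sys.stderr)
    else:
        watch(Path(sys.argv[1]).expanduser(), Path(sys.argv[2]).expanduser())
```

Start it in a terminal before launching the game, e.g. python3 keep_log.py ~/steam-<appid>.log ~/proton-backup.log (the script and backup names are hypothetical). Polling is cruder than inotify, but it needs nothing outside the standard library.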

Some more info

  1. I'm running my system off an SSD; we'll call it ssd1
  2. I have Satisfactory on ssd2
  3. Frostpunk 2 is on ssd3

I'm dealing with multiple disks and multiple file systems (btrfs and ext4, I think), but compatibility should be fine, so I don't think this is the problem. I should also add that when I ran the games in the Live USB, I ran them from the same disks (ssd2 and ssd3), so nothing changed there.

I have some more logs and system diagnostics; I'll post them in a comment, since as a new user I can't add more links.

Outro

I've searched the internet and asked for help on other forums and Discord servers. I haven't found anything useful anywhere, and I don't really know what to do now, hence this post. If any information is missing, I'll provide it.

Thanks in advance; I hope no one fried their brain reading this.

Logs collected while running Satisfactory (from the loading screen, just a couple of seconds, until it crashed):

Also, the smartctl analysis for the drives:

Everything seems fine. I have two more disks (which aren't relevant to the problem anyway), and those are healthy too. I know this looks like a hardware failure, but I don't think it is. My uneducated guess is that it's related to the GPU (and maybe to some drivers or something like vkd3d?).

I tried using the exact same kernel parameters as in the Live USB, and the problem persists. I also tried disabling the iGPU, to no avail. I'm just going to go out on a limb and assume this is a bug in one of these:

  • AMD GPU driver (amdgpu)
  • Linux kernel
  • vkd3d

I assume it’s related to something in my specific configuration.
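
One way to double-check the "same kernel parameters" step is to save /proc/cmdline in both environments (e.g. cat /proc/cmdline > live.txt in the Live USB session, and likewise on the installed system) and compare them parameter by parameter. A minimal sketch; the file names are hypothetical, and it only compares what was passed on the kernel command line, not modprobe options or other configuration.

```python
import shlex
import sys
from typing import Dict, Optional, Tuple

ABSENT = "<absent>"  # marker for a parameter missing from one side


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None.

    If a parameter appears more than once, the last occurrence wins.
    """
    params: Dict[str, Optional[str]] = {}
    for token in shlex.split(cmdline):  # honours double-quoted values like a="b c"
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def diff_cmdlines(a: str, b: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (value_in_a, value_in_b)} for every parameter that differs."""
    pa, pb = parse_cmdline(a), parse_cmdline(b)
    diff = {}
    for name in sorted(pa.keys() | pb.keys()):
        va, vb = pa.get(name, ABSENT), pb.get(name, ABSENT)
        if va != vb:
            diff[name] = (va, vb)
    return diff


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"usage: {sys.argv[0]} LIVE_CMDLINE INSTALLED_CMDLINE", file=sys.stderr)
    else:
        with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
            for name, (live, installed) in diff_cmdlines(f1.read(), f2.read()).items():
                print(f"{name}: live={live!r} installed={installed!r}")
```

Parameters that are identical on both sides (including ones in a different order) produce no output, so anything printed is a real difference between the two boots.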

This is 100% NOT a hardware failure.

If someone could give me some pointers on how to bring this issue to the attention of the projects above (I'm thinking of a GitHub issue, but it'll probably get buried among other issues), or has any other idea I should try, I'd highly, highly, highly appreciate it.

Well, hello there! Sorry for the trouble. It seems Baloo is using a lot of your CPU before the crash; maybe try disabling or uninstalling it?

Also, for the sake of readability, alongside the journalctl logs you've already linked, try running journalctl -f before starting the game to narrow the scope of the log and make diagnosis easier. How you'd access the journalctl -f output after a hard reboot, though, I don't know.
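
On reading the log after a hard reboot: if the systemd journal is persistent (on Arch-based systems it normally is, because /var/log/journal exists), journalctl -b -1 shows the previous boot, i.e. the one that crashed, so journalctl -f doesn't need to stay open. A minimal sketch that builds and runs that command; the helper names are hypothetical. With a hard lockup the last few seconds may never reach the disk, so the previous boot's log can still end abruptly.

```python
import subprocess
from typing import List, Optional


def previous_boot_cmd(boot: int = -1, kernel_only: bool = True,
                      priority: Optional[str] = None) -> List[str]:
    """Build a journalctl command line for an earlier boot.

    boot=-1 selects the boot before the current one (the one that crashed).
    """
    cmd = ["journalctl", "--no-pager", "-b", str(boot)]
    if kernel_only:
        cmd.append("-k")  # kernel messages only, which is where amdgpu errors appear
    if priority is not None:
        cmd += ["-p", priority]  # e.g. "warning": that level and more severe
    return cmd


def read_previous_boot(**kwargs) -> str:
    """Run the command and return its output (raises if journalctl fails)."""
    result = subprocess.run(previous_boot_cmd(**kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, print(read_previous_boot(priority="warning")) shows kernel warnings and errors from the crashed boot. Reading the full system journal may require root or membership in a group such as systemd-journal.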

You might also try downgrading your microcode package, though that's probably irrelevant.

Try different DXVK versions as well, or rather, try older Proton versions with older DXVK, since DXVK has (rarely) caused hard freezes and crashes.

Hello Xeno,
also try another Proton version…
and then?

Thanks to everybody for their responses!
I've partly figured it out. As far as I can tell, there was an issue with my power supply that only manifested itself when launching certain games. I replaced the power supply, and now everything seems to work.


Scroll down for more details, and some speculation


First, I think it's worth noting that this only seemed to happen with DX12 games running under Proton/Wine. I played plenty of more demanding games that worked without any problem, so it isn't simply a matter of PSU output.

I have to say this is weird, considering my power supply isn't faulty: the game ran under Windows and in the Garuda Live USB too. I'm no expert, but it would seem that something was causing my GPU to draw less power than it could.

I checked, and my PSU is more than powerful enough for my setup: it can safely supply 650 W. I attached a power meter to the outlet, and my PC drew at most 400-450 W.
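
Putting those figures side by side: an outlet meter measures AC input, while a PSU's wattage rating refers to its DC output, so the DC load is roughly the wall reading multiplied by the PSU's efficiency. A minimal sketch; the 0.90 efficiency is an assumption, since the thread doesn't state the PSU's efficiency rating. Under that assumption, 400-450 W at the wall is about 360-405 W DC, or roughly 55-62% of 650 W. Plug-in meters typically report an average over a second or so, so these figures describe sustained draw rather than very short load spikes.

```python
def dc_load_watts(wall_watts: float, efficiency: float) -> float:
    """Estimate the PSU's DC output from the AC power measured at the outlet."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return wall_watts * efficiency


def headroom_watts(rated_watts: float, wall_watts: float, efficiency: float) -> float:
    """Rated DC capacity minus the estimated (average, not peak) DC load."""
    return rated_watts - dc_load_watts(wall_watts, efficiency)


if __name__ == "__main__":
    RATED_W = 650.0     # PSU rating from the thread
    EFFICIENCY = 0.90   # ASSUMPTION: not stated in the thread
    for wall in (400.0, 450.0):  # measured range from the thread
        load = dc_load_watts(wall, EFFICIENCY)
        spare = headroom_watts(RATED_W, wall, EFFICIENCY)
        print(f"{wall:.0f} W at the wall -> ~{load:.0f} W DC, "
              f"{load / RATED_W:.0%} of rating, ~{spare:.0f} W headroom")
```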

Even if it weren't enough, is it normal for the entire PC to hard crash, without even a kernel panic log or anything like that? It seems as if my GPU would try to draw more power, something would happen, and then the GPU would just give up and the PC would crash.

This is VERY weird. At first glance it might just be a simple hardware problem, but I think there's more to explore here regarding the AMD GPU drivers, and maybe even vkd3d (since this only happened with DX12 games running under Proton/Wine).

That being said, this is a problem for someone else, as I am basically on the verge of madness after a week (give or take) of debugging.

Once again, many thanks!

Also, I think this thread should remain here, in case anyone has a similar problem.

Maybe it has to do with the difference between single-rail and multi-rail PSUs.

