Hi all,
I built a new PC at the start of September and swapped to Linux (I still dual boot Windows). At first I installed Linux Mint, but I learned that its driver support for the 9070 XT was not sufficient. I then swapped to Bazzite, which ran nicely apart from crashes that I have been unable to resolve, so I have been trying other distros.
I tried openSUSE Tumbleweed and experienced no crashing (19.5 hour game stress test).
I am now on Garuda and am experiencing the exact same crashing as on Bazzite.
Crashing
I am able to run games and get good performance. However, after 2-5 hours the game crashes. Sometimes the desktop environment goes down with it and I have to restart; sometimes it is able to recover.
The crash presents as a sudden freeze of the screen on both monitors, but audio keeps playing for several minutes. Game audio eventually stops, but any other audio source (video on second monitor for example) will keep playing.
The crash has mostly happened in World of Warcraft and Cities Skylines 2, both played through Steam using Proton. I crashed once in Space Marine 2. I suspect any game would crash in a long enough session, though; I just haven’t played other games as much.
Using journalctl I have pulled logs from many different crashes and there are a few repeating themes:
- amdgpu ring timeout & reset
- context lost
- Sometimes Pageflip timeout
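For comparing the saved exports against each other, a tally of these themes could be sketched roughly as follows. This is only an illustration: the name `count_themes` and the regular expressions are my own assumptions, modelled on the wording of the messages in the log excerpt in this post. The page-flip pattern in particular is a guess, since that message does not appear in this excerpt.

```python
import re
import sys
from collections import Counter

# Hypothetical patterns, modelled on the wording of the messages in this post.
# The page-flip pattern is a guess: that message is not in the excerpt here.
THEMES = {
    "ring timeout": re.compile(r"ring \S+ timeout", re.IGNORECASE),
    "ring reset": re.compile(r"ring reset", re.IGNORECASE),
    "context lost": re.compile(r"context is lost", re.IGNORECASE),
    "pageflip timeout": re.compile(r"page\s*flip.*time", re.IGNORECASE),
}


def count_themes(lines):
    """Count how many log lines match each theme."""
    counts = Counter()
    for line in lines:
        for name, pattern in THEMES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_themes.py crash1.log crash2.log ...
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(path, dict(count_themes(handle)))
```

Running it over each export would give a quick per-file count, which might make it easier to see whether the Bazzite and Garuda crashes really share the same signature.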
I have around 12 exported crash reports saved and can check them for more information, though many of them come from Bazzite. To avoid filling the post unnecessarily, I will include only the latest journalctl output below.
Nov 01 17:09:31 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:157 vmid:7 pasid:32803)
Nov 01 17:09:32 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Process Cities2.exe pid 25080 thread dxvk-submit pid 25136
Nov 01 17:09:32 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000800002d18000 from client 10
Nov 01 17:09:32 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0070153A
Nov 01 17:09:32 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: SQC (data) (0xa)
Nov 01 17:09:32 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Nov 01 17:09:32 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x5
Nov 01 17:09:32 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
Nov 01 17:09:32 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1
Nov 01 17:09:32 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x0
Nov 01 17:09:42 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
Nov 01 17:09:42 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed
Nov 01 17:09:42 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Nov 01 17:09:42 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Nov 01 17:09:42 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=27281061, emitted seq=27281063
Nov 01 17:09:42 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Process Cities2.exe pid 25080 thread dxvk-submit pid 25136
Nov 01 17:09:42 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Starting gfx_0.0.0 ring reset
Nov 01 17:09:42 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Ring gfx_0.0.0 reset succeeded
Nov 01 17:09:42 Gilgamesh kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
Nov 01 17:09:42 Gilgamesh systemd[1]: Starting autorandr execution hook...
░░ Subject: A start job for unit autorandr.service has begun execution
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit autorandr.service has begun execution.
░░
░░ The job identifier is 6980.
Nov 01 17:09:42 Gilgamesh systemd[1]: autorandr.service: Deactivated successfully.
░░ Subject: Unit succeeded
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ The unit autorandr.service has successfully entered the 'dead' state.
Nov 01 17:09:42 Gilgamesh systemd[1]: Finished autorandr execution hook.
░░ Subject: A start job for unit autorandr.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit autorandr.service has finished successfully.
░░
░░ The job identifier is 6980.
Nov 01 17:09:42 Gilgamesh lact[2950]: 2025-11-01T17:09:42.172649Z INFO lact_daemon: got kernel drm subsystem event, reloading GPUs
Nov 01 17:09:42 Gilgamesh lact[2950]: 2025-11-01T17:09:42.177417Z INFO lact_daemon::server::handler: initialized amdgpu controller for GPU 1002:7550-148C:2435-0000:03:00.0 at '/sys/class/drm/card1/device'
Nov 01 17:09:42 Gilgamesh lact[2950]: 2025-11-01T17:09:42.177424Z INFO lact_daemon::server::handler: GPU list reloaded with 1 devices, reapplying configuration
Nov 01 17:09:42 Gilgamesh lact[2950]: 2025-11-01T17:09:42.177447Z INFO lact_daemon::server::handler: configuration applied
Nov 01 17:09:52 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
Nov 01 17:09:52 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed
Nov 01 17:09:52 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Nov 01 17:09:52 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Nov 01 17:09:52 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=27281062, emitted seq=27281065
Nov 01 17:09:52 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Process Cities2.exe pid 25080 thread dxvk-submit pid 25136
Nov 01 17:09:52 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Starting gfx_0.0.0 ring reset
Nov 01 17:09:52 Gilgamesh kernel: amdgpu 0000:03:00.0: amdgpu: Ring gfx_0.0.0 reset succeeded
Nov 01 17:09:52 Gilgamesh kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
Nov 01 17:09:52 Gilgamesh systemd[1]: Starting autorandr execution hook...
░░ Subject: A start job for unit autorandr.service has begun execution
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit autorandr.service has begun execution.
░░
░░ The job identifier is 7113.
Nov 01 17:09:52 Gilgamesh steam[19178]: radv/amdgpu: The CS has been cancelled because the context is lost. This context is guilty of a hard recovery.
Nov 01 17:09:52 Gilgamesh steam[19178]: radv: GPUVM fault detected at address 0x800002d18000.
Nov 01 17:09:52 Gilgamesh steam[19178]: GCVM_L2_PROTECTION_FAULT_STATUS: 0x70153a
Nov 01 17:09:52 Gilgamesh steam[19178]: CLIENT_ID: (SQC (data)) 0xa
Nov 01 17:09:52 Gilgamesh steam[19178]: MORE_FAULTS: 0
Nov 01 17:09:52 Gilgamesh steam[19178]: WALKER_ERROR: 5
Nov 01 17:09:52 Gilgamesh steam[19178]: PERMISSION_FAULTS: 3
Nov 01 17:09:52 Gilgamesh steam[19178]: MAPPING_ERROR: 1
Nov 01 17:09:52 Gilgamesh steam[19178]: RW: 0
Nov 01 17:09:52 Gilgamesh systemd[1]: autorandr.service: Deactivated successfully.
░░ Subject: Unit succeeded
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ The unit autorandr.service has successfully entered the 'dead' state.
Nov 01 17:09:52 Gilgamesh systemd[1]: Finished autorandr execution hook.
░░ Subject: A start job for unit autorandr.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit autorandr.service has finished successfully.
░░
░░ The job identifier is 7113.
Nov 01 17:09:52 Gilgamesh lact[2950]: 2025-11-01T17:09:52.412459Z INFO lact_daemon: got kernel drm subsystem event, reloading GPUs
Nov 01 17:09:52 Gilgamesh lact[2950]: 2025-11-01T17:09:52.417840Z INFO lact_daemon::server::handler: initialized amdgpu controller for GPU 1002:7550-148C:2435-0000:03:00.0 at '/sys/class/drm/card1/device'
Nov 01 17:09:52 Gilgamesh lact[2950]: 2025-11-01T17:09:52.417854Z INFO lact_daemon::server::handler: GPU list reloaded with 1 devices, reapplying configuration
Nov 01 17:09:52 Gilgamesh lact[2950]: 2025-11-01T17:09:52.417889Z INFO lact_daemon::server::handler: configuration applied
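The raw GCVM_L2_PROTECTION_FAULT_STATUS value in the log above (0x0070153A) can be split into the same fields the kernel prints underneath it. The sketch below shows one way to do that. The bit positions are an assumption on my part rather than something taken from documentation, and `decode_fault_status` is just a name I made up for illustration; the layout does, however, reproduce the values logged above.

```python
# Assumed bit layout of GCVM_L2_PROTECTION_FAULT_STATUS: (name, shift, width).
# These positions are an assumption; they were checked only against the
# decoded values that appear in the log in this post.
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),  # printed as "Faulty UTCL2 client ID" in the kernel log
    ("RW", 18, 1),
]


def decode_fault_status(status):
    """Split the raw status register value into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


if __name__ == "__main__":
    for name, value in decode_fault_status(0x0070153A).items():
        print(f"{name}: {value:#x}")
```

This could be handy for checking whether the other saved crash reports carry the same status value, or whether the fault fields differ from crash to crash.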
What I have tried
- Changing Proton version (Using Experimental and various GE versions)
- Lowering the main monitor's refresh rate (usually 180 Hz)
- Setting both monitors to the same refresh rate (I initially thought this crash was caused by VRR)
- Disabling VRR completely
- Capping FPS below the refresh rate (capped to 120 FPS at 144 Hz)
- Adding “RADV_DEBUG=nodcc %command%” to Steam launch command
- Disabling “Direct Scan-Out” (Tested on Bazzite)
- Disabling “AMDGPU runtime‑power‑management” (tested on Bazzite)
- Testing different kernels on Bazzite (I can’t remember the exact versions, but I tested both Bazzite 42 and 43, which I believe was kernel 6.16 to 6.17)
- Testing different kernels on Garuda (linux-zen 6.17.6.zen1-1 and linux-mainline 6.18rc3-1)
I still crashed in all of the above scenarios.
I have also attempted to rule out hardware as follows:
- Temperatures seem fine, tested on both Windows and Linux. The CPU peaked at 82 degrees but averaged around 65 degrees. The GPU peaked at 70 degrees and also averaged around 65 (the GPU junction/hotspot temperature sat at 85, with occasional dips lower but never higher).
- I have never crashed in Windows on this PC
- I installed openSUSE Tumbleweed and left Cities Skylines 2 running overnight and through the work day. It ran for 19.5 hours with no crash (the longest Garuda has gone without a crash is 4 hours).
Help Me
I know I said above that Tumbleweed was stable, and if I can’t resolve the crash I will swap to it for stability. However, Garuda has been such a nice experience that, out of all the distros I’ve tried, it is the one I would really love to make my main OS.
I welcome any suggestions. I’ll answer what I can and try whatever is suggested!
Thanks for your time.
lee@Gilgamesh in ~
╰─λ garuda-inxi
System:
Kernel: 6.17.6-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 15.2.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
root=UUID=7e947cb6-e2b8-415b-a232-aa8202323db5 rw rootflags=subvol=@
quiet loglevel=3
Desktop: KDE Plasma v: 6.5.1 tk: Qt v: N/A info: frameworks v: 6.19.0
wm: kwin_wayland vt: 1 dm: SDDM Distro: Garuda base: Arch Linux
Machine:
Type: Desktop System: ASUS product: N/A v: N/A serial: <superuser required>
Mobo: ASUSTeK model: PRIME X870-P v: Rev 1.xx serial: <superuser required>
part-nu: SKU uuid: <superuser required> UEFI: American Megatrends v: 1078
date: 07/14/2025
CPU:
Info: model: AMD Ryzen 7 9800X3D bits: 64 type: MT MCP arch: Zen 5 gen: 5
level: v4 note: check built: 2024+ process: TSMC n4 (4nm) family: 0x1A (26)
model-id: 0x44 (68) stepping: 0 microcode: 0xB404032
Topology: cpus: 1x dies: 1 clusters: 1 cores: 8 threads: 16 tpc: 2
smt: enabled cache: L1: 640 KiB desc: d-8x48 KiB; i-8x32 KiB L2: 8 MiB
desc: 8x1024 KiB L3: 96 MiB desc: 1x96 MiB
Speed (MHz): avg: 4501 min/max: 603/5272 boost: enabled scaling:
driver: amd-pstate-epp governor: performance cores: 1: 4501 2: 4501 3: 4501
4: 4501 5: 4501 6: 4501 7: 4501 8: 4501 9: 4501 10: 4501 11: 4501 12: 4501
13: 4501 14: 4501 15: 4501 16: 4501 bogomips: 150399
Flags-basic: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a
ssse3 svm
Vulnerabilities: <filter>
Graphics:
Device-1: Advanced Micro Devices [AMD/ATI] Navi 48 [Radeon RX 9070/9070
XT/9070 GRE] vendor: Tul / PowerColor Reaper driver: amdgpu v: kernel
arch: RDNA-4 code: Navi-4x process: TSMC n4 (4nm) built: 2025+ pcie:
gen: 5 speed: 32 GT/s lanes: 16 ports: active: DP-1,DP-2
empty: DP-3,HDMI-A-1,Writeback-1 bus-ID: 03:00.0 chip-ID: 1002:7550
class-ID: 0300
Display: wayland server: X.org v: 1.21.1.20 with: Xwayland v: 24.1.9
compositor: kwin_wayland driver: X: loaded: modesetting
alternate: fbdev,vesa dri: radeonsi gpu: amdgpu d-rect: 5120x2880
display-ID: 0
Monitor-1: DP-1 pos: bottom-r model: Acer XB273U V3 serial: <filter>
built: 2023 res: mode: 2560x1440 hz: 144 scale: 100% (1) dpi: 109 gamma: 1.2
size: 597x336mm (23.5x13.23") diag: 685mm (27") ratio: 16:9 modes:
max: 2560x1440 min: 720x400
Monitor-2: DP-2 pos: primary,top-left model: Acer XB271HU serial: <filter>
built: 2018 res: mode: 2560x1440 hz: 144 scale: 178% (1.78) to: 1440x2560
dpi: 109 gamma: 1.2 size: 598x336mm (23.54x13.23") diag: 686mm (27")
ratio: 16:9 modes: max: 2560x1440 min: 640x480
API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
device: 1 drv: swrast gbm: drv: kms_swrast surfaceless: drv: radeonsi
wayland: drv: radeonsi x11: drv: radeonsi
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 25.2.6-arch1.1
glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 9070 XT (radeonsi
gfx1201 LLVM 21.1.4 DRM 3.64 6.17.6-zen1-1-zen) device-ID: 1002:7550
memory: 15.62 GiB unified: no display-ID: :1.0
API: Vulkan v: 1.4.328 layers: 13 device: 0 type: discrete-gpu name: AMD
Radeon RX 9070 XT (RADV GFX1201) driver: mesa radv v: 25.2.6-arch1.1
device-ID: 1002:7550 surfaces: N/A device: 1 type: cpu name: llvmpipe
(LLVM 21.1.4 256 bits) driver: mesa llvmpipe v: 25.2.6-arch1.1 (LLVM
21.1.4) device-ID: 10005:0000 surfaces: N/A
Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
de: kscreen-console,kscreen-doctor gpu: corectrl,lact wl: wayland-info
x11: xdpyinfo, xprop, xrandr
Audio:
Device-1: Advanced Micro Devices [AMD/ATI] Navi 48 HDMI/DP Audio
driver: snd_hda_intel v: kernel pcie: gen: 5 speed: 32 GT/s lanes: 16
bus-ID: 03:00.1 chip-ID: 1002:ab40 class-ID: 0403
Device-2: Advanced Micro Devices [AMD] Family 17h/19h/1ah HD Audio
vendor: ASUSTeK driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s
lanes: 16 bus-ID: 75:00.6 chip-ID: 1022:15e3 class-ID: 0403
Device-3: SteelSeries ApS Arctis Nova 7
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 2.0 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 7-1:2 chip-ID: 1038:2202 class-ID: 0300
API: ALSA v: k6.17.6-zen1-1-zen status: kernel-api with: aoss
type: oss-emulator tools: N/A
Server-1: PipeWire v: 1.4.9 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Realtek RTL8125 2.5GbE vendor: ASUSTeK driver: r8169 v: kernel
pcie: gen: 2 speed: 5 GT/s lanes: 1 port: e000 bus-ID: 08:00.0
chip-ID: 10ec:8125 class-ID: 0200
IF: enp8s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Info: services: NetworkManager, smbd, systemd-timesyncd
Bluetooth:
Device-1: Realtek Bluetooth Radio driver: btusb v: 0.8 type: USB rev: 1.1
speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-8:3 chip-ID: 2b89:8761
class-ID: e001 serial: <filter>
Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.1
lmp-v: 10 status: discoverable: no pairing: no class-ID: 6c0104
Drives:
Local Storage: total: 4.55 TiB used: 156.54 GiB (3.4%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:2 vendor: Samsung
model: SSD 990 EVO Plus 1TB size: 931.51 GiB block-size: physical: 512 B
logical: 512 B speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter>
fw-rev: 2B2QKXG7 temp: 39.9 C scheme: GPT
ID-2: /dev/nvme1n1 maj-min: 259:0 vendor: Samsung
model: SSD 990 EVO Plus 1TB size: 931.51 GiB block-size: physical: 512 B
logical: 512 B speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter>
fw-rev: 2B2QKXG7 temp: 34.9 C scheme: GPT
ID-3: /dev/nvme2n1 maj-min: 259:1 vendor: Samsung
model: SSD 990 EVO Plus 1TB size: 931.51 GiB block-size: physical: 512 B
logical: 512 B speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter>
fw-rev: 2B2QKXG7 temp: 32.9 C scheme: GPT
ID-4: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 870 QVO 2TB
size: 1.82 TiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
tech: SSD serial: <filter> fw-rev: 2B6Q scheme: GPT
Partition:
ID-1: / raw-size: 305.29 GiB size: 305.29 GiB (100.00%)
used: 156.53 GiB (51.3%) fs: btrfs dev: /dev/nvme2n1p4 maj-min: 259:10
ID-2: /boot/efi raw-size: 600 MiB size: 598.8 MiB (99.80%)
used: 16.4 MiB (2.7%) fs: vfat dev: /dev/nvme2n1p1 maj-min: 259:7
ID-3: /home raw-size: 305.29 GiB size: 305.29 GiB (100.00%)
used: 156.53 GiB (51.3%) fs: btrfs dev: /dev/nvme2n1p4 maj-min: 259:10
ID-4: /var/log raw-size: 305.29 GiB size: 305.29 GiB (100.00%)
used: 156.53 GiB (51.3%) fs: btrfs dev: /dev/nvme2n1p4 maj-min: 259:10
ID-5: /var/tmp raw-size: 305.29 GiB size: 305.29 GiB (100.00%)
used: 156.53 GiB (51.3%) fs: btrfs dev: /dev/nvme2n1p4 maj-min: 259:10
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
ID-1: swap-1 type: zram size: 30.98 GiB used: 0 KiB (0.0%) priority: 100
comp: zstd avail: lzo-rle,lzo,lz4,lz4hc,deflate,842 dev: /dev/zram0
Sensors:
System Temperatures: cpu: 51.0 C mobo: N/A gpu: amdgpu temp: 55.0 C
mem: 58.0 C
Fan Speeds (rpm): N/A gpu: amdgpu fan: 0
Info:
Memory: total: 32 GiB note: est. available: 30.98 GiB used: 5.53 GiB (17.8%)
Processes: 449 Power: uptime: 16m states: freeze,mem,disk suspend: deep
avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
suspend, test_resume image: 12.36 GiB services: org_kde_powerdevil,
power-profiles-daemon, upowerd Init: systemd v: 258 default: graphical
tool: systemctl
Packages: pm: pacman pkgs: 1857 libs: 546 tools: octopi,paru Compilers:
clang: 21.1.4 gcc: 15.2.1 Shell: Bash v: 5.3.3 default: fish v: 4.1.2
running-in: konsole inxi: 3.3.39
Garuda (2.11.1-1):
System install date: 2025-10-31
Garuda release: 251002
Last full system update: 2025-11-01 ↻
Is partially upgraded: No
Relevant software: snapper NetworkManager dracut garuda-hardware-profile-standard
Windows dual boot: Probably (Run as root to verify)
Failed units:
--- System Health Check Report ---
24/25 checks run in 0.57 seconds ⌛
Powered by garuda-health 🦅
--- LOW ---
- Deprecated/Outdated/Removed packages should be removed: steam-native-runtime (fix available)
--- INFO ---
- A reboot is pending (update applied since last reboot)
Run garuda-health --fix to apply fixes.
