System freezes, unable to kill a process or shutdown properly

Hello,

For the last month or two, I have been getting freezes on a regular basis. They happen mostly in games, and more often in performance-demanding ones like CS2. However, they sometimes happen outside of games as well, for example when copying large (>50 GB) files, and occasionally even during completely basic usage like browsing the web or editing text.
Usually, without starting a game, I can use the PC for many hours with no issues, and then the freezes happen within the first 30 minutes of playing.

Once a game/process freeze occurs, there are multiple possible outcomes:

  1. The least likely one: The process crashes and ends correctly. In this situation I can still normally use the PC or even restart the game. This is followed by a core dump like below.
Summary
09:32:13 user-pc systemd-coredump[61073]: [🔕] Process 60817 (cs2) of user 1000 dumped core.
                                                 
                                                 Stack trace of thread 60825:
                                                 #0  0x00007abfaf834545 n/a (n/a + 0x0)
                                                 #1  0x00007abfaf7a0b03 n/a (n/a + 0x0)
                                                 #2  0x00007abfaf7b5d7e n/a (n/a + 0x0)
                                                 #3  0x00007abfaf7b6658 n/a (n/a + 0x0)
                                                 #4  0x00007ac0dcf204ed n/a (n/a + 0x0)
                                                 #5  0x00007ac0dcf20608 n/a (n/a + 0x0)
                                                 #6  0x00007ac0dcf213a5 n/a (n/a + 0x0)
                                                 #7  0x00007ac0dcfc3783 n/a (n/a + 0x0)
                                                 #8  0x00007ac0dcfc4a87 n/a (n/a + 0x0)
                                                 #9  0x00007ac0deeae70a n/a (n/a + 0x0)
                                                 #10 0x00007ac0def32aac n/a (n/a + 0x0)
                                                 ELF object binary architecture: AMD x86-64
09:32:13 user-pc systemd[1]: systemd-coredump@2-61071-0.service: Deactivated successfully.
09:32:13 user-pc systemd[1]: systemd-coredump@2-61071-0.service: Consumed 13.647s CPU time, 5.7G memory peak.
09:32:14 user-pc steam[42220]: /home/user/.local/share/Steam/steamapps/common/Counter-Strike Global Offensive/game/cs2.sh: line 98: 60817 Illegal instruction     (core dumped) $>
09:32:14 user-pc steam[42220]: Game Recording - game stopped [gameid=730]
09:32:14 user-pc steam[42220]: Removing process 61069 for gameID 730
09:32:14 user-pc steam[42220]: Removing process 60819 for gameID 730
09:32:14 user-pc steam[42220]: Removing process 60818 for gameID 730
09:32:14 user-pc steam[42220]: Removing process 60817 for gameID 730
09:32:14 user-pc steam[42220]: Removing process 60816 for gameID 730
09:32:14 user-pc steam[42220]: Removing process 60814 for gameID 730
09:32:14 user-pc steam[42220]: Removing process 60812 for gameID 730
09:32:14 user-pc steam[42220]: Removing process 60706 for gameID 730
09:32:14 user-pc steam[42220]: Removing process 60705 for gameID 730
09:32:14 user-pc steam[42220]: Removing process 60704 for gameID 730
09:32:19 user-pc steam[42220]: (process:60886): GLib-GObject-CRITICAL **: 09:32:19.481: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
  2. The most likely one: The game process freezes and I’m unable to kill it, whether through the terminal, Steam (if it’s a Steam game), or the system monitor. If I try to shut down or reboot the PC, it doesn’t work, because the shutdown process gets stuck trying to kill the frozen application. I can, however, usually minimize the frozen game and continue to use the PC somewhat normally, though a system-wide freeze often follows within an hour or so. When this outcome occurs, it’s usually followed by a stack trace like this in the journal:
Summary
23:12:44 user-pc kernel: ------------[ cut here ]------------
23:12:44 user-pc kernel: !se->sched_delayed
23:12:44 user-pc kernel: WARNING: CPU: 11 PID: 0 at kernel/sched/fair.c:6866 requeue_delayed_entity+0x4e5/0x>
23:12:44 user-pc kernel: Modules linked in: exfat snd_seq_dummy rfcomm snd_hrtimer snd_seq ext4 mbcache jbd2>
23:12:44 user-pc kernel:  libarc4 xpad bluetooth mc joydev mousedev kvm snd_hda_codec ff_memless crc16 iwlwi>
23:12:44 user-pc kernel:  crc32c_intel xor raid6_pq dm_mirror dm_region_hash dm_log dm_mod i2c_dev crypto_us>
23:12:44 user-pc kernel: CPU: 11 UID: 0 PID: 0 Comm: swapper/11 Tainted: P        W  OE      6.13.2-zen1-1-z>
23:12:44 user-pc kernel: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
23:12:44 user-pc kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A WIFI (MS-7D25)>
23:12:44 user-pc kernel: RIP: 0010:requeue_delayed_entity+0x4e5/0x520
23:12:44 user-pc kernel: Code: 18 f9 ff 0f 0b e9 64 fb ff ff 80 3d 54 59 28 02 00 0f 85 4d fb ff ff 48 c7 c7>
23:12:44 user-pc kernel: RSP: 0018:ffffb00080217df8 EFLAGS: 00010086
23:12:44 user-pc kernel: RAX: 0000000000000000 RBX: ffff9e4f76c08080 RCX: 0000000000000027
23:12:44 user-pc kernel: RDX: ffff9e531f7a18c8 RSI: 0000000000000001 RDI: ffff9e531f7a18c0
23:12:44 user-pc kernel: RBP: ffff9e4e8b4b4600 R08: ffff9e533ff543e8 R09: 00000000ffffdfff
23:12:44 user-pc kernel: R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000009
23:12:44 user-pc kernel: R13: 0000000000000000 R14: ffff9e4f76c08000 R15: ffff9e531f7b6a80
23:12:44 user-pc kernel: FS:  0000000000000000(0000) GS:ffff9e531f780000(0000) knlGS:0000000000000000
23:12:44 user-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
23:12:44 user-pc kernel: CR2: 0000183402f68000 CR3: 00000004b6d9a002 CR4: 0000000000f70ef0
23:12:44 user-pc kernel: PKRU: 55555554
23:12:44 user-pc kernel: Call Trace:
23:12:44 user-pc kernel:  <TASK>
23:12:44 user-pc kernel:  ? requeue_delayed_entity+0x4e5/0x520
23:12:44 user-pc kernel:  ? __warn.cold+0x93/0xed
23:12:44 user-pc kernel:  ? requeue_delayed_entity+0x4e5/0x520
23:12:44 user-pc kernel:  ? report_bug+0xe7/0x210
23:12:44 user-pc kernel:  ? handle_bug+0x58/0x90
23:12:44 user-pc kernel:  ? exc_invalid_op+0x19/0xc0
23:12:44 user-pc kernel:  ? asm_exc_invalid_op+0x1a/0x20
23:12:44 user-pc kernel:  ? requeue_delayed_entity+0x4e5/0x520
23:12:44 user-pc kernel:  ? requeue_delayed_entity+0x4e5/0x520
23:12:44 user-pc kernel:  enqueue_task+0x39/0x5d0
23:12:44 user-pc kernel:  ? update_rq_clock+0x37/0x230
23:12:44 user-pc kernel:  sched_ttwu_pending+0xda/0x3c0
23:12:44 user-pc kernel:  __flush_smp_call_function_queue+0x15b/0x410
23:12:44 user-pc kernel:  ? ktime_get+0x3a/0xd0
23:12:44 user-pc kernel:  flush_smp_call_function_queue+0x37/0x70
23:12:44 user-pc kernel:  do_idle+0x143/0x210
23:12:44 user-pc kernel:  cpu_startup_entry+0x29/0x30
23:12:44 user-pc kernel:  start_secondary+0x11e/0x140
23:12:44 user-pc kernel:  common_startup_64+0x13e/0x141
23:12:44 user-pc kernel:  </TASK>
23:12:44 user-pc kernel: ---[ end trace 0000000000000000 ]---
23:12:44 user-pc kernel: ------------[ cut here ]------------
23:12:44 user-pc kernel: !se->on_rq
23:12:44 user-pc kernel: WARNING: CPU: 11 PID: 0 at kernel/sched/fair.c:6867 requeue_delayed_entity+0x4be/0x>
23:12:44 user-pc kernel: Modules linked in: exfat snd_seq_dummy rfcomm snd_hrtimer snd_seq ext4 mbcache jbd2>
23:12:44 user-pc kernel:  libarc4 xpad bluetooth mc joydev mousedev kvm snd_hda_codec ff_memless crc16 iwlwi>
23:12:44 user-pc kernel:  crc32c_intel xor raid6_pq dm_mirror dm_region_hash dm_log dm_mod i2c_dev crypto_us>
23:12:44 user-pc kernel: CPU: 11 UID: 0 PID: 0 Comm: swapper/11 Tainted: P        W  OE      6.13.2-zen1-1-z>
23:12:44 user-pc kernel: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
23:12:44 user-pc kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A WIFI (MS-7D25)>
23:12:44 user-pc kernel: RIP: 0010:requeue_delayed_entity+0x4be/0x520
23:12:44 user-pc kernel: Code: a5 88 01 00 00 e9 16 fc ff ff 80 3d 7a 59 28 02 00 0f 85 7e fb ff ff 48 c7 c7>
23:12:44 user-pc kernel: RSP: 0018:ffffb00080217df8 EFLAGS: 00010086
23:12:44 user-pc kernel: RAX: 0000000000000000 RBX: ffff9e4f76c08080 RCX: 0000000000000027
23:12:44 user-pc kernel: RDX: ffff9e531f7a18c8 RSI: 0000000000000001 RDI: ffff9e531f7a18c0
23:12:44 user-pc kernel: RBP: ffff9e4e8b4b4600 R08: ffff9e533ff543e8 R09: 00000000ffffdfff
23:12:44 user-pc kernel: R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000009
23:12:44 user-pc kernel: R13: 0000000000000000 R14: ffff9e4f76c08000 R15: ffff9e531f7b6a80
23:12:44 user-pc kernel: FS:  0000000000000000(0000) GS:ffff9e531f780000(0000) knlGS:0000000000000000
23:12:44 user-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
23:12:44 user-pc kernel: CR2: 0000183402f68000 CR3: 00000004b6d9a002 CR4: 0000000000f70ef0
23:12:44 user-pc kernel: PKRU: 55555554
23:12:44 user-pc kernel: Call Trace:
23:12:44 user-pc kernel:  <TASK>
23:12:44 user-pc kernel:  ? requeue_delayed_entity+0x4be/0x520
23:12:44 user-pc kernel:  ? __warn.cold+0x93/0xed
23:12:44 user-pc kernel:  ? requeue_delayed_entity+0x4be/0x520
23:12:44 user-pc kernel:  ? report_bug+0xe7/0x210
23:12:44 user-pc kernel:  ? handle_bug+0x58/0x90
23:12:44 user-pc kernel:  ? exc_invalid_op+0x19/0xc0
23:12:44 user-pc kernel:  ? asm_exc_invalid_op+0x1a/0x20
23:12:44 user-pc kernel:  ? requeue_delayed_entity+0x4be/0x520
23:12:44 user-pc kernel:  ? requeue_delayed_entity+0x4be/0x520
23:12:44 user-pc kernel:  enqueue_task+0x39/0x5d0
23:12:44 user-pc kernel:  ? update_rq_clock+0x37/0x230
23:12:44 user-pc kernel:  sched_ttwu_pending+0xda/0x3c0
23:12:44 user-pc kernel:  __flush_smp_call_function_queue+0x15b/0x410
23:12:44 user-pc kernel:  ? ktime_get+0x3a/0xd0
23:12:44 user-pc kernel:  flush_smp_call_function_queue+0x37/0x70
23:12:44 user-pc kernel:  do_idle+0x143/0x210
23:12:44 user-pc kernel:  cpu_startup_entry+0x29/0x30
23:12:44 user-pc kernel:  start_secondary+0x11e/0x140
23:12:44 user-pc kernel:  common_startup_64+0x13e/0x141
23:12:44 user-pc kernel:  </TASK>
23:12:44 user-pc kernel: ---[ end trace 0000000000000000 ]---
23:12:44 user-pc kernel: ------------[ cut here ]------------
23:12:44 user-pc kernel: !se->on_rq
23:12:44 user-pc kernel: WARNING: CPU: 11 PID: 0 at kernel/sched/fair.c:709 update_entity_lag+0x16a/0x1c0
23:12:44 user-pc kernel: Modules linked in: exfat snd_seq_dummy rfcomm snd_hrtimer snd_seq ext4 mbcache jbd2>
23:12:44 user-pc kernel:  libarc4 xpad bluetooth mc joydev mousedev kvm snd_hda_codec ff_memless crc16 iwlwi>
23:12:44 user-pc kernel:  crc32c_intel xor raid6_pq dm_mirror dm_region_hash dm_log dm_mod i2c_dev crypto_us>
23:12:44 user-pc kernel: CPU: 11 UID: 0 PID: 0 Comm: swapper/11 Tainted: P        W  OE      6.13.2-zen1-1-z>
23:12:44 user-pc kernel: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
23:12:44 user-pc kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A WIFI (MS-7D25)>
23:12:44 user-pc kernel: RIP: 0010:update_entity_lag+0x16a/0x1c0
23:12:44 user-pc kernel: Code: c1 01 48 d3 ea 41 29 c8 eb ae 80 3d 60 7b 28 02 00 0f 85 c9 fe ff ff 48 c7 c7>
23:12:44 user-pc kernel: RSP: 0018:ffffb00080217dd0 EFLAGS: 00010082
23:12:44 user-pc kernel: RAX: 0000000000000000 RBX: ffff9e4e8b4b4600 RCX: 0000000000000027
23:12:44 user-pc kernel: RDX: ffff9e531f7a18c8 RSI: 0000000000000001 RDI: ffff9e531f7a18c0
23:12:44 user-pc kernel: RBP: ffffb00080217de8 R08: ffff9e533ff543e8 R09: 00000000ffffdfff
23:12:44 user-pc kernel: R10: 0000000000000003 R11: 0000000000000001 R12: ffff9e4f76c08080
23:12:44 user-pc kernel: R13: 0000000000000000 R14: ffff9e4f76c08000 R15: ffff9e531f7b6a80
23:12:44 user-pc kernel: FS:  0000000000000000(0000) GS:ffff9e531f780000(0000) knlGS:0000000000000000
23:12:44 user-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
23:12:44 user-pc kernel: CR2: 0000183402f68000 CR3: 00000004b6d9a002 CR4: 0000000000f70ef0
23:12:44 user-pc kernel: PKRU: 55555554
23:12:44 user-pc kernel: Call Trace:
23:12:44 user-pc kernel:  <TASK>
23:12:44 user-pc kernel:  ? update_entity_lag+0x16a/0x1c0
23:12:44 user-pc kernel:  ? __warn.cold+0x93/0xed
23:12:44 user-pc kernel:  ? update_entity_lag+0x16a/0x1c0
23:12:44 user-pc kernel:  ? report_bug+0xe7/0x210
23:12:44 user-pc kernel:  ? handle_bug+0x58/0x90
23:12:44 user-pc kernel:  ? exc_invalid_op+0x19/0xc0
23:12:44 user-pc kernel:  ? asm_exc_invalid_op+0x1a/0x20
23:12:44 user-pc kernel:  ? update_entity_lag+0x16a/0x1c0
23:12:44 user-pc kernel:  requeue_delayed_entity+0x36/0x520
23:12:44 user-pc kernel:  enqueue_task+0x39/0x5d0
23:12:44 user-pc kernel:  ? update_rq_clock+0x37/0x230
23:12:44 user-pc kernel:  sched_ttwu_pending+0xda/0x3c0
23:12:44 user-pc kernel:  __flush_smp_call_function_queue+0x15b/0x410
23:12:44 user-pc kernel:  ? ktime_get+0x3a/0xd0
23:12:44 user-pc kernel:  flush_smp_call_function_queue+0x37/0x70
23:12:44 user-pc kernel:  do_idle+0x143/0x210
23:12:44 user-pc kernel:  cpu_startup_entry+0x29/0x30
23:12:44 user-pc kernel:  start_secondary+0x11e/0x140
23:12:44 user-pc kernel:  common_startup_64+0x13e/0x141
23:12:44 user-pc kernel:  </TASK>
23:12:44 user-pc kernel: ---[ end trace 0000000000000000 ]---
  3. The entire system freezes. I’m unable to move the cursor or switch to a different tty, the audio usually cuts out, and the PC stays unresponsive until I reboot it with the power button.
    This happens fairly often, though less often than the second outcome. More often than not it happens after one of the first two outcomes has occurred and I haven’t rebooted. This could be a coincidence, though, because the issues happen constantly when I’m gaming.
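
The three warnings in the second outcome all come from the scheduler code in kernel/sched/fair.c. To see at a glance which warning sites fire and how often, the journal text can be condensed into a short tally. This is only a minimal sketch: the function name `summarize_kernel_warnings` is made up for illustration, and the usage line assumes the journal of the previous boot is still available.

```bash
#!/usr/bin/env bash
# Summarise kernel WARNING sites from journal text read on stdin.
# Prints one line per distinct "file:line function", prefixed with a count.
summarize_kernel_warnings() {
    grep -oE 'WARNING: CPU: [0-9]+ PID: [0-9]+ at [^ ]+ [A-Za-z0-9_.]+' \
        | sed -E 's/^.* at //' \
        | sort | uniq -c | sort -rn
}

# Usage (kernel messages of the previous boot):
#   journalctl -k -b -1 --no-pager | summarize_kernel_warnings
```

Piping through `--no-pager` also avoids the truncated lines ending in `>` seen in the excerpts above, which is how the pager marks lines that are wider than the terminal.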

What I’ve tried so far:

  • Different Nvidia drivers (main branch, lts, zen, nouveau)
  • Reinstalling the system
  • Switching between wayland/X11
  • Memtest86, stress-ng, SMART tests of all drives, memtest vulkan. None of those caused a freeze or found anything wrong.
  • Different kernels: zen, xanmod, lts, cachy, clear
  • Updating BIOS
  • Enabling and disabling CPU undervolting (I had been undervolting for more than a year when the issue started)
  • Monitoring temperatures: they run fairly high, but no higher than they did before the problems started
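
For the temperature monitoring in the last point, the kernel exposes its sensors as thermal zones under /sys/class/thermal, with values in millidegrees Celsius. Below is a minimal sketch for listing them. The function names are hypothetical, and which zone belongs to the CPU package varies between systems.

```bash
#!/usr/bin/env bash
# Convert a sysfs millidegree value (e.g. 39800) to degrees Celsius, one decimal.
millideg_to_c() {
    local m="$1"
    printf '%d.%d\n' "$(( m / 1000 ))" "$(( (m % 1000) / 100 ))"
}

# List every readable thermal zone as: name <TAB> type <TAB> temperature.
# $1: base directory, defaults to the real sysfs location.
list_thermal_zones() {
    local base="${1:-/sys/class/thermal}" zone
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        printf '%s\t%s\t%s C\n' "${zone##*/}" \
            "$(cat "$zone/type" 2>/dev/null)" \
            "$(millideg_to_c "$(cat "$zone/temp")")"
    done
}

# Usage: list_thermal_zones
```

Running it in a loop during a gaming session (for example with `watch`) gives a rough picture of how temperatures behave right before a freeze.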

garuda-inxi output:

System:
  Kernel: 6.13.4-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 14.2.1
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
    root=UUID=235255d0-48a9-43db-a5f3-5bc997ed1c13 rw rootflags=subvol=@
    quiet rd.luks.uuid=8cc96cc2-42a6-4148-9ae7-19b2563cf99c
    rd.luks.uuid=2ac9b37d-5ad8-47c6-86ee-24f776a7b48c
    resume=/dev/mapper/luks-2ac9b37d-5ad8-47c6-86ee-24f776a7b48c loglevel=3
    ibt=off
  Desktop: KDE Plasma v: 6.3.1 tk: Qt v: N/A info: frameworks v: 6.11.0
    wm: kwin_wayland vt: 1 dm: SDDM Distro: Garuda base: Arch Linux
Machine:
  Type: Desktop Mobo: Micro-Star model: PRO Z690-A WIFI (MS-7D25) v: 2.0
    serial: <superuser required> uuid: <superuser required> UEFI: American
    Megatrends LLC. v: A.K0 date: 10/21/2024
CPU:
  Info: model: 13th Gen Intel Core i7-13700KF bits: 64 type: MST AMCP
    arch: Raptor Lake gen: core 13 level: v3 note: check built: 2022+
    process: Intel 7 (10nm) family: 6 model-id: 0xB7 (183) stepping: 1
    microcode: 0x12C
  Topology: cpus: 1x dies: 1 clusters: 10 cores: 16 threads: 24 mt: 8 tpc: 2
    st: 8 smt: enabled cache: L1: 1.4 MiB desc: d-8x32 KiB, 8x48 KiB; i-8x32
    KiB, 8x64 KiB L2: 24 MiB desc: 8x2 MiB, 2x4 MiB L3: 30 MiB desc: 1x30 MiB
  Speed (MHz): avg: 980 min/max: 800/5300:5400:4200 scaling:
    driver: intel_pstate governor: powersave cores: 1: 980 2: 980 3: 980 4: 980
    5: 980 6: 980 7: 980 8: 980 9: 980 10: 980 11: 980 12: 980 13: 980 14: 980
    15: 980 16: 980 17: 980 18: 980 19: 980 20: 980 21: 980 22: 980 23: 980
    24: 980 bogomips: 164044
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities: <filter>
Graphics:
  Device-1: NVIDIA GA102 [GeForce RTX 3090] vendor: ZOTAC driver: nvidia
    v: 570.86.16 alternate: nouveau,nvidia_drm non-free: 550/565.xx+
    status: current (as of 2025-01; EOL~2026-12-xx) arch: Ampere code: GAxxx
    process: TSMC n7 (7nm) built: 2020-2023 pcie: gen: 4 speed: 16 GT/s
    lanes: 16 ports: active: none off: DP-1,HDMI-A-1 empty: DP-2,DP-3
    bus-ID: 01:00.0 chip-ID: 10de:2204 class-ID: 0300
  Display: wayland server: X.org v: 1.21.1.15 with: Xwayland v: 24.1.5
    compositors: 1: kwin_wayland 2: Tabby driver: X: loaded: nvidia
    unloaded: modesetting,nouveau alternate: fbdev,nv,vesa
    gpu: nvidia,nvidia-nvswitch d-rect: 3840x1080 display-ID: 0
  Monitor-1: DP-1 pos: right model: ViewSonic VX2458-mhd serial: <filter>
    built: 2020 res: mode: 1920x1080 hz: 144 scale: 100% (1) dpi: 94 gamma: 1.2
    size: 521x293mm (20.51x11.54") diag: 598mm (23.5") ratio: 16:9 modes:
    max: 1920x1080 min: 640x480
  Monitor-2: HDMI-A-1 pos: primary,left model: Dell P2417H serial: <filter>
    built: 2017 res: mode: 1920x1080 hz: 60 scale: 100% (1) dpi: 93 gamma: 1.2
    size: 527x296mm (20.75x11.65") diag: 604mm (23.8") ratio: 16:9 modes:
    max: 1920x1080 min: 640x480
  API: EGL v: 1.5 hw: drv: nvidia platforms: device: 0 drv: nvidia gbm:
    drv: nvidia surfaceless: drv: nvidia wayland: drv: nvidia x11: drv: nvidia
  API: OpenGL v: 4.6.0 vendor: nvidia v: 570.86.16 glx-v: 1.4
    direct-render: yes renderer: NVIDIA GeForce RTX 3090/PCIe/SSE2
    memory: 23.44 GiB display-ID: :1.0
  API: Vulkan v: 1.4.303 layers: 13 device: 0 type: discrete-gpu
    name: NVIDIA GeForce RTX 3090 driver: N/A device-ID: 10de:2204
    surfaces: xcb,xlib,wayland device: 1 type: cpu name: llvmpipe (LLVM
    19.1.7 256 bits) driver: N/A device-ID: 10005:0000
    surfaces: xcb,xlib,wayland
  Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
    de: kscreen-console,kscreen-doctor gpu: nvidia-settings,nvidia-smi
    wl: wayland-info x11: xdpyinfo, xprop, xrandr
Audio:
  Device-1: Intel Alder Lake-S HD Audio vendor: Micro-Star MSI
    driver: snd_hda_intel v: kernel alternate: snd_soc_avs,snd_sof_pci_intel_tgl
    bus-ID: 00:1f.3 chip-ID: 8086:7ad0 class-ID: 0403
  Device-2: NVIDIA GA102 High Definition Audio vendor: ZOTAC
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 01:00.1 chip-ID: 10de:1aef class-ID: 0403
  Device-3: Razer USA Barracuda 2.4 driver: hid-generic,snd-usb-audio,usbhid
    type: USB rev: 1.1 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-1:2
    chip-ID: 1532:053c class-ID: 0301 serial: <filter>
  API: ALSA v: k6.13.4-zen1-1-zen status: kernel-api with: aoss
    type: oss-emulator tools: N/A
  Server-1: PipeWire v: 1.2.7 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Intel Alder Lake-S PCH CNVi WiFi driver: iwlwifi v: kernel
    bus-ID: 00:14.3 chip-ID: 8086:7af0 class-ID: 0280
  IF: wlo1 state: down mac: <filter>
  Device-2: Intel Ethernet I225-V vendor: Micro-Star MSI driver: igc
    v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 port: N/A bus-ID: 07:00.0
    chip-ID: 8086:15f3 class-ID: 0200
  IF: enp7s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  Device-3: Microsoft Xbox 360 Wireless Adapter driver: xpad type: USB
    rev: 2.0 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-11:5 chip-ID: 045e:0719
    class-ID: ff00 serial: <filter>
  Info: services: NetworkManager,systemd-timesyncd
Bluetooth:
  Device-1: Intel AX211 Bluetooth driver: btusb v: 0.8 type: USB rev: 2.0
    speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-14:9 chip-ID: 8087:0033
    class-ID: e001
  Report: btmgmt ID: hci0 rfk-id: 0 state: down bt-service: enabled,running
    rfk-block: hardware: no software: yes address: <filter> bt-v: 5.3 lmp-v: 12
    status: discoverable: no pairing: no
Drives:
  Local Storage: total: 4.09 TiB used: 2.25 TiB (55.0%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Samsung model: SSD 980 PRO with
    Heatsink 1TB size: 931.51 GiB block-size: physical: 512 B logical: 512 B
    speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter> fw-rev: 4B2QGXA7
    temp: 39.9 C scheme: GPT
  ID-2: /dev/nvme1n1 maj-min: 259:10 vendor: Samsung model: SSD 980 1TB
    size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
    lanes: 4 tech: SSD serial: <filter> fw-rev: 3B4QFXO7 temp: 32.9 C
  ID-3: /dev/nvme2n1 maj-min: 259:4 vendor: Kingston model: SNV2S2000G
    size: 1.82 TiB block-size: physical: 512 B logical: 512 B speed: 63.2 Gb/s
    lanes: 4 tech: SSD serial: <filter> fw-rev: SBK00104 temp: 53.9 C
    scheme: GPT
  ID-4: /dev/nvme3n1 maj-min: 259:5 vendor: Samsung model: SSD 980 500GB
    size: 465.76 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
    lanes: 4 tech: SSD serial: <filter> fw-rev: 3B4QFXO7 temp: 38.9 C
    scheme: GPT
Partition:
  ID-1: / raw-size: 896.91 GiB size: 896.91 GiB (100.00%)
    used: 176.58 GiB (19.7%) fs: btrfs dev: /dev/dm-0 maj-min: 254:0
    mapped: luks-8cc96cc2-42a6-4148-9ae7-19b2563cf99c
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 768 KiB (0.3%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
  ID-3: /home raw-size: 896.91 GiB size: 896.91 GiB (100.00%)
    used: 176.58 GiB (19.7%) fs: btrfs dev: /dev/dm-0 maj-min: 254:0
    mapped: luks-8cc96cc2-42a6-4148-9ae7-19b2563cf99c
  ID-4: /var/log raw-size: 896.91 GiB size: 896.91 GiB (100.00%)
    used: 176.58 GiB (19.7%) fs: btrfs dev: /dev/dm-0 maj-min: 254:0
    mapped: luks-8cc96cc2-42a6-4148-9ae7-19b2563cf99c
  ID-5: /var/tmp raw-size: 896.91 GiB size: 896.91 GiB (100.00%)
    used: 176.58 GiB (19.7%) fs: btrfs dev: /dev/dm-0 maj-min: 254:0
    mapped: luks-8cc96cc2-42a6-4148-9ae7-19b2563cf99c
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
  ID-1: swap-1 type: partition size: 34.31 GiB used: 0 KiB (0.0%)
    priority: -2 dev: /dev/dm-1 maj-min: 254:1
    mapped: luks-2ac9b37d-5ad8-47c6-86ee-24f776a7b48c
  ID-2: swap-2 type: zram size: 31.19 GiB used: 0 KiB (0.0%) priority: 100
    comp: zstd avail: lzo-rle,lzo,lz4,lz4hc,deflate,842 max-streams: 24
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 39.8 C mobo: N/A
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 32 GiB available: 31.19 GiB used: 4.33 GiB (13.9%)
  Processes: 557 Power: uptime: 3m states: freeze,mem,disk suspend: deep
    avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
    suspend, test_resume image: 12.41 GiB services: org_kde_powerdevil,
    power-profiles-daemon, upowerd Init: systemd v: 257 default: graphical
    tool: systemctl
  Packages: 2463 pm: pacman pkgs: 2447 libs: 610 tools: octopi,paru
    pm: flatpak pkgs: 16 Compilers: clang: 19.1.7 gcc: 14.2.1 Shell: garuda-inxi
    default: Bash v: 5.2.37 running-in: tabby inxi: 3.3.37
Garuda (2.6.26-1.1):
  System install date:     2025-02-12
  Last full system update: 2025-02-24
  Is partially upgraded:   No
  Relevant software:       snapper NetworkManager dracut
  Windows dual boot:       No/Undetected
  Failed units:  
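
Two of the values in this output can change after a BIOS or package update, so it may be worth re-checking them directly: the loaded microcode revision and the active frequency governor. A small sketch, assuming the usual /proc/cpuinfo layout; the function name `microcode_revision` is hypothetical.

```bash
#!/usr/bin/env bash
# Print the first microcode revision found in cpuinfo-formatted text on stdin.
microcode_revision() {
    awk -F': *' '/^microcode/ { print $2; exit }'
}

# Usage on a live system:
#   microcode_revision < /proc/cpuinfo
#   cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```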

I’ve looked through the Internet and found people having similar issues on different hardware and different Linux distros. However, none of their solutions helped in my case. I’ve mostly looked for examples related to CS, since that’s where the issues started for me and where they’re definitely the worst.
I’ve also looked through the Garuda forum, but haven’t found issues analogous to mine.
Examples:

Wow, good post. :clap:

The main problem, to solve it, is the “random” part.

The only other thing I can think of is faulty hardware, regardless of the manufacturer of the CPU/GPU: ‘cold solder joints’ or contacts that come loose from vibration or whatever, though I don’t technically know if that can even happen anymore.
Did you move while playing :wink:, or did a truck drive past? :scream:

4 Likes

My first suspicion falls on this candidate:

Disable all OC settings in the operating system AND in the BIOS. Your BIOS is up to date, but check whether the “Intel baseline profile” for the CPU is activated in the BIOS.
You can try to increase the vcore minimally.
The CPU may be dying slowly…

Does your power supply have enough power for the installed hardware? Manufacturer and model of the power supply?

5 Likes

very nice post =)

Like @nepti, I would also say that maybe your CPU is dying?
I also have CS2 installed on Steam and have played it for hours without any issues.
I know there are some problems with games and dropping FPS. Do you have the Steam overlay active? I have it active because of the FPS drops and input lag we had in this forum, which are already solved. As for freezes outside of gaming: a friend of mine also ran an overclocked Intel CPU, and it slowly died after 6 months. He was lucky enough to be able to blame it on Intel and got a new one =)

How many memory sticks do you have in the system?
Did you try to disable XMP?

1 Like

Does Intel have XMP too? I have AMD and it uses XMP, but I don’t use it either. A while ago I had an Intel Dell laptop, but I think XMP was called something else on Intel, if I remember right. Just a question =)

1 Like

Do you have ‘Garuda Assistant > Settings > Performance Tweaks’ enabled? I was having similar problems where I would softlock once a day. I turned that off about a month ago and haven’t had a problem since.

3 Likes

EXPO, XMP, it’s all the same thing with different naming :wink:

2 Likes

Are you overclocking or overheating your CPU? That can cause this kind of trouble when gaming.

Which games are affected, and which are not? Is there a visible trend, e.g. CPU-intensive games causing trouble more often than games that do not use the CPU all that much?

A faulty RAM stick can also cause this kind of trouble. Have you tried running on just one memory stick to see if the crashing still happens, and if so, trying the others one by one to see if that fixes it?

Finally, have you tried reducing the core speed with cpupower-gui? I found this can also help stop certain games or applications from crashing the machine. For example, on one of my laptops, Steam compiling shaders would overheat the machine and often crash it when it took a long time. So I wrote a little script to reduce the CPU frequency while the shaders are compiling and put the CPU back to normal afterwards. cpupower-gui is a perfect app for doing this.

Anyhow, hope this helps.
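
The idea of lowering the CPU speed only for the duration of a heavy task can be sketched without cpupower-gui on systems that use the intel_pstate driver, which exposes a `max_perf_pct` knob in sysfs. This is not the script mentioned above, just a minimal illustration: the function name `run_with_cpu_cap` and the `PSTATE_KNOB` variable are made up here, the 70 in the usage line is an arbitrary value, and writing to the real knob requires root.

```bash
#!/usr/bin/env bash
# sysfs file that caps the maximum performance level (in percent) under intel_pstate.
# Overridable so the logic can be tried on a scratch file first.
PSTATE_KNOB="${PSTATE_KNOB:-/sys/devices/system/cpu/intel_pstate/max_perf_pct}"

# Run a command with a temporary performance cap, then restore the old value.
# $1: cap in percent (1-100); remaining arguments: the command to run.
run_with_cpu_cap() {
    local cap="$1" previous status
    shift
    previous="$(cat "$PSTATE_KNOB")" || return 1
    echo "$cap" > "$PSTATE_KNOB" || return 1
    "$@"
    status=$?
    # Restore the previous cap even if the command failed.
    echo "$previous" > "$PSTATE_KNOB"
    return "$status"
}

# Usage (as root; the command name is a placeholder):
#   run_with_cpu_cap 70 some-long-running-command
```

The exit status of the wrapped command is passed through, so the wrapper can be dropped into existing launch scripts.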

2 Likes

Thanks for all the suggestions and support!
I’ll try to reply to everyone and provide an update, since in the last 24 hours I’ve tried some things and the problem got a lot better.

I also believe this is most likely the source of the issue. It’s the unlucky 13th gen, so I’m sort of expecting it to fail sooner than normal. The issue has been getting progressively worse until now, so it definitely seems like it’s dying, and not even at that slow a pace.
I had been undervolting it in an attempt to curb the overheating and extend its lifespan, at least until the microcode update last year. I think the BIOS update might have reset my initial config back to the default MSI values. I wouldn’t be surprised if that entailed some overclocking.
I got the CPU second hand 1.5 years ago, so it’s not really new, and who knows what it went through beforehand.

However,

This helped a lot. I went into the BIOS, but unfortunately I couldn’t find this exact setting. I’m not too familiar with overclocking tbh, so the options there seem a bit overwhelming.
But I noticed that “CPU Lite Load Control” was set to Normal, which I’ve changed to “Intel Default”.
I’ve also set CPU Loadline Calibration Control to Mode 5 and enabled BCLK 100MHz Lock.

This config resulted in a significant improvement. I had zero issues yesterday after making these changes, having run the game for around 6 hours, whereas I’d typically get a freeze/crash in the first 30 minutes.
Today so far I’ve had a single crash (running the game as a benchmark for most of the day), which simply closed the game without affecting the system; the logs point to a segmentation fault.

Log
13:45:30 user-pc steam[4881]: Looking up breakpad interfaces from steamclient
13:45:30 user-pc steam[4881]: Calling BreakpadMiniDumpSystemInit
13:45:30 user-pc steam[4881]: SteamInternal_SetMinidumpSteamID:  Caching Steam ID:  76561197960265728 [API loaded yes]
13:45:30 user-pc steam[4881]: SteamInternal_SetMinidumpSteamID:  Setting Steam ID:  76561197960265728
13:45:31 user-pc steam[4881]: crash_20250226134530_16.dmp[66166]: Uploading dump (out-of-process)
13:45:31 user-pc steam[4881]: /tmp/dumps/crash_20250226134530_16.dmp
13:45:31 user-pc steam[4881]: Adding process 66165 for gameID 730
13:45:48 user-pc steam[4881]: /home/user/.local/share/Steam/steamapps/common/Counter-Strike Global Offensive/game/cs2.sh: line 98: 62629 Segmentation fault      (core dumped) ${STEAM_RUNTIME_PREFIX} ${GAME_DEBUGGER} "${GAMEROOT}"/${GAMEEXE} "$@"
13:45:52 user-pc steam[4881]: (process:62697): GLib-GObject-CRITICAL **: 13:45:52.243: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
13:46:02 user-pc steam[4881]: reaping pid: 62697 -- gameoverlayui
13:46:45 user-pc steam[4881]: crash_20250226134530_16.dmp[66166]: Finished uploading minidump (out-of-process): success = no
13:46:45 user-pc steam[4881]: crash_20250226134530_16.dmp[66166]: error: HTTP response code said error
13:46:45 user-pc steam[4881]: crash_20250226134530_16.dmp[66166]: file ''/tmp/dumps/crash_20250226134530_16.dmp'', upload no: ''HTTP response code said error''
13:46:45 user-pc steam[4881]: pid 66166 != 66165, skipping destruction (fork without exec?)
13:46:46 user-pc steam[4881]: Game Recording - game stopped [gameid=730]
13:46:46 user-pc steam[4881]: Removing process 66165 for gameID 730
13:46:46 user-pc steam[4881]: Removing process 62631 for gameID 730
13:46:46 user-pc steam[4881]: Removing process 62630 for gameID 730
13:46:46 user-pc steam[4881]: Removing process 62629 for gameID 730
13:46:46 user-pc steam[4881]: Removing process 62628 for gameID 730
13:46:46 user-pc steam[4881]: Removing process 62626 for gameID 730
13:46:46 user-pc steam[4881]: Removing process 62624 for gameID 730
13:46:46 user-pc steam[4881]: Removing process 62514 for gameID 730
13:46:46 user-pc steam[4881]: Removing process 62513 for gameID 730
13:46:46 user-pc steam[4881]: Removing process 62512 for gameID 730
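
The two game crashes logged in this thread ended with different signals (an illegal instruction earlier, a segmentation fault here). To keep track of how often each kind occurs, the shell messages in the journal can be tallied. A minimal sketch; the function name `count_crash_signals` and the list of signal descriptions it looks for are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Count "<signal description> (core dumped)" messages in journal text on stdin.
count_crash_signals() {
    grep -oE '(Illegal instruction|Segmentation fault|Bus error|Aborted|Floating point exception) +\(core dumped\)' \
        | sed -E 's/ +\(core dumped\)$//' \
        | sort | uniq -c | sort -rn
}

# Usage (messages of the current boot):
#   journalctl -b --no-pager | count_crash_signals
```

A mix of unrelated crash types in the same binary is often read as a hint at unstable hardware rather than a single software bug, though a tally like this cannot prove it either way.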

So there is some room for improvement, but compared to what I’ve been dealing with, this is entirely bearable.

I have also tried the default config with just Lite Load Control (LLC) on Intel Default, as well as just LLC and Loadline Calibration Control (LCC) without the 100MHz lock, and setting LCC to Mode 4, but all of those seemed less stable, with either the same issues as at the starting point or the game closing more often.

The PSU is Corsair HX1000, which I believe should be fine.

I do, though the FPS is stable for me. I forgot to list it in the original post, but I did try disabling the overlay when looking for a solution.

I have two 16GB Kingston DDR5 memory sticks and never had XMP/EXPO enabled. I haven’t tried using only one of the sticks, which is a good suggestion, though now I think the evidence points to the CPU as the culprit.

I do have those enabled, and disabling them unfortunately didn’t work in my case, but thanks for the suggestion.

I don’t play too wide a range of titles, but I do think more CPU-intensive tasks tend to make the issue worse or happen quicker. My main points of comparison are CS2, which is fairly performance-demanding, and smaller games like Vintage Story or Tabletop Simulator. At first the issue would only be present when playing the former, but with time it started happening even in less demanding games, as well as during other tasks, such as doing backups or updates.

I have not, but will definitely look more into this tool, thank you!
Do you have any suggestions in regards to how much of a reduction I should go for?
As I’ve said, I’m not too good with the hardware side of things :sweat_smile:

This is exactly right, finding the source of the problem is the main thing here. Now that switching up the CPU settings in the BIOS helped, I think there’s a fairly large degree of confidence that this is it.
So if things start going downhill again, at least I know which part needs replacing.


Fits (unless it was broken).


Your detailed error description, your troubleshooting attempts, your logs, and the overall error picture point to this screwed up 13th/14th gen Intel CPU, and on top of that it’s a used CPU. Hardware ages, even CPUs, and in this case undervolting, limiting the clock frequency, or using useless tools is exactly the wrong thing to do.
Disable everything that is OC, UV or whatever, in the OS and BIOS, and test if it works. If not, minimally increase the vcore (in BIOS) and test extensively; repeat the whole thing until it runs stable. As soon as it becomes unstable, lower the vcore again.

If you can’t get a stable CPU this way, then it’s about to die.


You wrote:
“I have not, but will definitely look more into this tool, thank you!
Do you have any suggestions in regards to how much of a reduction I should go for?
As I’ve said, I’m not too good with the hardware side of things :sweat_smile:”

I wrote a little script to make this happen automatically, according to the maximum temperature you allow.

The handy part of this script is that you always have full CPU performance while the temperature is fine, and it only throttles down when the temperature rises above the limit you select.

#To run it, open cpupower-gui, select the performance profile with performance as the default CPU governor, and save it as a profile called performance. Then run this script from your home directory. You can set a higher or lower value in CPUTempHigh. Default=84.

#Copy and paste the lines below, save them as cpupower.sh, and give the file execute permission.

#!/usr/bin/bash
thermalzone=2 #this value may vary on your system; usually one of zones 1-4 is assigned to the cpu sensors.

if  [[ "$(pidof -x $(basename $0))" != $$ ]];then exit;fi;
DirPath=$HOME
CPUTempLow=70
CPUTempHigh=84
sleeper777=2.8
sleeper77=.777
running=0
inputdevices() { add=0;while : ; do KeyboardId=`xinput --list --long | grep XIKeyClass | head -n 1 | grep -E -o '[0-9]+'`;MouseId=`xinput --list --long | grep XIButtonClass | head -n 1 | grep -E -o '[0-9]+'`;add=$((add+1));echo 'KeyboardId='$KeyboardId;echo 'MouseId='$MouseId;echo 'add='$add;if [ ${#KeyboardId} -gt 0 ] && [ ${#MouseId} -gt 0 ] || [ $add -gt 21 ];then add=0;break;else notify-send -t 1405 'PRESS ANY KEY '$KeyboardId;sleep 1.4;fi;done;echo 'MOUSE='$MouseId" KeyboardId="$KeyboardId;
};DirPath=$HOME"/Scripts/.ourfiles/";inputdevices

#first run only! Make sure you run cpupower-gui and select the setting - inbuilt performance - applying it, and saving it as a NEW profile called - performance
if ! [ -s $HOME"/cpg-performance.profile" ];then if [ -s $HOME"/.config/cpupower_gui/cpg-performance.profile" ];then 
cp $HOME"/.config/cpupower_gui/cpg-performance.profile" $HOME"/cpg-performance.profile";else notify-send -t 7000 'Please select the setting inbuilt performance and apply afterwhich you save this setting to a profile calling it performance';sudo pacman -S cpupower-gui;/usr/bin/cpupower-gui;while : ; do if [ -s $HOME"/.config/cpupower_gui/cpg-performance.profile" ];then cp $HOME"/.config/cpupower_gui/cpg-performance.profile" $HOME"/cpg-performance.profile";break;else echo 'Please select the setting inbuilt performance and apply afterwhich you save this setting to a profile called - performance';/usr/bin/cpupower-gui;fi;sleep 3;done;fi;fi



# Checking for the performance profile file in your home directory, and copy an UNTAINTED profile file to your cpupower-gui directory, to collect the starting values from your cpu.
if [ -s $HOME"/cpg-performance.profile" ];then cp $HOME"/cpg-performance.profile" $HOME"/.config/cpupower_gui/cpg-performance.profile";readarray -t profile < $HOME"/cpg-performance.profile";cpupower-gui pr performance;clear;profile2="${profile[@]}";profile2count=${#profile2};x=1;for ii in ${!profile[@]};do LINE=${profile[$ii]};LINE=${profile[$ii]};echo $LINE;x=$((x+1));done;x1=$(($((x/2))-1));cpuP=$((x1/2));cpuE=$((x1/2));number=54;re='^[0-9]+$';while :; do highnumber=${profile2:$number:4};if [[ $highnumber =~ $re ]];then if [ $highnumber -gt 1000 ];then break;fi;fi;number=$((number+1));done;numberhigh=$highnumber;highnumber2=$highnumber;numberhighbak=$highnumber;lownumber=${profile2:$((profile2count-21)):4};numberlow=$lownumber;lownumber2=$lownumber;numberlowbak=$lownumber;CPU2=$(cpupower-gui freq);CPU2No=${#CPU2};CPUhigh=${CPU2:28:4};CPUlow=${CPU2:$((CPU2No-32)):4};if [ $highnumber = $lownumber2 ];then echo 'This CPU has no economy threads, changes will apply to all!';echo 'There are '$((cpuP+cpuE))' performance threads running at '$highnumber'Mhz';economy=0;else echo 'There are '$cpuP' performance threads running at '$highnumber'Mhz';echo 'There are '$cpuE' economy threads running at '$lownumber'Mhz';x2=-2;for ii2 in ${!profile[@]};do LINE2=${profile[$ii2]};x2=$((x2+1));while IFS=';' read -ra ADDR;do for i1 in "${ADDR[@]}";do if [[ ${i1:6:4} =~ $re ]];then i2=${i1:6:4};else i2=${i1:7:4};fi;if [[ $i2 =~ $re ]] && [ $i2 -gt $numberhigh ];then checknumber=$i2;for ii in ${!profile[@]};do LINE=${profile[$ii]};while IFS=';' read -ra ADDR;do for i in "${ADDR[@]}";do profilenew[${#profilenew[*]}]="${i/"$checknumber"/"$highnumber"}";done;done <<< "$LINE";done;printf "%s\n" "${profilenew[@]}" > "$HOME/.config/cpupower_gui/cpg-performance.profile";unset profile;unset profilenew;readarray -t profile < "$HOME/.config/cpupower_gui/cpg-performance.profile";break;fi;if [[ $i2 =~ $re ]] && [ $i2 -lt $lownumber ];then checknumber=$i2;for ii in ${!profile[@]};do 
LINE=${profile[$ii]};while IFS=';' read -ra ADDR;do for i in "${ADDR[@]}";do profilenew[${#profilenew[*]}]="${i/"$checknumber"/"$numberlow"}";done;done <<< "$LINE";done;printf "%s\n" "${profilenew[@]}" > "$HOME/.config/cpupower_gui/cpg-performance.profile";unset profile;unset profilenew;readarray -t profile < "$HOME/.config/cpupower_gui/cpg-performance.profile";break;fi;done;done <<< "$LINE2";done;economy=1;fi

#workingloop
while :; do 
temp=$((`cat /sys/class/thermal/thermal_zone$thermalzone/temp`/1000));
if [ $temp -gt $((CPUTempHigh-7)) ] || [ $running = 1 ];then running=1;
if [ $temp -lt $((CPUTempLow)) ] || [ $temp -gt $CPUTempHigh ];then if [ $temp -gt $CPUTempHigh ];then CPUusage=`top -bn1 | sed -n '/Cpu/p' | awk '{print $2}' | sed 's/..,//'`;if ! [ ${CPUusage:1:1} = '.' ] && [ ${CPUusage:0:2} -gt 98 ] && [ $temp -gt $((CPUTempHigh-7)) ];then if [ $highnumber2 -gt $((CPUTempHigh-2)) ];then lownumber2=$((lownumber2-210));highnumber2=$((highnumber2-420));else highnumber2=$((highnumber2-100));fi;else lownumber2=$((lownumber2-140));highnumber2=$((highnumber2-280));fi;elif [ $temp -lt $((CPUTempLow-21)) ] || [ $temp -lt 49 ];then lownumber2=$((numberlowbak));highnumber2=$((numberhighbak));running=0;elif [ $temp -gt $((CPUTempHigh-2)) ];then lownumber2=$((lownumber2-105));highnumber2=$((highnumber2-210));elif [ $temp -lt $((CPUTempLow-7)) ] && ! [ $((highnumber2+140)) -gt $numberhighbak ];then lownumber2=$((lownumber2+70));highnumber2=$((highnumber2+140));elif [ $temp -lt $((CPUTempLow)) ] && ! [ $((highnumber2+98)) -gt $numberhighbak ];then lownumber2=$((lownumber2+46));highnumber2=$((highnumber2+98));elif [ $temp -lt $((CPUTempLow)) ];then lownumber2=$((numberlowbak));highnumber2=$((numberhighbak));fi;fi

if [ $highnumber2 -lt $numberhigh ] || [ $highnumber -lt $((numberhigh)) ];then 
x=-2;
for ii in ${!profile[@]};do LINE=${profile[$ii]};echo $LINE;while IFS=';' read -ra ADDR;do for i in "${ADDR[@]}";do if [ $economy = 1 ];then if [ $x -lt 16 ];then profilenew[${#profilenew[*]}]="${i/"$highnumber"/"$highnumber2"}";else profilenew[${#profilenew[*]}]="${i/"$lownumber"/"$lownumber2"}";fi;else profilenew[${#profilenew[*]}]="${i/"$highnumber"/"$highnumber2"}";fi;x=$((x+1));done;done <<< "$LINE";done;printf "%s\n" "${profilenew[@]}" > "$HOME/.config/cpupower_gui/cpg-performance.profile";unset profile;unset profilenew;readarray -t profile < "$HOME/.config/cpupower_gui/cpg-performance.profile";cpupower-gui pr performance;clear;echo 'CPU Performance='$highnumber2'Mhz';echo 'CPU Economy='$lownumber2'Mhz';echo 'CPU Temperature='$temp;echo 'running='$running;sleep $sleeper777;CPU2=$(cpupower-gui freq);CPU2No=${#CPU2};CPUhigh=${CPU2:28:4};CPUlow=${CPU2:$((CPU2No-32)):4};lownumber=$CPUlow;highnumber=$CPUhigh;if [ $highnumber2 = $numberhighbak ] && [ $lownumber2 = $numberlowbak ];then running=0;fi
fi;fi
clear
echo 'CPU Performance='$highnumber'Mhz';echo 'CPU Economy='$lownumber'Mhz';echo 'CPU Temperature='$temp;echo 'running='$running
sleep $sleeper777;
done;
fi

Set this script to start automatically once you have installed it and have run cpupower-gui to save a profile called performance.
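
For anyone who finds the long one-liners hard to follow, the core idea (lower the frequency cap while the temperature is over the limit, raise it again once things cool down) can be sketched in a few lines. This is a simplified sketch and not the script above: the names `next_freq_cap` and `read_temp_c` and the 200 MHz step are made up for illustration, and it only computes the next cap without applying it.

```bash
#!/usr/bin/env bash
# Simplified sketch of temperature-based throttling.
# Function names and STEP_MHZ are assumptions for illustration only.
CPUTempLow=70    # below this, step the cap back up
CPUTempHigh=84   # above this, step the cap down
STEP_MHZ=200     # hypothetical step size

# Usage: next_freq_cap TEMP_C CURRENT_MHZ MAX_MHZ MIN_MHZ
# Prints the frequency cap to use for the next interval.
next_freq_cap() {
    local temp=$1 cur=$2 max=$3 min=$4 next=$2
    if (( temp > CPUTempHigh )); then
        next=$(( cur - STEP_MHZ ))      # too hot: throttle down
    elif (( temp < CPUTempLow )); then
        next=$(( cur + STEP_MHZ ))      # cool again: recover
    fi
    # Keep the result inside the allowed range.
    if (( next > max )); then next=$max; fi
    if (( next < min )); then next=$min; fi
    echo "$next"
}

# Read a thermal zone in whole degrees Celsius.
# The zone number varies per system (2 is only the default used above).
read_temp_c() {
    local zone=${1:-2}
    echo $(( $(cat "/sys/class/thermal/thermal_zone${zone}/temp") / 1000 ))
}
```

For example, `next_freq_cap "$(read_temp_c 2)" 4800 5000 800` would print the cap to apply next. Between the two thresholds the cap is left alone, which avoids bouncing up and down on every reading.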


Since the initial build, my PC never really worked reliably with Linux, no matter the distro, except once with a specific update of Garuda, now long gone.
So I relate a lot to this post, and I’d like to see if we have this in common:

1- Keep track of the exact time of the system’s hangup
2- And after reboot, check the kernel or dmesg logs at that time.
3- Any recurring message? In my case I had things like “GPU dropping out of the Bus” (approx.)
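
The steps above could be scripted roughly like this. It is only a sketch: the function names `freeze_window` and `kernel_log_around` are made up, it needs GNU `date`, and `journalctl -b -1` only works if the journal is kept across reboots (persistent storage).

```bash
#!/usr/bin/env bash
# Print the start and end of a time window around a freeze, one per line.
# Usage: freeze_window "YYYY-MM-DD HH:MM:SS" [MARGIN_MINUTES]
freeze_window() {
    local when=$1 margin=${2:-5} epoch
    epoch=$(date -d "$when" +%s) || return 1
    date -d "@$(( epoch - margin * 60 ))" '+%Y-%m-%d %H:%M:%S'
    date -d "@$(( epoch + margin * 60 ))" '+%Y-%m-%d %H:%M:%S'
}

# Show kernel messages from the previous boot inside that window.
# Usage: kernel_log_around "YYYY-MM-DD HH:MM:SS" [MARGIN_MINUTES]
kernel_log_around() {
    local from to
    { read -r from && read -r to; } < <(freeze_window "$@") || return 1
    journalctl -k -b -1 --since "$from" --until "$to" --no-pager
}
```

After a freeze at, say, 13:46:46, something like `kernel_log_around "2025-03-01 13:46:46" 2` would show the kernel messages from two minutes before to two minutes after (the date here is just a placeholder).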

That’s why I suspect (in my case) a slight defect in the graphics card, one that goes unnoticed most of the time but not always, and that needs a tricky configuration to work around. Only once did I have a configuration where even suspend and wake-up worked (usually the screen just stays dark or shows a weird display).

So, what are your messages at the time of the freeze? (check as I explained)

“GPU has fallen off the bus”
This is a typical NVIDIA issue, known for more than 10 years. Depending on the XID (take a look in the NVIDIA docs), the cause is either the NVIDIA driver or the hardware.
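
To find out which XID you are dealing with, you can pull the numbers out of the kernel log. A small sketch, assuming the usual `NVRM: Xid (PCI:...): <number>, ...` line format of the NVIDIA driver; the function name `list_xids` is made up.

```bash
#!/usr/bin/env bash
# Read kernel log lines on stdin and print each distinct Xid number once.
list_xids() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\).*/\1/p' | sort -nu
}
```

Typical use would be `journalctl -k -b -1 --no-pager | list_xids` after a reboot, then looking up the printed numbers in the NVIDIA Xid documentation. No output simply means no Xid lines were logged in that boot.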

The most common causes are:
Defective GPU.
GPU not properly seated in the PCI(e) port.
Dirty PCI(e) contacts of the GPU.
Defective PCI(e) port on the mobo.
Dirty PCI(e) port on the mobo.

Unstable PCI(e) bus, caused by a weak, inferior or defective power supply and/or a connected defective UPS. All it takes is for the PCI(e) port to receive 0.4-0.5 volts too little and you will get this error.
In this case, it is always advisable to measure the voltage at the PCI(e) port or to test the GPU in a second PCI(e) port, if available.


A little update,

The issue somewhat persists, to the same extent as described in my previous post. Far fewer crashes and freezes, to the point where it’s a slight inconvenience instead of rendering the PC close to unusable. At least for now, since I expect this to be a temporary fix.

Interestingly, I no longer experience any issues in CS2, but in Vintage Story, which one would think is far less performance demanding, it does still happen every now and then.

I have OC and UV off. I have tried increasing vcore slightly as well, but wasn’t able to get it working. Perhaps a little more tinkering would be required.

As for the messages, it’s essentially only what I shared in the logs.
For example, this is the only thing out of the ordinary I’ve found in dmesg after a system freeze requiring a reboot.

Log
[    0.494728] ------------[ cut here ]------------
[    0.494730] TLB flush not stride 200000 aligned. Start 7fffc0000000, end 7fffffe01000
[    0.494734] WARNING: CPU: 16 PID: 221 at arch/x86/mm/tlb.c:1370 flush_tlb_mm_range+0x5ca/0x650
[    0.494739] Modules linked in:
[    0.494740] CPU: 16 UID: 0 PID: 221 Comm: modprobe Not tainted 6.13.4-zen1-1-zen #1 17580750db1aef2dff7f31cba47c3ee24f2c04f2
[    0.494743] Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A WIFI (MS-7D25), BIOS A.K0 10/21/2024
[    0.494744] RIP: 0010:flush_tlb_mm_range+0x5ca/0x650
[    0.494746] Code: 89 85 f8 03 00 00 b8 01 00 00 00 e9 dd fc ff ff 48 8b 0c 24 4c 89 e2 48 c7 c7 60 ea a6 97 c6 05 3e 2a 31 02 01 e8 36 e8 01 00 <0f> 0b e9 e7 fa ff ff fa 0f 1f 44 00 00 48 89 df e8 51 f4 ff ff fb
[    0.494747] RSP: 0000:ffffac7800f57a30 EFLAGS: 00010282
[    0.494748] RAX: 0000000000000000 RBX: ffff90465fa35d80 RCX: 00000000ffffdfff
[    0.494749] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 00000000ffffdfff
[    0.494750] RBP: ffff903f0bc31080 R08: ffff90467ff54328 R09: 00000000ffffdfff
[    0.494751] R10: 0000000000000003 R11: 0000000000000002 R12: 00007fffc0000000
[    0.494752] R13: 0000000000000010 R14: 0000000000000015 R15: ffff90465fa00000
[    0.494752] FS:  0000000000000000(0000) GS:ffff90465fa00000(0000) knlGS:0000000000000000
[    0.494753] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.494754] CR2: 0000000000000000 CR3: 0000000101854003 CR4: 0000000000f70ef0
[    0.494755] PKRU: 55555554
[    0.494756] Call Trace:
[    0.494758]  <TASK>
[    0.494758]  ? flush_tlb_mm_range+0x5ca/0x650
[    0.494760]  ? __warn.cold+0x93/0xed
[    0.494762]  ? flush_tlb_mm_range+0x5ca/0x650
[    0.494763]  ? report_bug+0xe7/0x210
[    0.494766]  ? handle_bug+0x58/0x90
[    0.494768]  ? exc_invalid_op+0x19/0xc0
[    0.494769]  ? asm_exc_invalid_op+0x1a/0x20
[    0.494771]  ? flush_tlb_mm_range+0x5ca/0x650
[    0.494773]  tlb_flush_mmu+0x125/0x1a0
[    0.494775]  tlb_finish_mmu+0x41/0x80
[    0.494776]  relocate_vma_down+0x183/0x200
[    0.494779]  setup_arg_pages+0x201/0x390
[    0.494781]  load_elf_binary+0x3a5/0x17b0
[    0.494784]  ? __kernel_read+0x19a/0x300
[    0.494785]  ? load_misc_binary+0x265/0x380
[    0.494787]  bprm_execve+0x241/0x630
[    0.494788]  kernel_execve+0x17d/0x1e0
[    0.494789]  call_usermodehelper_exec_async+0xd0/0x190
[    0.494792]  ? __pfx_call_usermodehelper_exec_async+0x10/0x10
[    0.494793]  ret_from_fork+0x31/0x50
[    0.494794]  ? __pfx_call_usermodehelper_exec_async+0x10/0x10
[    0.494796]  ret_from_fork_asm+0x1a/0x30
[    0.494798]  </TASK>
[    0.494799] ---[ end trace 0000000000000000 ]---
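
To compare several boots without reading whole traces, the WARNING sites can be filtered out of the dmesg output. This is only a sketch: `warning_sites` is a made-up name, and it assumes the default dmesg layout shown above, where the bracketed number is the time in seconds since boot.

```bash
#!/usr/bin/env bash
# Read dmesg-style lines on stdin and print each distinct WARNING location once
# (source file, line, and function, as found after " at ").
warning_sites() {
    sed -n 's/^\[ *[0-9.]*] WARNING: .* at \(.*\)$/\1/p' | sort -u
}
```

For example `sudo dmesg | warning_sites`. For the trace above this prints `arch/x86/mm/tlb.c:1370 flush_tlb_mm_range+0x5ca/0x650`.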

You can measure the board voltages and some more, but

the sensor is not “active.” You should configure this.
It’s easy to do → sudo sensors-detect

please read this for info
https://www.cyberciti.biz/faq/howto-linux-get-sensors-information/

After this step, run sensors inside Konsole to see the readings.
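
Since the readings are lost when the system freezes, it can help to write them to a file at short intervals, so the last values before the freeze can be checked after the reboot. A minimal sketch; the function name `log_snapshot` and the log path in the usage note are made up.

```bash
#!/usr/bin/env bash
# Append one timestamped snapshot of a command's output to a log file,
# then flush it to disk so it survives a hard freeze.
# Usage: log_snapshot LOGFILE COMMAND [ARGS...]
log_snapshot() {
    local logfile=$1
    shift
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        "$@"
    } >> "$logfile" 2>&1
    sync
}
```

Run it in a loop in a terminal, for example `while :; do log_snapshot "$HOME/sensors.log" sensors; sleep 5; done`, and look at the end of the file after the next freeze. The `sync` on every snapshot costs a little disk activity, which is why the interval is a few seconds rather than continuous.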

Btw, install hardinfo2 to see what’s going on → sudo pacman -S hardinfo2

But I think you have a hardware issue. If you can, test another GPU, memory, power supply (+ AC cable), CPU …, and of course disconnect all external devices.

