Nvidia GPU Hangs

I'm getting a lot of GPU hangs while gaming, and I see this in my journal:

Dec 16 18:59:20 Kiseki kernel: NVRM: GPU at PCI:0000:0d:00: GPU-44edb224-1619-1d95-18b6-8f9f5935a28a
Dec 16 18:59:20 Kiseki kernel: NVRM: Xid (PCI:0000:0d:00): 8, pid=120229, Channel 0000007b

which, according to NVIDIA's "XID Errors :: GPU Deployment and Management Documentation" page, means something went wrong with the driver.
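For anyone who wants to check their own journal for the same thing, here is a minimal sketch of how the Xid lines could be pulled out. The `xid_events` helper name is my own invention, not something from the NVIDIA tooling, and it assumes the log lines have the same shape as the two shown above.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<PCI bus> <Xid code>" for every NVRM Xid line. Lines that do not
# match the "NVRM: Xid (PCI:...): <number>," shape are skipped.
xid_events() {
  sed -n 's/.*NVRM: Xid (\(PCI:[^)]*\)): \([0-9][0-9]*\),.*/\1 \2/p'
}

# Typical use against the kernel messages of the current boot:
#   journalctl -k -b --no-pager | xid_events
```

Fed the two journal lines above, this should print only the bus and the code from the second line, which makes it easier to see whether the same Xid keeps coming back after each driver change.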

Things I have tried:

Switching NVIDIA driver branches, since this started after a driver update.

Disabling extras like MangoHud.
Disabling MangoHud does seem to help, as I haven't been able to reproduce the issue without it, though I don't know how MangoHud could cause this (FlightlessMango agrees that it's impossible for MangoHud to cause GPU hangs, and it was working before the driver update).

Trying different kernels such as zen and lts,
which didn't help at all.

inxi
inxi -Faz
System:
  Kernel: 5.15.6-225-tkg-pds x86_64 bits: 64 compiler: gcc v: 11.1.0
    parameters: intel_pstate=passive
    BOOT_IMAGE=/@/boot/vmlinuz-linux-tkg-pds-generic_v3
    root=UUID=ef15347e-a4da-4628-afc3-2bff20cbb710 rw [email protected]
    rd.udev.log_priority=3 vt.global_cursor_default=0
    systemd.unified_cgroup_hierarchy=1
    resume=UUID=e7745511-30a0-4b3d-93c1-4bc3daa8b2b8 loglevel=3
    sysrq_always_enabled=1 nowatchdog mitigations=off
  Desktop: KDE Plasma 5.23.4 tk: Qt 5.15.2 info: latte-dock wm: kwin_x11
    vt: 1 dm: SDDM Distro: Garuda Linux base: Arch Linux
Machine:
  Type: Desktop Mobo: ASRock model: X470 Taichi serial: <superuser required>
    UEFI: American Megatrends v: P4.20 date: 07/22/2020
Battery:
  Device-1: hidpp_battery_0 model: Logitech Wireless Mouse MX Master 3
Use of uninitialized value $val2 in string eq at /usr/bin/inxi line 6943.
Use of uninitialized value $val2 in split at /usr/bin/inxi line 6948.
Use of uninitialized value $val2 in concatenation (.) or string at /usr/bin/inxi line 6950.
Use of uninitialized value $val2 in concatenation (.) or string at /usr/bin/inxi line 6953.
Use of uninitialized value $val2 in concatenation (.) or string at /usr/bin/inxi line 6954.
    serial: <filter> charge: 55% (should be ignored) rechargeable: yes status:
  Device-2: ps-controller-battery-a0:ab:51:90:7e:70 model: N/A serial: N/A
Use of uninitialized value $val2 in string eq at /usr/bin/inxi line 6943.
Use of uninitialized value $val2 in split at /usr/bin/inxi line 6948.
Use of uninitialized value $val2 in concatenation (.) or string at /usr/bin/inxi line 6950.
Use of uninitialized value $val2 in concatenation (.) or string at /usr/bin/inxi line 6953.
Use of uninitialized value $val2 in concatenation (.) or string at /usr/bin/inxi line 6954.
    charge: N/A status:
CPU:
  Info: model: AMD Ryzen 5 2600X bits: 64 type: MT MCP arch: Zen+
    family: 0x17 (23) model-id: 8 stepping: 2 microcode: 0x800820D
  Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache:
    L1: 576 KiB desc: d-6x32 KiB; i-6x64 KiB L2: 3 MiB desc: 6x512 KiB
    L3: 16 MiB desc: 2x8 MiB
  Speed (MHz): avg: 4121 high: 4162 min/max: 2200/3600 boost: enabled
    scaling: driver: acpi-cpufreq governor: performance cores: 1: 4013 2: 4083
    3: 4144 4: 4142 5: 4122 6: 4128 7: 4105 8: 4162 9: 4114 10: 4152 11: 4148
    12: 4148 bogomips: 86493
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities:
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: spec_store_bypass status: Vulnerable
  Type: spectre_v1 status: Vulnerable: __user pointer sanitization and
    usercopy barriers only; no swapgs barriers
  Type: spectre_v2 status: Vulnerable, IBPB: disabled, STIBP: disabled
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: NVIDIA GP104 [GeForce GTX 1080] vendor: Gigabyte driver: nvidia
    v: 470.94 alternate: nouveau,nvidia_drm bus-ID: 0d:00.0 chip-ID: 10de:1b80
    class-ID: 0300
  Display: x11 server: X.Org 1.21.1.2 compositor: kwin_x11 driver:
    loaded: nvidia display-ID: :0 screens: 1
  Screen-1: 0 s-res: 6400x2160 s-dpi: 101 s-size: 1608x543mm (63.3x21.4")
    s-diag: 1697mm (66.8")
  Monitor-1: HDMI-0 res: 3840x2160 hz: 60 dpi: 52
    size: 1872x1053mm (73.7x41.5") diag: 2148mm (84.6")
  Monitor-2: DP-0 res: 2560x1080 dpi: 81 size: 798x334mm (31.4x13.1")
    diag: 865mm (34.1")
  OpenGL: renderer: NVIDIA GeForce GTX 1080/PCIe/SSE2
    v: 4.6.0 NVIDIA 470.94 direct render: Yes
Audio:
  Device-1: NVIDIA GP104 High Definition Audio vendor: Gigabyte
    driver: snd_hda_intel v: kernel bus-ID: 0d:00.1 chip-ID: 10de:10f0
    class-ID: 0403
  Device-2: Sony Wireless Controller type: USB
    driver: playstation,snd-usb-audio,usbhid bus-ID: 1-1.3:8 chip-ID: 054c:0ce6
    class-ID: 0300
  Sound Server-1: ALSA v: k5.15.6-225-tkg-pds running: yes
  Sound Server-2: sndio v: N/A running: no
  Sound Server-3: JACK v: 1.9.19 running: no
  Sound Server-4: PulseAudio v: 15.0 running: no
  Sound Server-5: PipeWire v: 0.3.42 running: yes
Network:
  Device-1: Intel I211 Gigabit Network vendor: ASRock driver: igb v: kernel
    port: d000 bus-ID: 09:00.0 chip-ID: 8086:1539 class-ID: 0200
  IF: enp9s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  Device-2: Sony Wireless Controller type: USB
    driver: playstation,snd-usb-audio,usbhid bus-ID: 1-1.3:8 chip-ID: 054c:0ce6
    class-ID: 0300
  IF-ID-1: virbr0 state: down mac: <filter>
Bluetooth:
  Device-1: Intel Wireless-AC 3168 Bluetooth type: USB driver: btusb v: 0.8
    bus-ID: 1-9:3 chip-ID: 8087:0aa7 class-ID: e001
  Report: bt-adapter ID: hci0 rfk-id: 0 state: up address: N/A
Drives:
  Local Storage: total: 2.96 TiB used: 3.33 TiB (112.5%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:5 vendor: Samsung model: SSD 980 1TB
    size: 931.51 GiB block-size: physical: 512 B logical: 512 B
    speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter> rev: 1B4QFXO7
    temp: 44.9 C scheme: GPT
  ID-2: /dev/nvme1n1 maj-min: 259:0 vendor: Samsung
    model: SSD 960 EVO 250GB size: 232.89 GiB block-size: physical: 512 B
    logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter>
    rev: 3B7QCXE7 temp: 30.9 C scheme: GPT
  ID-3: /dev/sda maj-min: 8:0 vendor: Seagate model: ST2000DX002-2DV164
    size: 1.82 TiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
    type: HDD rpm: 7200 serial: <filter> rev: CC41 scheme: GPT
Partition:
  ID-1: / raw-size: 914.08 GiB size: 914.08 GiB (100.00%)
    used: 829.46 GiB (90.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:7
  ID-2: /boot/efi raw-size: 260 MiB size: 256 MiB (98.46%)
    used: 563 KiB (0.2%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:6
  ID-3: /home raw-size: 914.08 GiB size: 914.08 GiB (100.00%)
    used: 829.46 GiB (90.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:7
  ID-4: /var/log raw-size: 914.08 GiB size: 914.08 GiB (100.00%)
    used: 829.46 GiB (90.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:7
  ID-5: /var/tmp raw-size: 914.08 GiB size: 914.08 GiB (100.00%)
    used: 829.46 GiB (90.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:7
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 50 (default 100)
  ID-1: swap-1 type: partition size: 17.18 GiB used: 0 KiB (0.0%)
    priority: -2 dev: /dev/nvme0n1p3 maj-min: 259:8
  ID-2: swap-2 type: zram size: 15.61 GiB used: 4.07 GiB (26.1%)
    priority: 100 dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 54.0 C mobo: 36.0 C gpu: nvidia temp: 48 C
  Fan Speeds (RPM): fan-1: 0 fan-2: 1301 fan-3: 1295 fan-4: 1475
    fan-5: 1215 gpu: nvidia fan: 0%
  Power: 12v: N/A 5v: N/A 3.3v: 3.31 vbat: 3.28
Info:
  Processes: 471 Uptime: 6h 24m wakeups: 17 Memory: 15.61 GiB
  used: 8.83 GiB (56.6%) Init: systemd v: 249 tool: systemctl Compilers:
  gcc: 11.1.0 clang: 13.0.0 Packages: 2179 pacman: 2174 lib: 565 flatpak: 5
  Shell: fish v: 3.3.1 default: Bash v: 5.1.12 running-in: alacritty
  inxi: 3.3.10

I tried downgrading to 495.44, which was working before (last weekend), and I still get the same hangs. Any ideas?

After a few updates (I saw there was one for Xorg in the last day), my issue has worsened: instead of the GPU just stopping rendering in a game while the desktop stays fine, all of X now crashes. I tried going back to a snapshot from before the updates, and I am still getting crashes in X.
journal filtered to show errors
