So, I turned on my computer and went to put the baby to sleep. When I came back, the screens were off. I thought they had just gone to sleep as usual, so I tried moving the mouse. Nothing happened. I tried the keyboard, even Ctrl+Alt+F1 through F4 to switch terminals, and still nothing. So I hard reset the computer.
Once I got back in, I ran journalctl -o short-precise -k -b -1 (thank you, Stack Exchange) and found a huge number of nvidia-modeset error messages:
out 21 21:42:24.001362 johnny-g55590 kernel: NVRM: GPU at PCI:0000:01:00: GPU-699bec45-43f5-ce35-be5f-6f624a2850a7
out 21 21:42:24.001434 johnny-g55590 kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
out 21 21:42:24.001454 johnny-g55590 kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
out 21 21:42:24.001467 johnny-g55590 kernel: NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
out 21 21:42:29.123205 johnny-g55590 kernel: NVRM: Error in service of callback
out 21 21:42:37.680253 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57d:0:0:0x00>
out 21 21:42:37.680338 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:0:0:0x00>
out 21 21:42:37.680357 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:1:0:0x00>
out 21 21:42:37.680370 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:2:0:0x00>
out 21 21:42:37.680381 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:3:0:0x00>
out 21 21:42:37.680393 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:4:0:0x00>
out 21 21:42:37.680404 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:5:0:0x00>
out 21 21:42:37.680419 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:6:0:0x00>
out 21 21:42:37.680451 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:7:0:0x00>
out 21 22:13:33.941283 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c57d:0 2:0:4048:4040
The last line repeats a LOT of times.
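To avoid scrolling through all the repeats, something like the rough Python sketch below could condense the log into the Xid codes plus a count per distinct message. The function name `summarize`, the regexes, and the `<id>` placeholder are my own assumptions based on the lines pasted above, not part of any NVIDIA tool, and `SAMPLE` is just three of those lines plus one continuation line.

```python
import re
from collections import Counter

# Xid number in lines like "NVRM: Xid (PCI:0000:01:00): 79, pid=..."
XID_RE = re.compile(r"NVRM: Xid \(PCI:[0-9a-fA-F:.]+\): (\d+),")
# Keep only the text after "kernel: " (drops timestamp and hostname)
MSG_RE = re.compile(r"kernel: (.*)$")
# Channel identifiers differ per line, so mask them before counting
HEX_RE = re.compile(r"0x[0-9a-fA-F]+(:\S+)*")


def summarize(lines):
    """Return (list of Xid codes, Counter of normalized kernel messages)."""
    xids = []
    counts = Counter()
    for line in lines:
        match = MSG_RE.search(line)
        if not match:
            continue  # continuation lines without "kernel:" are skipped
        msg = match.group(1)
        xid = XID_RE.search(msg)
        if xid:
            xids.append(int(xid.group(1)))
        counts[HEX_RE.sub("<id>", msg)] += 1
    return xids, counts


# A few lines copied from the log above, used only as a demo input
SAMPLE = [
    "out 21 21:42:24.001434 johnny-g55590 kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.",
    "NVRM: nvidia-bug-report.sh as root to collect this data before",
    "out 21 21:42:37.680253 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57d:0:0:0x00>",
    "out 21 21:42:37.680338 johnny-g55590 kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c57e:0:0:0x00>",
]

xids, counts = summarize(SAMPLE)
print("Xid codes:", xids)
for message, n in counts.most_common():
    print(n, "x", message)
```

On the real log, the idea would be to pass in the lines printed by journalctl -o short-precise -k -b -1 instead of `SAMPLE`.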
I see the message in there asking me to run nvidia-bug-report.sh, but I have no way to interact with the computer once this happens, since nothing shows up on any screen.
I’m wondering whether I should undo the changes I made to disable the Intel graphics, to help investigate this. From what I found online, “GPU has fallen off the bus” is a pretty vague error that can have numerous causes, such as the power supply or the physical connection between the graphics card and the motherboard…
I don’t think either of those applies here, since this is a laptop (Dell G5) and it is properly maintained (I open and clean it about once a month; ok, maybe it’s over-zealously maintained).
I don’t remember ever having this kind of problem before I disabled the Intel graphics and followed the steps from the Arch Wiki to use only my NVIDIA card…
(thinking about other topics I’ve been opening… am I cursed? =p )
System:
Kernel: 6.5.8-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
clocksource: tsc available: acpi_pm
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
root=UUID=468e3250-834f-4678-85b1-f50f268e557d rw rootflags=subvol=@
quiet console=tty0 console=ttyS0,115200n8 cryptomgr.notests
initcall_debug intel_iommu=igfx_off kvm-intel.nested=1 no_timer_check
noreplace-smp page_alloc.shuffle=1 rcupdate.rcu_expedited=1
rootfstype=ext4,btrfs,xfs,f2fs tsc=reliable rd.udev.log_priority=3
vt.global_cursor_default=0
resume=UUID=92d5bc58-440e-4eab-9f01-4fa35d34e02b loglevel=3 rw ibt=off
Desktop: KDE Plasma v: 5.27.8 tk: Qt v: 5.15.11 wm: kwin_x11 vt: 2
dm: SDDM Distro: Garuda Linux base: Arch Linux
Machine:
Type: Laptop System: Dell product: G5 5590 v: N/A
serial: <superuser required> Chassis: type: 10 serial: <superuser required>
Mobo: Dell model: 0F3T2G v: A00 serial: <superuser required> UEFI: Dell
v: 1.22.0 date: 11/10/2022
Battery:
ID-1: BAT0 charge: 45.4 Wh (100.0%) condition: 45.4/60.0 Wh (75.6%)
volts: 16.9 min: 15.2 model: SMP DELL JJPFK87 type: Li-poly serial: <filter>
status: full
CPU:
Info: model: Intel Core i7-9750H bits: 64 type: MT MCP arch: Coffee Lake
gen: core 9 level: v3 note: check built: 2018 process: Intel 14nm family: 6
model-id: 0x9E (158) stepping: 0xA (10) microcode: 0xF4
Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache:
L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 1.5 MiB desc: 6x256 KiB
L3: 12 MiB desc: 1x12 MiB
Speed (MHz): avg: 2217 high: 4203 min/max: 800/4500 scaling:
driver: intel_pstate governor: powersave cores: 1: 4203 2: 4203 3: 800
4: 800 5: 800 6: 800 7: 4201 8: 800 9: 800 10: 4203 11: 800 12: 4203
bogomips: 62399
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Vulnerabilities: <filter>
Graphics:
Device-1: Intel CoffeeLake-H GT2 [UHD Graphics 630] vendor: Dell
driver: i915 v: kernel arch: Gen-9.5 process: Intel 14nm built: 2016-20
ports: active: none off: eDP-1 empty: DP-1, DP-2, HDMI-A-1, HDMI-A-2
bus-ID: 00:02.0 chip-ID: 8086:3e9b class-ID: 0300
Device-2: NVIDIA TU106M [GeForce RTX 2060 Mobile] vendor: Dell
driver: nvidia v: 535.113.01 alternate: nouveau,nvidia_drm non-free: 535.xx+
status: current (as of 2023-09) arch: Turing code: TUxxx
process: TSMC 12nm FF built: 2018-22 pcie: gen: 1 speed: 2.5 GT/s lanes: 8
link-max: gen: 3 speed: 8 GT/s lanes: 16 bus-ID: 01:00.0
chip-ID: 10de:1f11 class-ID: 0300
Device-3: Microdia Integrated_Webcam_HD driver: uvcvideo type: USB
rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-5:3 chip-ID: 0c45:671f
class-ID: 0e02
Display: x11 server: X.Org v: 21.1.8 with: Xwayland v: 23.2.1
compositor: kwin_x11 driver: X: loaded: modesetting,nvidia unloaded: nouveau
alternate: fbdev,intel,nv,vesa dri: iris gpu: i915 display-ID: :0
screens: 1
Screen-1: 0 s-res: 2560x2160 s-dpi: 114 s-size: 571x482mm (22.48x18.98")
s-diag: 747mm (29.42")
Monitor-1: DP-0 pos: primary,top res: 2560x1080 hz: 60 dpi: 81
size: 798x334mm (31.42x13.15") diag: 865mm (34.06") modes: N/A
Monitor-2: HDMI-0 pos: bottom res: 2560x1080 hz: 60 dpi: 96
size: 677x290mm (26.65x11.42") diag: 736mm (29") modes: N/A
Monitor-3: eDP-1-1 size-res: N/A modes: N/A
API: EGL v: 1.5 hw: drv: nvidia platforms: gbm: drv: nvidia
API: OpenGL v: 4.6.0 vendor: nvidia v: 535.113.01 glx-v: 1.4
direct-render: yes renderer: NVIDIA GeForce RTX 2060/PCIe/SSE2
memory: 5.86 GiB
API: Vulkan v: 1.3.264 layers: 14 device: 0 type: integrated-gpu
name: Intel UHD Graphics 630 (CFL GT2) driver: mesa intel v: 23.2.1-arch1.2
device-ID: 8086:3e9b surfaces: xcb,xlib device: 1 type: discrete-gpu
name: NVIDIA GeForce RTX 2060 driver: nvidia v: 535.113.01
device-ID: 10de:1f11 surfaces: xcb,xlib device: 2 type: cpu name: llvmpipe
(LLVM 16.0.6 256 bits) driver: mesa llvmpipe v: 23.2.1-arch1.2 (LLVM
16.0.6) device-ID: 10005:0000 surfaces: xcb,xlib
Audio:
Device-1: Intel Cannon Lake PCH cAVS vendor: Dell driver: snd_hda_intel
v: kernel alternate: snd_soc_skl,snd_sof_pci_intel_cnl bus-ID: 00:1f.3
chip-ID: 8086:a348 class-ID: 0403
Device-2: NVIDIA TU106 High Definition Audio vendor: Dell
driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 8
link-max: lanes: 16 bus-ID: 01:00.1 chip-ID: 10de:10f9 class-ID: 0403
Device-3: Generalplus USB Audio Device
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 1.1 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 1-4.1:4 chip-ID: 1b3f:2008 class-ID: 0300
Device-4: Realtek USB Audio driver: snd-usb-audio type: USB rev: 2.0
speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-4.5:8 chip-ID: 0bda:4014
class-ID: 0102 serial: <filter>
API: ALSA v: k6.5.8-zen1-1-zen status: kernel-api with: aoss
type: oss-emulator tools: N/A
Server-1: PipeWire v: 0.3.83 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Realtek vendor: Dell driver: r8169 v: kernel pcie: gen: 1
speed: 2.5 GT/s lanes: 1 port: 3000 bus-ID: 3c:00.0 chip-ID: 10ec:2502
class-ID: 0200
IF: enp60s0 state: down mac: <filter>
Device-2: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
vendor: Dell driver: ath10k_pci v: kernel pcie: gen: 1 speed: 2.5 GT/s
lanes: 1 bus-ID: 3d:00.0 chip-ID: 168c:003e class-ID: 0280 temp: 48.0 C
IF: wlp61s0 state: down mac: <filter>
Device-3: Realtek RTL8153 Gigabit Ethernet Adapter driver: r8152 type: USB
rev: 3.0 speed: 5 Gb/s lanes: 1 mode: 3.2 gen-1x1 bus-ID: 6-1.2:3
chip-ID: 0bda:8153 class-ID: 0000 serial: <filter>
IF: enp58s0u1u2 state: up speed: 1000 Mbps duplex: full mac: <filter>
Bluetooth:
Device-1: Qualcomm Atheros driver: btusb v: 0.8 type: USB rev: 2.0
speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-14:7 chip-ID: 0cf3:e007
class-ID: e001
Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 4.2
lmp-v: 8 status: discoverable: no pairing: no class-ID: 7c010c
Drives:
Local Storage: total: 1.14 TiB used: 489.5 GiB (41.8%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Western Digital
model: PC SN520 NVMe WDC 256GB size: 238.47 GiB block-size: physical: 512 B
logical: 512 B speed: 15.8 Gb/s lanes: 2 tech: SSD serial: <filter>
fw-rev: 20240012 temp: 55.9 C scheme: GPT
ID-2: /dev/sda maj-min: 8:0 vendor: Western Digital
model: WD10SPZX-75Z10T3 size: 931.51 GiB block-size: physical: 4096 B
logical: 512 B speed: 6.0 Gb/s tech: HDD rpm: 5400 serial: <filter>
fw-rev: 4514 scheme: GPT
Partition:
ID-1: / raw-size: 221.19 GiB size: 221.19 GiB (100.00%)
used: 105.23 GiB (47.6%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 632 KiB (0.2%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
ID-3: /home raw-size: 221.19 GiB size: 221.19 GiB (100.00%)
used: 105.23 GiB (47.6%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-4: /var/log raw-size: 221.19 GiB size: 221.19 GiB (100.00%)
used: 105.23 GiB (47.6%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-5: /var/tmp raw-size: 221.19 GiB size: 221.19 GiB (100.00%)
used: 105.23 GiB (47.6%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
ID-1: swap-1 type: zram size: 15.31 GiB used: 11.8 MiB (0.1%)
priority: 100 comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 12
dev: /dev/zram0
ID-2: swap-2 type: partition size: 16.98 GiB used: 0 KiB (0.0%)
priority: -2 dev: /dev/nvme0n1p3 maj-min: 259:3
Sensors:
System Temperatures: cpu: 57.0 C pch: 71.0 C mobo: N/A gpu: nvidia
temp: 52 C
Fan Speeds (rpm): N/A
Info:
Processes: 330 Uptime: 46m wakeups: 5 Memory: total: 16 GiB note: est.
available: 15.31 GiB used: 5.95 GiB (38.9%) Init: systemd v: 254
default: graphical tool: systemctl Compilers: gcc: 13.2.1 clang: 16.0.6
Packages: 1946 pm: pacman pkgs: 1937 libs: 488
tools: gnome-software,octopi,pamac,paru,yay pm: flatpak pkgs: 9 Shell: Zsh
v: 5.9 running-in: kitty inxi: 3.3.30
Garuda (2.6.17-1):
System install date: 2023-04-01
Last full system update: 2023-10-21
Is partially upgraded: No
Relevant software: snapper NetworkManager dracut nvidia-dkms
Windows dual boot: No/Undetected
Failed units: