Unable to load nvidia driver

Hello!

I've just installed Garuda on my laptop for the first time (looks amazing!) and while trying to see why my external HDMI monitor is not connecting, I've found that my nvidia drivers aren't loading.

I've had a look around the forums but other problems don't look the same as mine.

When I run any nvidia-related command, such as nvidia-settings, it often works; however, I get the following message:

ERROR: NVIDIA driver is not loaded

Attempting to load the module with modprobe hangs at the insmod step:

╰─λ sudo modprobe nvidia -vv
modprobe: INFO: custom logging function 0x556857bda7f0 registered
insmod /lib/modules/5.18.1-zen1-1-zen/updates/dkms/nvidia.ko.zst

I've left that running for about half an hour and it just seems to be stuck there.
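
While modprobe is stuck, the kernel log read from a second terminal may show how far the driver got. A minimal sketch follows; `nvidia_log_lines` is a hypothetical helper name, not an existing tool, and it only filters whatever log text is piped into it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only log lines that mention the NVIDIA driver.
# "|| true" keeps the exit status at 0 when nothing matches.
nvidia_log_lines() {
    grep -iE 'nvidia|nvrm' || true
}

# From a second terminal while modprobe hangs (both commands are standard):
#   sudo dmesg | nvidia_log_lines
#   journalctl -k -b --no-pager | nvidia_log_lines
```

If nothing new appears after the insmod line, that in itself is worth mentioning when asking for help.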

Below is the output of garuda-inxi:

System:
Kernel: 5.18.1-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
root=UUID=b89e4136-eb03-45f5-a840-716a9c3c004d rw rootflags=subvol=@
quiet quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0
loglevel=3
Desktop: KDE Plasma v: 5.24.5 tk: Qt v: 5.15.4 info: latte-dock
wm: kwin_x11 vt: 1 dm: SDDM Distro: Garuda Linux base: Arch Linux
Machine:
Type: Laptop System: Dell product: Dell G15 Special Edition 5521 v: N/A
serial: <superuser required> Chassis: type: 10 serial: <superuser required>
Mobo: Dell model: 0371KJ v: A01 serial: <superuser required> UEFI: Dell
v: 1.4.1 date: 02/25/2022
Battery:
ID-1: BAT0 charge: 78.6 Wh (99.0%) condition: 79.4/84.3 Wh (94.2%)
volts: 13.2 min: 11.4 model: BYD DELL M59JH22 type: Li-poly
serial: <filter> status: charging
CPU:
Info: model: 12th Gen Intel Core i7-12700H bits: 64 type: MST AMCP
arch: Alder Lake family: 6 model-id: 0x9A (154) stepping: 3
microcode: 0x41C
Topology: cpus: 1x cores: 14 mt: 6 tpc: 2 st: 8 threads: 20 smt: enabled
cache: L1: 1.2 MiB desc: d-8x32 KiB, 6x48 KiB; i-6x32 KiB, 8x64 KiB
L2: 11.5 MiB desc: 6x1.2 MiB, 2x2 MiB L3: 24 MiB desc: 1x24 MiB
Speed (MHz): avg: 693 high: 1120 min/max: 400/4679:4700:3500 scaling:
driver: intel_pstate governor: powersave cores: 1: 456 2: 469 3: 609 4: 619
5: 863 6: 659 7: 770 8: 816 9: 669 10: 836 11: 439 12: 458 13: 760
14: 1012 15: 1120 16: 862 17: 535 18: 453 19: 853 20: 606
bogomips: 107520
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Vulnerabilities:
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: spec_store_bypass
mitigation: Speculative Store Bypass disabled via prctl
Type: spectre_v1
mitigation: usercopy/swapgs barriers and __user pointer sanitization
Type: spectre_v2
mitigation: Enhanced IBRS, IBPB: conditional, RSB filling
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: Intel Alder Lake-P Integrated Graphics vendor: Dell driver: i915
v: kernel ports: active: eDP-1 empty: DP-1,DP-2 bus-ID: 0000:00:02.0
chip-ID: 8086:46a6 class-ID: 0300
Device-2: NVIDIA GA106M [GeForce RTX 3060 Mobile / Max-Q] vendor: Dell
driver: N/A alternate: nouveau, nvidia_drm, nvidia non-free: 515.xx+
status: current (as of 2022-05) arch: Ampere bus-ID: 0000:01:00.0
chip-ID: 10de:2560 class-ID: 0300
Device-3: Microdia Integrated_Webcam_HD type: USB driver: uvcvideo
bus-ID: 3-5:5 chip-ID: 0c45:6725 class-ID: 0e02
Display: x11 server: X.Org v: 21.1.3 with: Xwayland v: 22.1.2
compositor: kwin_x11 driver: X: loaded: modesetting
alternate: fbdev,intel,vesa gpu: i915 display-ID: :0 screens: 1
Screen-1: 0 s-res: 2560x1440 s-dpi: 96 s-size: 677x381mm (26.65x15.00")
s-diag: 777mm (30.58")
Monitor-1: eDP-1 model: LG Display 0x0690 built: 2020 res: 2560x1440
hz: 60 dpi: 189 gamma: 1.2 size: 344x194mm (13.54x7.64")
diag: 395mm (15.5") ratio: 16:9 modes: 2560x1440
OpenGL: renderer: Mesa Intel Graphics (ADL GT2) v: 4.6 Mesa 22.1.1
direct render: Yes
Audio:
Device-1: Intel Alder Lake PCH-P High Definition Audio vendor: Dell
driver: sof-audio-pci-intel-tgl
alternate: snd_hda_intel,snd_sof_pci_intel_tgl bus-ID: 0000:00:1f.3
chip-ID: 8086:51c8 class-ID: 0401
Device-2: NVIDIA driver: snd_hda_intel v: kernel bus-ID: 0000:01:00.1
chip-ID: 10de:228e class-ID: 0403
Device-3: ASUSTek ROG Theta Ultimate 7.1 gaming headset type: USB
driver: hid-generic,snd-usb-audio,usbhid bus-ID: 3-6:7 chip-ID: 0b05:18a9
class-ID: 0300 serial: <filter>
Sound Server-1: ALSA v: k5.18.1-zen1-1-zen running: yes
Sound Server-2: PulseAudio v: 16.0 running: no
Sound Server-3: PipeWire v: 0.3.51 running: yes
Network:
Device-1: Intel Alder Lake-P PCH CNVi WiFi vendor: Rivet Networks
driver: iwlwifi v: kernel bus-ID: 0000:00:14.3 chip-ID: 8086:51f0
class-ID: 0280
IF: wlp0s20f3 state: down mac: <filter>
Device-2: Realtek vendor: Dell driver: r8169 v: kernel port: 3000
bus-ID: 0000:3a:00.0 chip-ID: 10ec:2600 class-ID: 0200
IF: enp58s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Bluetooth:
Device-1: Intel AX201 Bluetooth type: USB driver: btusb v: 0.8
bus-ID: 3-10:9 chip-ID: 8087:0026 class-ID: e001
Report: bt-adapter ID: hci0 rfk-id: 1 state: up address: <filter>
RAID:
Hardware-1: Intel Volume Management Device NVMe RAID Controller driver: vmd
v: 0.6 port: N/A bus-ID: 0000:00:0e.0 chip-ID: 8086:467f rev:
class-ID: 0104
Drives:
Local Storage: total: 953.87 GiB used: 27.28 GiB (2.9%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: SK Hynix model: BC711 NVMe 1TB
size: 953.87 GiB block-size: physical: 512 B logical: 512 B
speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter> rev: 41002131
temp: 39.9 C scheme: GPT
Partition:
ID-1: / raw-size: 600 GiB size: 600 GiB (100.00%) used: 27.18 GiB (4.5%)
fs: btrfs dev: /dev/nvme0n1p7 maj-min: 259:7
ID-2: /boot/efi raw-size: 300 MiB size: 296 MiB (98.67%)
used: 98.2 MiB (33.2%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
ID-3: /home raw-size: 600 GiB size: 600 GiB (100.00%)
used: 27.18 GiB (4.5%) fs: btrfs dev: /dev/nvme0n1p7 maj-min: 259:7
ID-4: /var/log raw-size: 600 GiB size: 600 GiB (100.00%)
used: 27.18 GiB (4.5%) fs: btrfs dev: /dev/nvme0n1p7 maj-min: 259:7
ID-5: /var/tmp raw-size: 600 GiB size: 600 GiB (100.00%)
used: 27.18 GiB (4.5%) fs: btrfs dev: /dev/nvme0n1p7 maj-min: 259:7
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
ID-1: swap-1 type: zram size: 31.04 GiB used: 0 KiB (0.0%) priority: 100
dev: /dev/zram0
Sensors:
System Temperatures: cpu: 50.0 C mobo: N/A
Fan Speeds (RPM): N/A
Info:
Processes: 415 Uptime: 1h 25m wakeups: 3 Memory: 31.04 GiB
used: 7.1 GiB (22.9%) Init: systemd v: 251 tool: systemctl Compilers:
gcc: 12.1.0 Packages: pacman: 1218 lib: 297 Shell: fish v: 3.4.1
default: Bash v: 5.1.16 running-in: konsole inxi: 3.3.16
Garuda (2.6.3-2):
System install date:     2022-05-31
Last full system update: 2022-06-06
Is partially upgraded:   No
Relevant software:       NetworkManager
Windows dual boot:       Probably (Run as root to verify)
Snapshots:               Snapper
Failed units:

Thanks for the help!

Welcome to the Forums! :slight_smile:

When you installed from the Live ISO, did you boot the Live ISO using OPEN SOURCE drivers or the PROPRIETARY drivers?

Second, can you post the output of

cat /etc/mkinitcpio.conf

The nvidia modules should be in there so they get loaded on startup. It's also possible you need to blacklist some other modules, but let's start with the two questions above.

Kernel 5.18, Intel Gen 12 CPU, Nvidia GPU:

Use ibt=off kernel parameter.
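
For anyone unsure how to set that parameter permanently, here is a rough sketch. It assumes the system boots through GRUB and that `/etc/default/grub` has a quoted `GRUB_CMDLINE_LINUX_DEFAULT` line; `add_kernel_param` is a hypothetical helper written for this example, not something shipped by GRUB or Garuda.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in the given file, unless it is already there.
# Assumes the value is wrapped in single or double quotes and that the
# parameter contains no "/" (which would clash with the sed delimiter).
add_kernel_param() {
    local param="$1" file="$2"
    # Already present as a whole word on that line? Then do nothing.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\"' ]${param}([\"' ]|\$)" "$file"; then
        return 0
    fi
    # Insert the parameter right after the opening quote (GNU sed -i).
    sed -i -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=[\"'])/\1${param} /" "$file"
}

# Possible use on a real system, as root, after making a backup:
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param ibt=off /etc/default/grub
#   grub-mkconfig -o /boot/grub/grub.cfg
# After rebooting, "cat /proc/cmdline" should list ibt=off.
```

For a one-off test, the same parameter can instead be typed into the kernel line from the GRUB menu editor (press `e` on the boot entry), which leaves the config file untouched.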

Thanks @FGD!

When you installed from the Live ISO, did you boot the Live ISO using OPEN SOURCE drivers or the PROPRIETARY drivers?

If I recall correctly, I did it with the proprietary drivers. At least I don't see any nouveau packages installed.

Second, can you post the output of

cat /etc/mkinitcpio.conf
[🔴] × cat /etc/mkinitcpio.conf --paging=never
File: /etc/mkinitcpio.conf
# vim:set ft=sh
# MODULES
# The following modules are loaded before any boot hooks are
# run.  Advanced users may wish to specify all system modules
# in this array.  For instance:
#     MODULES=(crc32c-intel intel_agp i915 amdgpu radeon nouveau)
MODULES=(crc32c-intel intel_agp i915 amdgpu radeon nouveau)

# BINARIES
# This setting includes any additional binaries a given user may
# wish into the CPIO image.  This is run last, so it may be used to
# override the actual binaries included by a given hook
# BINARIES are dependency parsed, so you may safely ignore libraries
BINARIES=()

# FILES
# This setting is similar to BINARIES above, however, files are added
# as-is and are not parsed in any way.  This is useful for config files.
FILES=""

# HOOKS
# This is the most important setting in this file.  The HOOKS control the
# modules and scripts added to the image, and what happens at boot time.
# Order is important, and it is recommended that you do not change the
# order in which HOOKS are added.  Run 'mkinitcpio -H <hook name>' for
# help on a given hook.
# 'base' is _required_ unless you know precisely what you are doing.
# 'udev' is _required_ in order to automatically load modules
# 'filesystems' is _required_ unless you specify your fs modules in MODULES
# Examples:
##   This setup specifies all modules in the MODULES setting above.
##   No raid, lvm2, or encrypted root is needed.
#    HOOKS=(base)
#
##   This setup will autodetect all modules for your system and should
##   work as a sane default
#    HOOKS=(base udev autodetect block filesystems)
#
##   This setup will generate a 'full' image which supports most systems.
##   No autodetection is done.
#    HOOKS=(base udev block filesystems)
#
##   This setup assembles a pata mdadm array with an encrypted root FS.
##   Note: See 'mkinitcpio -H mdadm' for more information on raid devices.
#    HOOKS=(base udev block mdadm encrypt filesystems)
#
##   This setup loads an lvm2 volume group on a usb device.
#    HOOKS=(base udev block lvm2 filesystems)
#
##   NOTE: If you have /usr on a separate partition, you MUST include the
#    usr, fsck and shutdown hooks.
HOOKS="base udev autodetect modconf block keyboard keymap consolefont plymouth filesystems grub-btrfs-overlayfs"

# COMPRESSION
# Use this to compress the initramfs image. By default, zstd compression
# is used. Use 'cat' to create an uncompressed image.
#COMPRESSION="zstd"
#COMPRESSION="gzip"
#COMPRESSION="bzip2"
#COMPRESSION="lzma"
#COMPRESSION="xz"
#COMPRESSION="lzop"
#COMPRESSION="lz4"

# COMPRESSION_OPTIONS
# Additional options for the compressor
#COMPRESSION_OPTIONS=()

@mrvictory: Ooof... would I be better off using LTS kernel and LTS drivers?

Yes, this will solve it as well.

Well, I don't like the fact that your initramfs is not built with the nvidia drivers, as shown in your perfectly formatted post (thanks for that!). But considering the 5.18 issue with nvidia, which I haven't followed at all, maybe it's part of the problem.
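
A quick way to check this without reading the whole file is sketched below. `modules_has` is a hypothetical helper that only looks at the active `MODULES=` line. Whether nvidia actually needs to be listed there depends on the setup, so treat a negative result as information rather than as a fault.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the active MODULES line of a
# mkinitcpio.conf-style file lists the given module as a whole word.
# Note: grep -w treats "-" as a word boundary, so "nvidia" would also
# match an entry spelled "nvidia-drm" (but not "nvidia_drm").
modules_has() {
    local module="$1" conf="$2"
    grep -E '^MODULES=' "$conf" | tr '()"' '   ' | grep -qw -- "$module"
}

# Possible use on a real system:
#   modules_has nvidia /etc/mkinitcpio.conf || echo "nvidia is not in MODULES"
# To see what ended up in the image (the image path is an assumption
# based on the linux-zen kernel shown earlier in the thread):
#   lsinitcpio /boot/initramfs-linux-zen.img | grep -i nvidia
```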

I will let someone who knows more about this explain what's best to do.

I'm going to try using LTS like mrvictory suggests and then I'll report back.

Yes, please. That'll help others with the same issue.

Make backups before switching Kernel.

What issue with 5.18 and nVidia? I'm on 5.18 with 2 nVidia GPUs with no issue.

So only with Intel chips then? I know 5.18 broke some mdadm setups, but I hadn't noticed anything with the drivers. I suspect something silly, though. It's been suggested already, but try mkinitcpio -P to rebuild the initramfs so it picks up the nvidia dkms module. I don't recall anyone asking if he checked for the 5.18 headers. If they aren't installed, the module won't build.
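
The header check can be sketched roughly as follows. `headers_pkg_for` is a hypothetical helper that guesses the Arch headers package name from the kernel release string. It only covers the stock Arch kernel flavours, so custom kernels would need their own case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a kernel release string (as printed by
# "uname -r") to the name of the matching Arch headers package.
headers_pkg_for() {
    local release="$1"
    case "$release" in
        *-zen)      echo "linux-zen-headers" ;;
        *-lts)      echo "linux-lts-headers" ;;
        *-hardened) echo "linux-hardened-headers" ;;
        *)          echo "linux-headers" ;;
    esac
}

# Possible use on a real system:
#   pacman -Q "$(headers_pkg_for "$(uname -r)")"   # are the headers installed?
#   dkms status                                    # was the nvidia module built?
#   sudo dkms autoinstall                          # rebuild missing dkms modules
#   sudo mkinitcpio -P                             # regenerate all initramfs presets
```

In this thread the module file already exists under `updates/dkms`, so the build itself appears to have succeeded; the check is still cheap to run.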

Highly recommend the FroggingFamily NVidia driver pack. Or get an AMD card, which is the perfect solution.

Kernel 5.18 gained support for Intel Indirect Branch Tracking (IBT). IBT is supported on Gen11+ Intel CPUs. The Nvidia drivers aren't compiled for IBT, so when the kernel attempts to use IBT, the Nvidia driver fails to load.

Ah gotcha, I hadn't heard about this.

I totally agree, but most people from the Garuda Team will not recommend this (as seen in another post), as it's probably not what Garuda wants to use (otherwise they would, right?) and it requires building drivers once in a while, which is not recommended for most newbies.

Still haven't sorted it out, so I'm listing what I've tried so far.

I've tried getting the LTS kernel working, but Garuda doesn't like that I try to install the nvidia LTS drivers. So I abandoned that idea.

I moved on to trying the ibt=off option at boot, but then got an error about Samba, which couldn't load.

I might try this later, but according to FGD it might not be an easy solution to maintain in the long term. The FroggingFamily bit, not the AMD bit. Unfortunately I'm stuck with what this laptop's got.

I'm going to see if I can sort out that samba issue (and also get the proper error message).

EDIT: The message was a simple daemon not starting error:

FAILED Failed to start Samba NMB Daemon

Disabled nmb.service, the system still won't boot (I don't know why I expected otherwise... Hope's a funny thing)

I know this is a long shot, but have you turned off fast boot and secure boot?
Have you tested with the live USB? Does it load the nvidia drivers when you use it?

It sounds like your install is corrupted somehow.

Both fast boot and secure boot are disabled.

I've not checked what the live usb is doing but I think it is using an earlier version of the kernel so it might not give me a lot of insight.

I've reinstalled my system last night too, wanted a 100% fresh start after some changes I made.

I checked the journalctl entry for a boot with ibt=off. I'm not very used to reading boot output, but I think these are the lines that are relevant to the system not booting completely:

Jun 07 06:45:50 elune systemd[1]: Starting Samba SMB Daemon...
Jun 07 06:45:51 elune sddm[822]: Display server starting...
Jun 07 06:45:51 elune sddm[822]: Adding cookie to "/var/run/sddm/{f39b69c8-61ee-483d-b96a-e897e821922b}"
Jun 07 06:45:51 elune sddm[822]: Running: /usr/bin/X -nolisten tcp -background none -seat seat0 vt1 -auth /var/run/sddm/{f39b69c8-61ee-483d-b96a-e897e821922b} -noreset -displayfd 17
Jun 07 06:45:51 elune smbd[834]: [2022/06/07 06:45:51.101259, 0] ../../source3/smbd/server.c:1741(main)
Jun 07 06:45:51 elune smbd[834]: smbd version 4.16.1 started.
Jun 07 06:45:51 elune smbd[834]: Copyright Andrew Tridgell and the Samba Team 1992-2022
Jun 07 06:45:51 elune systemd[1]: Started Samba SMB Daemon.
Jun 07 06:45:51 elune audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=smb comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 07 06:45:51 elune systemd[1]: Reached target Multi-User System.
Jun 07 06:45:51 elune systemd[1]: Reached target Graphical Interface.
Jun 07 06:45:51 elune kernel: audit: type=1130 audit(1654580751.106:73): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=smb comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 07 06:45:51 elune systemd[1]: Startup finished in 7.041s (firmware) + 15.908s (loader) + 2.742s (kernel) + 8.812s (userspace) = 34.503s.
Jun 07 06:45:51 elune sddm[822]: Failed to read display number from pipe
Jun 07 06:45:51 elune sddm[822]: Display server stopping...
Jun 07 06:45:51 elune sddm[822]: Attempt 2 starting the Display server on vt 1 failed
Jun 07 06:45:53 elune sddm[822]: Display server starting...
Jun 07 06:45:53 elune sddm[822]: Adding cookie to "/var/run/sddm/{f39b69c8-61ee-483d-b96a-e897e821922b}"
Jun 07 06:45:53 elune sddm[822]: Running: /usr/bin/X -nolisten tcp -background none -seat seat0 vt1 -auth /var/run/sddm/{f39b69c8-61ee-483d-b96a-e897e821922b} -noreset -displayfd 17
Jun 07 06:45:53 elune sddm[822]: Failed to read display number from pipe
Jun 07 06:45:53 elune sddm[822]: Display server stopping...
Jun 07 06:45:53 elune sddm[822]: Attempt 3 starting the Display server on vt 1 failed
Jun 07 06:45:53 elune sddm[822]: Could not start Display server on vt 1
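
To pull lines like these out of a failed boot without scrolling through the whole journal, something along these lines might help. `display_failures` is a hypothetical filter written for this example, and the match patterns are guesses based on the messages above, so they may miss other wording.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from journal text on stdin, keep lines from the
# display manager or X server that report a failure.
display_failures() {
    grep -E 'sddm|Xorg|/usr/bin/X' | grep -iE 'fail|could not|error' || true
}

# Possible use from a working boot or a TTY:
#   journalctl --list-boots                          # find the failed boot
#   journalctl -b -1 --no-pager | display_failures   # -1 = previous boot
# The X server's own log, if present, usually marks errors with "(EE)":
#   grep '(EE)' /var/log/Xorg.0.log
```

The sddm messages only say that X did not start; the reason is more likely to be in the Xorg log or in the driver lines earlier in the journal.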

These lines seemed particularly suspicious:

Jun 07 06:45:42 elune systemd-modules-load[407]: Module 'nvidia_uvm' is deny-listed
Jun 07 06:45:42 elune audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-sysctl comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 07 06:45:42 elune systemd-modules-load[407]: Module 'nvidia_drm' is deny-listed
Jun 07 06:45:42 elune systemd-modules-load[407]: Module 'nvidia_uvm' is deny-listed
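
A "deny-listed" message normally means a `blacklist` entry exists somewhere in the modprobe configuration. The sketch below searches for one; `find_blacklist` is a hypothetical helper, and the two directories in the usage comment are the usual locations rather than a complete list. Some tools blacklist the nvidia modules on purpose so they can load them later themselves, so an entry is not necessarily a fault.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print file:line:text for every modprobe.d entry
# that blacklists the given module. Remaining arguments are the files
# or directories to search.
find_blacklist() {
    local module="$1"
    shift
    grep -rnE "^[[:space:]]*blacklist[[:space:]]+${module}([[:space:]]|\$)" "$@" 2>/dev/null || true
}

# Possible use on a real system:
#   find_blacklist nvidia_drm /etc/modprobe.d /usr/lib/modprobe.d
#   find_blacklist nvidia_uvm /etc/modprobe.d /usr/lib/modprobe.d
```

The file name in the output usually shows which package installed the entry.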

It seems to be trying to load nvidia stuff, but there's a bash bit that fails, not sure how relevant that is:

Jun 07 06:45:46 elune python3[620]: [2725] INFO: Loading module nvidia
Jun 07 06:45:46 elune kernel: nvidia: module license 'NVIDIA' taints kernel.
Jun 07 06:45:46 elune kernel: Disabling lock debugging due to kernel taint
Jun 07 06:45:46 elune kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 507
Jun 07 06:45:46 elune kernel:
Jun 07 06:45:46 elune kernel: nvidia 0000:01:00.0: enabling device (0006 -> 0007)
Jun 07 06:45:46 elune kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
Jun 07 06:45:46 elune kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 515.48.07 Fri May 27 03:26:43 UTC 2022
Jun 07 06:45:46 elune python3[620]: [2908] INFO: Loading module nvidia_drm
Jun 07 06:45:46 elune systemd-udevd[429]: nvidia: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidiactl c $(grep nvidia-frontend /proc/devices | cut -d \ -f 1) 255'' failed with exit code 1.
Jun 07 06:45:46 elune kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 515.48.07 Fri May 27 03:18:00 UTC 2022
Jun 07 06:45:46 elune kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver

[...]

Jun 07 06:45:47 elune systemd-udevd[478]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \ -f 4); do /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c $(grep nvidia-frontend /proc/devices | cut -d \ -f 1) ${i}; done'' failed with exit code 1.

[...]

Jun 07 06:45:48 elune kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
Jun 07 06:45:48 elune systemd[1]: Starting Load/Save Screen Backlight Brightness of backlight:nvidia_0...
Jun 07 06:45:48 elune systemd[1]: Starting autorandr execution hook...
Jun 07 06:45:48 elune systemd[1]: Finished Load/Save Screen Backlight Brightness of backlight:nvidia_0.
Jun 07 06:45:48 elune audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-backlight@backlight:nvidia_0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 07 06:45:48 elune kernel: audit: type=1130 audit(1654580748.107:66): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-backlight@backlight:nvidia_0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 07 06:45:48 elune systemd[1]: autorandr.service: Deactivated successfully.
Jun 07 06:45:48 elune systemd[1]: Finished autorandr execution hook.

EDIT: Full output here.

It looks like something is failing in the optimus-manager scripts. I cannot work out the actual problem, but I suppose the optimus-manager developers would be best placed to understand it from their own code's logs.
I suggest you ask on their support page, or try another nvidia configuration without optimus-manager.
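
When asking upstream, it probably helps to attach just the failing udev commands together with the service's own log. A rough sketch follows; `udev_failures` is a hypothetical filter, and the unit name `optimus-manager.service` is an assumption about how the package is set up here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from journal text on stdin, keep systemd-udevd
# lines that report a command exiting with a non-zero code.
udev_failures() {
    grep -E 'systemd-udevd\[[0-9]+\]: .*failed with exit code' || true
}

# Possible use for the failed boot (-1 = previous boot):
#   journalctl -b -1 --no-pager | udev_failures
#   journalctl -b -1 -u optimus-manager.service --no-pager
#   systemctl status optimus-manager.service
```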

I'll ask them for help then and report back what I find.

Hopefully I can paste a snippet of commands needed to fix this once I figure this out :stuck_out_tongue:

Not ready to give up on Garuda yet!
