Possible Nvidia Driver Issue After Update

Hello,

I'm having an issue after a system update I did last night. Unfortunately, my attempts to sort it out triggered enough new snapper snapshots that my pre-update snapshot seems to have been overwritten (I think?), so I can't just roll back. Oops...

The problem is that my Nvidia driver no longer lets me set a power limit. You'll notice a failed systemd unit at the bottom of the Garuda Inxi output. That is just an automated script I run on boot to set the power limit of my 3070 Mobile to 64W. If I run the same command manually, I get this message:

 ╭─[email protected] in ~ took 19s
 ╰─λ nvidia-smi -pl 64
Changing power management limit is not supported for GPU: 00000000:01:00.0.
Treating as warning and moving on.
All done.

If I run nvidia-smi on its own, the power usage column shows something like 11W / N/A, whereas before the update it showed 11W / 64W.
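
For anyone comparing before and after, the limits the driver reports can also be queried directly instead of being read off the summary table. This is a minimal sketch in POSIX sh (run it with sh or bash, not fish), assuming the `power.*` query fields are available in your nvidia-smi build. The helper name `limit_is_reported` is made up for illustration, and the placeholder text for an unsupported field (`[N/A]` or `[Not Supported]`) is an assumption that may differ between driver versions:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a power.limit value looks like a real
# reading, fails if it is empty or looks like a "not available" placeholder.
limit_is_reported() {
  case "$1" in
    ""|*N/A*|*"Not Supported"*) return 1 ;;
    *) return 0 ;;
  esac
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  # Full picture: current draw plus enforced, default, min and max limits.
  nvidia-smi \
    --query-gpu=name,driver_version,power.draw,power.limit,power.default_limit,power.min_limit,power.max_limit \
    --format=csv

  # Just the enforced limit of the first GPU, without the CSV header.
  limit=$(nvidia-smi --query-gpu=power.limit --format=csv,noheader | head -n 1)
  if limit_is_reported "$limit"; then
    echo "driver reports a power limit: $limit"
  else
    echo "driver does not report a power limit (got: $limit)"
  fi
fi
```

If the min and max limits also come back as unavailable, that would suggest the driver no longer exposes power limit control for this GPU at all, rather than the 64W value being rejected.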

I'm wondering if there is a way to downgrade the Nvidia driver back to 525, or if there is anything else I can do.

So far I've checked whether there were any further updates today, and I've installed a couple of other kernels (LTS and Xanmod) using the Garuda Settings tool to see if that would fix it, but neither helped.

Also, not sure if it matters at all but I don't dual-boot!

Thanks!

Garuda Inxi:

System:
  Kernel: 6.2.8-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 12.2.1
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
    root=UUID=3ee9b05a-af77-4cbf-bd94-85b0090437f6 rw [email protected]
    quiet
    cryptdevice=UUID=3ac6c728-7c3c-43bd-beba-3838be95e33f:luks-3ac6c728-7c3c-43bd-beba-3838be95e33f
    root=/dev/mapper/luks-3ac6c728-7c3c-43bd-beba-3838be95e33f quiet splash
    rd.udev.log_priority=3 vt.global_cursor_default=0
    resume=/dev/mapper/luks-558a2aec-5e05-4637-ab15-eeda4d0bcac0 loglevel=3
    ibt=off
  Desktop: GNOME v: 43.3 tk: GTK v: 3.24.37 wm: gnome-shell dm: GDM v: 43.0
    Distro: Garuda Linux base: Arch Linux
Machine:
  Type: Laptop System: ASUSTeK product: ASUS TUF Gaming A15 FA507RR_FA507RR
    v: 1.0 serial: <superuser required>
  Mobo: ASUSTeK model: FA507RR v: 1.0 serial: <superuser required>
    UEFI: American Megatrends LLC. v: FA507RR.314 date: 10/12/2022
Battery:
  ID-1: BAT1 charge: 85.2 Wh (100.0%) condition: 85.2/90.2 Wh (94.4%)
    volts: 17.3 min: 15.9 model: ASUS A32-K55 type: Li-ion serial: N/A
    status: full
CPU:
  Info: model: AMD Ryzen 7 6800H with Radeon Graphics bits: 64 type: MT MCP
    arch: Zen 3+ gen: 4 level: v3 note: check built: 2022 process: TSMC n6 (7nm)
    family: 0x19 (25) model-id: 0x44 (68) stepping: 1 microcode: 0xA404101
  Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache:
    L1: 512 KiB desc: d-8x32 KiB; i-8x32 KiB L2: 4 MiB desc: 8x512 KiB
    L3: 16 MiB desc: 1x16 MiB
  Speed (MHz): avg: 1678 high: 3200 min/max: 1600/4784 boost: enabled
    scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 1377 2: 1600
    3: 1338 4: 1600 5: 2070 6: 1731 7: 1596 8: 1600 9: 1600 10: 1677 11: 3200
    12: 1330 13: 1600 14: 1600 15: 1600 16: 1331 bogomips: 102204
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities: <filter>
Graphics:
  Device-1: NVIDIA GA104M [GeForce RTX 3070 Mobile / Max-Q] vendor: ASUSTeK
    driver: nvidia v: 530.41.03 alternate: nouveau,nvidia_drm non-free: 525.xx+
    status: current (as of 2023-02) arch: Ampere code: GAxxx
    process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4 speed: 16 GT/s lanes: 8
    link-max: lanes: 16 bus-ID: 01:00.0 chip-ID: 10de:249d class-ID: 0300
  Device-2: AMD Rembrandt [Radeon 680M] vendor: ASUSTeK driver: amdgpu
    v: kernel arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm) built: 2020-22
    pcie: gen: 4 speed: 16 GT/s lanes: 16 ports: active: eDP-1 empty: DP-1,
    DP-2, DP-3, DP-4, DP-5, DP-6, DP-7, DP-8 bus-ID: 06:00.0
    chip-ID: 1002:1681 class-ID: 0300 temp: 53.0 C
  Device-3: Sonix USB2.0 HD UVC WebCam type: USB driver: uvcvideo
    bus-ID: 1-4:2 chip-ID: 322e:202c class-ID: 0e02
  Display: x11 server: X.Org v: 21.1.7 with: Xwayland v: 23.1.0
    compositor: gnome-shell driver: X: loaded: amdgpu,nvidia
    unloaded: modesetting,nouveau,radeon alternate: fbdev,nv,vesa
    dri: radeonsi gpu: amdgpu display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.00x11.22")
    s-diag: 582mm (22.93")
  Monitor-1: eDP-1 mapped: eDP model: TL156VDXP0101 built: 2021
    res: 1920x1080 hz: 300 dpi: 142 gamma: 1.2 size: 344x194mm (13.54x7.64")
    diag: 395mm (15.5") ratio: 16:9 modes: max: 1920x1080 min: 640x480
  API: OpenGL v: 4.6 Mesa 23.0.0 renderer: AMD Radeon Graphics (rembrandt
    LLVM 15.0.7 DRM 3.49 6.2.8-zen1-1-zen) direct-render: Yes
Audio:
  Device-1: NVIDIA GA104 High Definition Audio vendor: ASUSTeK
    driver: snd_hda_intel bus-ID: 4-1:2 v: kernel chip-ID: 0b05:6208 pcie:
    class-ID: 0300 gen: 4 speed: 16 GT/s lanes: 8 link-max: lanes: 16
    bus-ID: 01:00.1 chip-ID: 10de:228b class-ID: 0403
  Device-2: ASUSTek C-Media Audio type: USB
    driver: hid-generic,snd-usb-audio,usbhid
  Sound API: ALSA v: k6.2.8-zen1-1-zen running: yes
  Sound Server-1: PulseAudio v: 16.1 running: no
  Sound Server-2: PipeWire v: 0.3.67 running: yes
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    vendor: ASUSTeK driver: r8168 v: 8.051.02-NAPI modules: r8169 pcie: gen: 1
    speed: 2.5 GT/s lanes: 1 port: e000 bus-ID: 03:00.0 chip-ID: 10ec:8168
    class-ID: 0200
  IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
  Device-2: MEDIATEK MT7921 802.11ax PCI Express Wireless Network Adapter
    vendor: AzureWave driver: mt7921e v: kernel pcie: gen: 2 speed: 5 GT/s
    lanes: 1 bus-ID: 04:00.0 chip-ID: 14c3:7961 class-ID: 0280
  IF: wlp4s0 state: down mac: <filter>
  IF-ID-1: wg-mullvad state: unknown speed: N/A duplex: N/A mac: N/A
Bluetooth:
  Device-1: IMC Networks Wireless_Device type: USB driver: btusb v: 0.8
    bus-ID: 3-3:3 chip-ID: 13d3:3563 class-ID: e001 serial: <filter>
  Report: bt-adapter ID: hci0 rfk-id: 0 state: up address: <filter>
Drives:
  Local Storage: total: 2.75 TiB used: 734.73 GiB (26.1%)
  SMART Message: Required tool smartctl not installed. Check --recommends
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Western Digital
    model: WD Blue SN570 2TB size: 1.82 TiB block-size: physical: 512 B
    logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter>
    rev: 234200WD temp: 36.9 C scheme: GPT
  ID-2: /dev/nvme1n1 maj-min: 259:2 vendor: Samsung
    model: MZVLQ1T0HBLB-00B00 size: 953.87 GiB block-size: physical: 512 B
    logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter>
    rev: FXM7301Q temp: 27.9 C scheme: GPT
Partition:
  ID-1: / raw-size: 919.9 GiB size: 919.9 GiB (100.00%)
    used: 136.88 GiB (14.9%) fs: btrfs dev: /dev/dm-0 maj-min: 254:0
    mapped: luks-3ac6c728-7c3c-43bd-beba-3838be95e33f
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 752 KiB (0.2%) fs: vfat dev: /dev/nvme1n1p1 maj-min: 259:3
  ID-3: /home raw-size: 919.9 GiB size: 919.9 GiB (100.00%)
    used: 136.88 GiB (14.9%) fs: btrfs dev: /dev/dm-0 maj-min: 254:0
    mapped: luks-3ac6c728-7c3c-43bd-beba-3838be95e33f
  ID-4: /var/log raw-size: 919.9 GiB size: 919.9 GiB (100.00%)
    used: 136.88 GiB (14.9%) fs: btrfs dev: /dev/dm-0 maj-min: 254:0
    mapped: luks-3ac6c728-7c3c-43bd-beba-3838be95e33f
  ID-5: /var/tmp raw-size: 919.9 GiB size: 919.9 GiB (100.00%)
    used: 136.88 GiB (14.9%) fs: btrfs dev: /dev/dm-0 maj-min: 254:0
    mapped: luks-3ac6c728-7c3c-43bd-beba-3838be95e33f
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
  ID-1: swap-1 type: partition size: 33.66 GiB used: 0 KiB (0.0%)
    priority: -2 dev: /dev/dm-1 maj-min: 254:1
    mapped: luks-558a2aec-5e05-4637-ab15-eeda4d0bcac0
  ID-2: swap-2 type: zram size: 30.6 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 61.1 C mobo: N/A gpu: amdgpu temp: 53.0 C
  Fan Speeds (RPM): cpu: 0
Info:
  Processes: 404 Uptime: 11m wakeups: 929 Memory: 30.6 GiB
  used: 3.1 GiB (10.1%) Init: systemd v: 253 default: graphical
  tool: systemctl Compilers: gcc: 12.2.1 clang: 15.0.7 Packages: pm: pacman
  pkgs: 1500 libs: 505 tools: pamac,paru Shell: fish v: 3.6.1 default: Bash
  v: 5.1.16 running-in: gnome-terminal inxi: 3.3.25
Garuda (2.6.16-1):
  System install date:     2023-02-28
  Last full system update: 2023-03-25
  Is partially upgraded:   No
  Relevant software:       snapper NetworkManager mkinitcpio nvidia-dkms
  Windows dual boot:       Probably (Run as root to verify)
  Failed units:            nvidia-power-limit.service 

Could you check the files in /var/log/garuda and upload the one from the troublesome system update? There is probably one package that can be rolled back to get things back to normal.
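
In addition to the Garuda update logs, pacman's own log records every upgrade with its old and new version, which makes it easy to see what the update changed for the driver. This is a minimal sketch in POSIX sh (not fish), assuming the default log location /var/log/pacman.log; the function name `nvidia_upgrades` is invented here:

```shell
#!/bin/sh
# List nvidia-related upgrade entries from a pacman log.
# $1: log path (optional, defaults to the standard pacman log location).
nvidia_upgrades() {
  # pacman writes lines of the form:
  #   [timestamp] [ALPM] upgraded <package> (<old version> -> <new version>)
  grep -E '\[ALPM\] upgraded [^ ]*nvidia[^ ]* ' "${1:-/var/log/pacman.log}"
}

# Only read the system log if it exists and is readable.
if [ -r /var/log/pacman.log ]; then
  nvidia_upgrades
fi
```

Each matching line shows the version the package was upgraded from, which is the version to look for in the package cache when rolling back.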

There you go. There are quite a lot of packages in the update :sweat_smile:

I sometimes have issues with the Nvidia drivers after an update.

You can use nvidia-all to reinstall the driver or to install a specific version.

And yes, you can downgrade with it.

Next time please use the text bin on the top right :wink:

I spotted these nvidia packages:

lib32-nvidia-utils-530.41.03-1
lib32-opencl-nvidia-530.41.03-1 
nvidia-dkms-530.41.03-1
nvidia-settings-530.41.03-1 
nvidia-utils-530.41.03-1
opencl-nvidia-530.41.03-1

Two packages immediately stand out: nvidia-utils and nvidia-dkms. nvidia-utils contains the tool you use for power management (nvidia-smi), and nvidia-dkms is the kernel module.

Since the kernel module tends to manage things like this, I think you should start by downgrading it to the previous version. If that doesn't help, try downgrading nvidia-utils as well.

Thank you! I ended up passing all the packages you listed to downgrade and selecting the cached 525 version of each, and it worked! Thanks again :grinning:
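
For reference, the approach described above can be sketched roughly as follows. This is POSIX sh (not fish) and assumes the `downgrade` tool is installed, that the package cache is in pacman's default location /var/cache/pacman/pkg, and that cached packages use the `.pkg.tar.zst` extension. The names `NVIDIA_PKGS`, `cached_versions` and `downgrade_nvidia` are made up for this example:

```shell
#!/bin/sh
# The six packages spotted in the update log.
NVIDIA_PKGS="nvidia-dkms nvidia-utils nvidia-settings opencl-nvidia lib32-nvidia-utils lib32-opencl-nvidia"

# List the cached package files for one package.
# $1: package name, $2: cache directory (optional).
cached_versions() {
  # "-[0-9]*" keeps e.g. nvidia-utils from also matching nvidia-utils-foo,
  # and the ".pkg.tar.zst" suffix skips the detached .sig files.
  ls -1 "${2:-/var/cache/pacman/pkg}"/"$1"-[0-9]*.pkg.tar.zst 2>/dev/null
}

# Show which versions are still available locally.
if [ -d /var/cache/pacman/pkg ]; then
  for pkg in $NVIDIA_PKGS; do
    cached_versions "$pkg"
  done
fi

# Interactive step, deliberately not called automatically:
# downgrade shows a version menu for each package in turn.
downgrade_nvidia() {
  # Unquoted on purpose so each package name becomes its own argument.
  sudo downgrade $NVIDIA_PKGS
}
```

Run `downgrade_nvidia` by hand once the listing confirms that a 525 build of every package is still in the cache. Downgrading all six together keeps the kernel module and the userspace libraries on the same driver version.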

You should report the issue to Nvidia; otherwise you may never be able to upgrade those packages again.
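
Until a fixed driver is released, the downgraded packages have to be held back, because the next full update would otherwise pull in the newer driver again. pacman does this through the `IgnorePkg` option in /etc/pacman.conf. The sketch below (POSIX sh, not fish) only checks whether a package is currently held. The function name `is_held` is invented here, and it matches exact package names only, not glob patterns:

```shell
#!/bin/sh
# Succeeds if a package is listed on an active IgnorePkg line.
# $1: package name, $2: pacman.conf path (optional).
is_held() {
  awk -v pkg="$1" '
    # Commented-out lines start with "#" and therefore do not match.
    /^[[:space:]]*IgnorePkg[[:space:]]*=/ {
      sub(/^[^=]*=/, "")                      # drop "IgnorePkg ="
      n = split($0, names, /[[:space:]]+/)    # remaining words are packages
      for (i = 1; i <= n; i++) if (names[i] == pkg) found = 1
    }
    END { exit(found ? 0 : 1) }
  ' "${2:-/etc/pacman.conf}"
}

if [ -r /etc/pacman.conf ]; then
  for pkg in nvidia-dkms nvidia-utils; do
    if is_held "$pkg"; then
      echo "$pkg is held back"
    else
      echo "$pkg will be upgraded by the next full update"
    fi
  done
fi
```

Remember to remove the packages from `IgnorePkg` again once a driver release restores the power limit control, so they don't stay pinned to 525 indefinitely.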
