A lot of NVIDIA errors after playing a game

I don’t know when it started, but when I look at the errors in the journal, I see these messages several times:

kernel: nvidia 0000:01:00.0: AER: Error of this Agent is reported first

and

kernel: pcieport 0000:00:01.1: AER: Error of this Agent is reported first

I tried reinstalling the driver but still get those errors. Is it fine to ignore them, or is there a serious problem that I should resolve?
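
To see how often each device reports these messages, the journal lines can be tallied per PCI address. The following is only a rough sketch: the function name `count_aer_by_device` and the regular expression are illustrative assumptions, written to match the two message formats quoted above.

```python
import re
from collections import Counter

# Matches kernel AER lines such as:
#   kernel: nvidia 0000:01:00.0: AER: Error of this Agent is reported first
# The pattern is an assumption based on the two messages quoted in this post.
AER_LINE = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AER: "
)


def count_aer_by_device(lines):
    """Return a Counter mapping (driver, PCI address) to the number of AER lines."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            counts[(match["driver"], match["bdf"])] += 1
    return counts
```

One possible use is to save the kernel log of the current boot with `journalctl -k -b > kernel.log` and then call `count_aer_by_device(open("kernel.log"))`.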

garuda-inxi
System:
Kernel: 6.8.1-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
clocksource: tsc avail: acpi_pm
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
root=UUID=03cf0922-86bd-45b4-984f-4e0416a0ef5d rw rootflags=subvol=@
quiet loglevel=3 nvidia-drm.modeset=1 nvidia_drm.fbdev=1 ibt=off
Desktop: KDE Plasma v: 6.0.2 tk: Qt v: N/A info: frameworks v: 6.0.0
wm: kwin_wayland vt: 1 dm: SDDM Distro: Garuda base: Arch Linux
Machine:
Type: Laptop System: ASUSTeK product: ASUS TUF Gaming A15 FA507NV_FA507NV
v: 1.0 serial: <superuser required>
Mobo: ASUSTeK model: FA507NV v: 1.0 serial: <superuser required>
uuid: <superuser required> UEFI: American Megatrends LLC. v: FA507NV.312
date: 08/11/2023
Battery:
ID-1: BAT1 charge: 67.2 Wh (79.5%) condition: 84.5/90.2 Wh (93.7%)
volts: 16.4 min: 15.9 model: ASUS A32-K55 type: Li-ion serial: N/A
status: not charging
CPU:
Info: model: AMD Ryzen 7 7735HS with Radeon Graphics bits: 64 type: MT MCP
arch: Zen 3+ gen: 4 level: v3 note: check built: 2022 process: TSMC n6 (7nm)
family: 0x19 (25) model-id: 0x44 (68) stepping: 1 microcode: 0xA404102
Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache:
L1: 512 KiB desc: d-8x32 KiB; i-8x32 KiB L2: 4 MiB desc: 8x512 KiB
L3: 16 MiB desc: 1x16 MiB
Speed (MHz): avg: 1149 high: 2392 min/max: 400/4829 scaling:
driver: amd-pstate-epp governor: powersave cores: 1: 1365 2: 1369 3: 400
4: 1358 5: 2392 6: 2389 7: 400 8: 400 9: 1355 10: 400 11: 400 12: 400
13: 1369 14: 400 15: 2392 16: 1598 bogomips: 102207
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities: <filter>
Graphics:
Device-1: NVIDIA AD107M [GeForce RTX 4060 Max-Q / Mobile] vendor: ASUSTeK
driver: nvidia v: 550.67 alternate: nouveau,nvidia_drm non-free: 545.xx+
status: current (as of 2024-02) arch: Lovelace code: AD1xx
process: TSMC n4 (5nm) built: 2022+ pcie: gen: 4 speed: 16 GT/s lanes: 8
ports: active: none empty: DP-1,HDMI-A-1,eDP-1 bus-ID: 01:00.0
chip-ID: 10de:28e0 class-ID: 0300
Device-2: AMD Rembrandt [Radeon 680M] vendor: ASUSTeK driver: amdgpu
v: kernel arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm) built: 2020-22
pcie: gen: 4 speed: 16 GT/s lanes: 16 ports: active: eDP-2 empty: DP-2,
DP-3, DP-4, DP-5, DP-6, DP-7, DP-8, DP-9, Writeback-1 bus-ID: 35:00.0
chip-ID: 1002:1681 class-ID: 0300 temp: 43.0 C
Device-3: Sonix USB2.0 HD UVC WebCam driver: uvcvideo type: USB rev: 2.0
speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-4:2 chip-ID: 2b7e:b685
class-ID: 0e02
Display: wayland server: X.org v: 1.21.1.11 with: Xwayland v: 23.2.4
compositor: kwin_wayland driver: X: loaded: amdgpu,nvidia
unloaded: modesetting,nouveau alternate: fbdev,nv,vesa dri: radeonsi
gpu: nvidia,amdgpu display-ID: 0
Monitor-1: eDP-2 res: 1920x1080 size: N/A modes: N/A
API: EGL v: 1.5 hw: drv: nvidia drv: amd radeonsi platforms: device: 0
drv: nvidia device: 2 drv: radeonsi device: 3 drv: swrast gbm: drv: nvidia
surfaceless: drv: nvidia wayland: drv: radeonsi x11: drv: radeonsi
inactive: device-1
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: amd mesa v: 24.0.3-arch1.2
glx-v: 1.4 direct-render: yes renderer: AMD Radeon Graphics (radeonsi
rembrandt LLVM 17.0.6 DRM 3.57 6.8.1-zen1-1-zen) device-ID: 1002:1681
memory: 500 MiB unified: no display-ID: :1.0
API: Vulkan v: 1.3.279 layers: 10 device: 0 type: integrated-gpu name: AMD
Radeon Graphics (RADV REMBRANDT) driver: mesa radv v: 24.0.3-arch1.2
device-ID: 1002:1681 surfaces: xcb,xlib,wayland device: 1
type: discrete-gpu name: NVIDIA GeForce RTX 4060 Laptop GPU driver: nvidia
v: 550.67 device-ID: 10de:28e0 surfaces: xcb,xlib,wayland device: 2
type: cpu name: llvmpipe (LLVM 17.0.6 256 bits) driver: mesa llvmpipe
v: 24.0.3-arch1.2 (LLVM 17.0.6) device-ID: 10005:0000
surfaces: xcb,xlib,wayland
Audio:
Device-1: NVIDIA vendor: ASUSTeK driver: snd_hda_intel v: kernel pcie:
gen: 4 speed: 16 GT/s lanes: 8 bus-ID: 01:00.1 chip-ID: 10de:22be
class-ID: 0403
Device-2: AMD Rembrandt Radeon High Definition Audio driver: snd_hda_intel
v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 35:00.1
chip-ID: 1002:1640 class-ID: 0403
Device-3: AMD ACP/ACP3X/ACP6x Audio Coprocessor vendor: ASUSTeK
driver: snd_pci_acp6x v: kernel alternate: snd_pci_acp3x, snd_rn_pci_acp3x,
snd_pci_acp5x, snd_acp_pci, snd_rpl_pci_acp6x, snd_pci_ps,
snd_sof_amd_renoir, snd_sof_amd_rembrandt, snd_sof_amd_vangogh,
snd_sof_amd_acp63 pcie: gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 35:00.5
chip-ID: 1022:15e2 class-ID: 0480
Device-4: AMD Family 17h/19h HD Audio vendor: ASUSTeK
driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
bus-ID: 35:00.6 chip-ID: 1022:15e3 class-ID: 0403
Device-5: Kingston HyperX 7.1 Audio
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 2.0 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 3-1:4 chip-ID: 0951:16a4 class-ID: 0300
serial: <filter>
API: ALSA v: k6.8.1-zen1-1-zen status: kernel-api with: aoss
type: oss-emulator tools: N/A
Server-1: PipeWire v: 1.0.4 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet
vendor: ASUSTeK RTL8111/8168/8411 driver: r8169 v: kernel pcie: gen: 1
speed: 2.5 GT/s lanes: 1 port: e000 bus-ID: 02:00.0 chip-ID: 10ec:8168
class-ID: 0200
IF: eno1 state: down mac: <filter>
Device-2: MEDIATEK MT7921 802.11ax PCI Express Wireless Network Adapter
vendor: AzureWave driver: mt7921e v: kernel pcie: gen: 2 speed: 5 GT/s
lanes: 1 bus-ID: 03:00.0 chip-ID: 14c3:7961 class-ID: 0280
IF: wlp3s0 state: up mac: <filter>
Info: services: NetworkManager, systemd-timesyncd, wpa_supplicant
Bluetooth:
Device-1: IMC Networks Wireless_Device driver: btusb v: 0.8 type: USB
rev: 2.1 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 3-3:3 chip-ID: 13d3:3563
class-ID: e001 serial: <filter>
Report: btmgmt ID: hci0 rfk-id: 0 state: down bt-service: enabled,running
rfk-block: hardware: no software: yes address: <filter> bt-v: 5.2 lmp-v: 11
status: discoverable: no pairing: no
Drives:
Local Storage: total: 1.38 TiB used: 328.69 GiB (23.3%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Micron model: 2400 MTFDKBA512QFM
size: 476.94 GiB block-size: physical: 512 B logical: 512 B speed: 63.2 Gb/s
lanes: 4 tech: SSD serial: <filter> fw-rev: V3MA003 temp: 31.9 C
scheme: GPT
ID-2: /dev/sda maj-min: 8:0 vendor: Samsung model: PSSD T7
size: 931.51 GiB block-size: physical: 512 B logical: 512 B type: USB
rev: 3.2 spd: 10 Gb/s lanes: 1 mode: 3.2 gen-2x1 tech: SSD
serial: <filter> scheme: MBR
Partition:
ID-1: / raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 44.39 GiB (9.3%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 584 KiB (0.2%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
ID-3: /home raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 44.39 GiB (9.3%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-4: /var/log raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 44.39 GiB (9.3%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-5: /var/tmp raw-size: 476.64 GiB size: 476.64 GiB (100.00%)
used: 44.39 GiB (9.3%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
ID-1: swap-1 type: zram size: 14.87 GiB used: 0 KiB (0.0%) priority: 100
comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 16 dev: /dev/zram0
Sensors:
System Temperatures: cpu: 44.0 C mobo: 32.0 C gpu: amdgpu temp: 42.0 C
Fan Speeds (rpm): cpu: 2200
Info:
Memory: total: 16 GiB note: est. available: 14.87 GiB used: 3.28 GiB (22.1%)
Processes: 352 Power: uptime: 3h 27m states: freeze,mem,disk
suspend: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
suspend, test_resume image: 5.91 GiB services: org_kde_powerdevil,
power-profiles-daemon, upowerd Init: systemd v: 255 default: graphical
tool: systemctl
Packages: pm: pacman pkgs: 1374 libs: 406 tools: octopi,paru Compilers:
gcc: 13.2.1 Shell: garuda-inxi default: fish v: 3.7.0 running-in: konsole
inxi: 3.3.33
Garuda (2.6.23-1):
System install date:     2023-11-25
Last full system update: 2024-03-25
Is partially upgraded:   No
Relevant software:       snapper NetworkManager dracut nvidia-dkms
Windows dual boot:       Probably (Run as root to verify)
Failed units:

Do you get the same error if you select an x11 session in the login screen?

I just started an X11 session. Games don’t even launch; they crash immediately.

Maybe you could try booting with the pcie_aspm=off kernel boot parameter, just as a test, since I don’t know the possible side effects.
In my opinion it is safe to leave things as they are, as long as you don’t see any real problem other than those messages in the journal.
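
After rebooting, it can be worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming the running command line is read from `/proc/cmdline`; the helper names `has_kernel_param` and `read_cmdline` are made up for illustration, and quoted values containing spaces are not handled:

```python
def has_kernel_param(cmdline, name, value=None):
    """Return True if `name` (or `name=value` when value is given) is in the command line."""
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and (value is None or (sep and val == value)):
            return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```

For example, `has_kernel_param(read_cmdline(), "pcie_aspm", "off")` should return `True` once the test parameter is in effect.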

There is no real problem. I just saw them when I checked the journal, and games run fine. I just don’t like having errors unless they are safe to ignore.

But is it normal that they spam the journal every 2 minutes?

That seems to have been the solution: I don’t get those errors anymore, so I’m marking it as the solution unless the errors come back.

Is it better to keep ASPM on and ignore the errors, or keep it off to avoid them?

Well, it is a complex power management option. Not easy to say.
https://wiki.archlinux.org/title/Power_management#Active_State_Power_Management
I’ve seen it suggested and used many times.
You will find a lot of discussions on the internet about it.
I don’t think it should cause any trouble.
Maybe just keep an eye on the overall system performance, and if you don’t notice any drawbacks after using it for a while, I’d say it’s good to go…
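
If you want to see which ASPM policy the kernel reports, the wiki section linked above points at `/sys/module/pcie_aspm/parameters/policy`, where the active entry is shown in square brackets. A small sketch under that assumption; the function names are illustrative only, and what the file shows when ASPM is disabled with `pcie_aspm=off` may differ:

```python
import re


def active_aspm_policy(policy_text):
    """Return the bracketed (active) entry from the ASPM policy text, or None."""
    match = re.search(r"\[(\w+)\]", policy_text)
    return match.group(1) if match else None


def read_aspm_policy(path="/sys/module/pcie_aspm/parameters/policy"):
    """Read the policy file and return the active policy name, or None."""
    with open(path) as f:
        return active_aspm_policy(f.read())
```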

I don’t understand. Do you mean I’d better keep it on and ignore the errors as long as there is no problem? So I should remove the kernel parameter?

No, I was suggesting to keep the parameter for a while and see how it goes.

OK, thank you. If I understand correctly, it’s just a power-saving mode that disables PCIe when it is not in use and reactivates it when it is needed?