Random crashes with PCIe Bus Error in the log

My system crashes randomly, but more frequently when playing videos in a web browser. During the last crash, I got the error below:

Apr 09 07:54:30 renkai-m600 kernel: r8169 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Transmitter ID)
Apr 09 07:54:30 renkai-m600 kernel: r8169 0000:02:00.0:   device [10ec:8125] error status/mask=00001081/0000e000
Apr 09 07:54:30 renkai-m600 kernel: r8169 0000:02:00.0:    [ 0] RxErr                  (First)
Apr 09 07:54:30 renkai-m600 kernel: r8169 0000:02:00.0:    [ 7] BadDLLP
Apr 09 07:54:30 renkai-m600 kernel: r8169 0000:02:00.0:    [12] Timeout
Apr 09 07:54:44 renkai-m600 kernel: pcieport 0000:00:01.3: AER: Multiple Corrected error received: 0000:00:01.3
Apr 09 07:54:44 renkai-m600 kernel: pcieport 0000:00:01.3: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Apr 09 07:54:44 renkai-m600 kernel: pcieport 0000:00:01.3:   device [1022:14ba] error status/mask=00000040/00006000
Apr 09 07:54:44 renkai-m600 kernel: pcieport 0000:00:01.3:    [ 6] BadTLP
Apr 09 07:54:44 renkai-m600 kernel: pcieport 0000:00:01.3: AER:   Error of this Agent is reported first
Apr 09 07:54:44 renkai-m600 kernel: r8169 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Transmitter ID)
Apr 09 07:54:44 renkai-m600 kernel: r8169 0000:02:00.0:   device [10ec:8125] error status/mask=00001081/0000e000
Apr 09 07:54:44 renkai-m600 kernel: r8169 0000:02:00.0:    [ 0] RxErr                  (First)
Apr 09 07:54:44 renkai-m600 kernel: r8169 0000:02:00.0:    [ 7] BadDLLP
Apr 09 07:54:44 renkai-m600 kernel: r8169 0000:02:00.0:    [12] Timeout
Apr 09 07:54:45 renkai-m600 kernel: pcieport 0000:00:01.3: AER: Multiple Corrected error received: 0000:00:01.3
Apr 09 07:54:45 renkai-m600 kernel: pcieport 0000:00:01.3: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Apr 09 07:54:45 renkai-m600 kernel: pcieport 0000:00:01.3:   device [1022:14ba] error status/mask=00000040/00006000
Apr 09 07:54:45 renkai-m600 kernel: pcieport 0000:00:01.3:    [ 6] BadTLP
Apr 09 07:54:45 renkai-m600 kernel: pcieport 0000:00:01.3: AER:   Error of this Agent is reported first
Apr 09 07:54:45 renkai-m600 kernel: r8169 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Transmitter ID)
Apr 09 07:54:45 renkai-m600 kernel: r8169 0000:02:00.0:   device [10ec:8125] error status/mask=00001001/0000e000
Apr 09 07:54:45 renkai-m600 kernel: r8169 0000:02:00.0:    [ 0] RxErr                  (First)
Apr 09 07:54:45 renkai-m600 kernel: r8169 0000:02:00.0:    [12] Timeout
Apr 09 07:54:56 renkai-m600 kernel: pcieport 0000:00:01.3: AER: Multiple Corrected error received: 0000:00:01.3
Apr 09 07:54:56 renkai-m600 kernel: pcieport 0000:00:01.3: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Apr 09 07:54:56 renkai-m600 kernel: pcieport 0000:00:01.3:   device [1022:14ba] error status/mask=00000040/00006000
Apr 09 07:54:56 renkai-m600 kernel: pcieport 0000:00:01.3:    [ 6] BadTLP
Apr 09 07:54:56 renkai-m600 kernel: pcieport 0000:00:01.3: AER:   Error of this Agent is reported first
Apr 09 07:54:56 renkai-m600 kernel: r8169 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Transmitter ID)
Apr 09 07:54:56 renkai-m600 kernel: r8169 0000:02:00.0:   device [10ec:8125] error status/mask=00001001/0000e000
Apr 09 07:54:56 renkai-m600 kernel: r8169 0000:02:00.0:    [ 0] RxErr                  (First)
Apr 09 07:54:56 renkai-m600 kernel: r8169 0000:02:00.0:    [12] Timeout
Apr 09 07:55:00 renkai-m600 kernel: pcieport 0000:00:01.3: AER: Multiple Corrected error received: 0000:00:01.3
Apr 09 07:55:00 renkai-m600 kernel: pcieport 0000:00:01.3: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Apr 09 07:55:00 renkai-m600 kernel: pcieport 0000:00:01.3:   device [1022:14ba] error status/mask=000000c0/00006000
Apr 09 07:55:00 renkai-m600 kernel: pcieport 0000:00:01.3:    [ 6] BadTLP
Apr 09 07:55:00 renkai-m600 kernel: pcieport 0000:00:01.3:    [ 7] BadDLLP
Apr 09 07:55:00 renkai-m600 kernel: pcieport 0000:00:01.3: AER:   Error of this Agent is reported first


I tried searching and, following some advice, set pci=nomsi in GRUB, but it did not help. I guess it's a driver error with a specific device, but I can't tell which one from the log.
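
For what it's worth, each "error status/mask" line in the log already names the reporting driver, the PCI address, and the vendor:device ID, and the hex status is a bitmask matching the bracketed bit numbers printed under it. Below is a minimal Python sketch, not an official tool, that tallies those lines per device and decodes the status of errors with severity=Corrected. The helper names (decode_status, tally) and the regular expression are made up for illustration; the bit names are the short labels the kernel prints.

```python
import re
from collections import Counter

# Bits of the PCIe AER Corrected Error Status register, labelled with the
# short names the kernel prints. Only valid for severity=Corrected lines;
# uncorrected errors use a different bit layout.
CORRECTED_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}

# Matches lines such as:
# kernel: r8169 0000:02:00.0:   device [10ec:8125] error status/mask=00001081/0000e000
STATUS_RE = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
    r"\s+device \[(?P<ids>[0-9a-f]{4}:[0-9a-f]{4})\] "
    r"error status/mask=(?P<status>[0-9a-f]{8})/"
)


def decode_status(status_hex):
    """Return the names of the bits set in a corrected-error status word."""
    value = int(status_hex, 16)
    return [name for bit, name in sorted(CORRECTED_BITS.items())
            if value & (1 << bit)]


def tally(lines):
    """Count status lines per (driver, PCI address, vendor:device ID)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[(match["driver"], match["bdf"], match["ids"])] += 1
    return counts


# Two lines taken from the log above, with the timestamp and host trimmed.
sample = [
    "kernel: r8169 0000:02:00.0:   device [10ec:8125] error status/mask=00001081/0000e000",
    "kernel: pcieport 0000:00:01.3:   device [1022:14ba] error status/mask=00000040/00006000",
]
for key, count in tally(sample).items():
    print(key, count)
print(decode_status("00001081"))  # ['RxErr', 'BadDLLP', 'Timeout']
print(decode_status("000000c0"))  # ['BadTLP', 'BadDLLP']
```

On the excerpt above, this only ever finds two devices: the Realtek NIC at 0000:02:00.0 (10ec:8125, driver r8169) and the PCIe port at 0000:00:01.3 (1022:14ba, an AMD vendor ID). All of the reported errors are link-level and marked Corrected, so on their own they do not prove that this link is what crashes the system.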

System:
  Kernel: 6.2.10-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 12.2.1
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
    root=UUID=2f5dc791-c541-4c17-ad3d-e4fedb1a2e4f rw rootflags=subvol=@
    quiet quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0
    loglevel=3
  Desktop: GNOME v: 43.4 tk: GTK v: 3.24.37 wm: gnome-shell dm: GDM v: 44.0
    Distro: Garuda Linux base: Arch Linux
Machine:
  Type: Desktop System: Win element product: M600 v: N/A
    serial: <superuser required>
  Mobo: Win element model: M600 serial: <superuser required> UEFI: American
    Megatrends LLC. v: SR500P01_P7C2V09 date: 12/27/2022
Battery:
  Device-1: hidpp_battery_0 model: Logitech ERGO M575 Trackball
    serial: <filter> charge: 100% rechargeable: yes status: discharging
CPU:
  Info: model: AMD Ryzen 9 6900HX with Radeon Graphics bits: 64 type: MT MCP
    arch: Zen 3+ gen: 4 level: v3 note: check built: 2022 process: TSMC n6 (7nm)
    family: 0x19 (25) model-id: 0x44 (68) stepping: 1 microcode: 0xA404102
  Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache:
    L1: 512 KiB desc: d-8x32 KiB; i-8x32 KiB L2: 4 MiB desc: 8x512 KiB
    L3: 16 MiB desc: 1x16 MiB
  Speed (MHz): avg: 1569 high: 1946 min/max: 1600/4934 boost: enabled
    scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 1397 2: 1600
    3: 1600 4: 1600 5: 1600 6: 1600 7: 1367 8: 1600 9: 1600 10: 1600 11: 1600
    12: 1600 13: 1397 14: 1397 15: 1600 16: 1946 bogomips: 105402
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities: <filter>
Graphics:
  Device-1: AMD Rembrandt [Radeon 680M] driver: amdgpu v: kernel arch: RDNA-2
    code: Navi-2x process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4
    speed: 16 GT/s lanes: 16 ports: active: DP-2 empty: DP-1, DP-3, DP-4,
    DP-5, DP-6, HDMI-A-1 bus-ID: 75:00.0 chip-ID: 1002:1681 class-ID: 0300
    temp: 45.0 C
  Device-2: Logitech Webcam C930e type: USB driver: snd-usb-audio,uvcvideo
    bus-ID: 5-1.1:3 chip-ID: 046d:0843 class-ID: 0102 serial: <filter>
  Display: wayland server: X.org v: 1.21.1.8 with: Xwayland v: 23.1.1
    compositor: gnome-shell driver: gpu: amdgpu display-ID: 0
  Monitor-1: DP-2 model: AOC LV273HUPR serial: <filter> built: 2020
    res: 3840x2160 dpi: 163 gamma: 1.2 size: 597x336mm (23.5x13.23")
    diag: 685mm (27") ratio: 16:9 modes: max: 3840x2160 min: 720x400
  API: EGL/GBM Message: No known Wayland EGL/GBM data sources.
Audio:
  Device-1: AMD Rembrandt Radeon High Definition Audio driver: snd_hda_intel
    v: kernel pcie: bus-ID: 5-1.1:3 chip-ID: 046d:0843 gen: 4 speed: 16 GT/s
    class-ID: 0102 lanes: 16 serial: <filter> bus-ID: 75:00.1
    chip-ID: 1002:1640 class-ID: 0403
  Device-2: AMD ACP/ACP3X/ACP6x Audio Coprocessor driver: snd_pci_acp6x
    v: kernel alternate: snd_pci_acp3x, snd_rn_pci_acp3x, snd_pci_acp5x,
    snd_acp_pci, snd_rpl_pci_acp6x, snd_pci_ps, snd_sof_amd_renoir,
    snd_sof_amd_rembrandt pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 75:00.5 chip-ID: 1022:15e2 class-ID: 0480
  Device-3: AMD Family 17h/19h HD Audio vendor: Realtek
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 75:00.6 chip-ID: 1022:15e3 class-ID: 0403
  Device-4: Logitech Webcam C930e type: USB driver: snd-usb-audio,uvcvideo
  API: ALSA v: k6.2.10-zen1-1-zen status: kernel-api tools: N/A
  Server-1: PipeWire v: 0.3.68 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Realtek RTL8125 2.5GbE driver: r8169 v: kernel pcie: gen: 2
    speed: 5 GT/s lanes: 1 port: f000 bus-ID: 02:00.0 chip-ID: 10ec:8125
    class-ID: 0200
  IF: enp2s0 state: down mac: <filter>
  Device-2: Realtek RTL8125 2.5GbE driver: r8169 v: kernel pcie: gen: 2
    speed: 5 GT/s lanes: 1 port: e000 bus-ID: 03:00.0 chip-ID: 10ec:8125
    class-ID: 0200
  Device-3: MEDIATEK MT7921K Wi-Fi 6E 80MHz driver: mt7921e v: kernel pcie:
    gen: 2 speed: 5 GT/s lanes: 1 bus-ID: 04:00.0 chip-ID: 14c3:0608
    class-ID: 0280
  IF: wlp4s0 state: up mac: <filter>
  IF-ID-1: eno1 state: down mac: <filter>
  IF-ID-2: utun state: unknown speed: 10000 Mbps duplex: full mac: N/A
Bluetooth:
  Device-1: MediaTek Wireless_Device type: USB driver: btusb v: 0.8
    bus-ID: 2-3:2 chip-ID: 0e8d:0608 class-ID: e001 serial: <filter>
  Report: bt-adapter ID: hci0 rfk-id: 0 state: up address: <filter>
Drives:
  Local Storage: total: 1.82 TiB used: 12.15 GiB (0.7%)
  SMART Message: Required tool smartctl not installed. Check --recommends
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Crucial model: CT2000P3PSSD8
    size: 1.82 TiB block-size: physical: 512 B logical: 512 B speed: 63.2 Gb/s
    lanes: 4 type: SSD serial: <filter> rev: P9CR40A temp: 50.9 C scheme: GPT
Partition:
  ID-1: / raw-size: 1.82 TiB size: 1.82 TiB (100.00%) used: 12.15 GiB (0.7%)
    fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 608 KiB (0.2%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
  ID-3: /home raw-size: 1.82 TiB size: 1.82 TiB (100.00%)
    used: 12.15 GiB (0.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-4: /var/log raw-size: 1.82 TiB size: 1.82 TiB (100.00%)
    used: 12.15 GiB (0.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-5: /var/tmp raw-size: 1.82 TiB size: 1.82 TiB (100.00%)
    used: 12.15 GiB (0.7%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
  ID-1: swap-1 type: zram size: 62.04 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 50.2 C mobo: N/A gpu: amdgpu temp: 49.0 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 383 Uptime: 27m wakeups: 15875 Memory: 62.04 GiB
  used: 4.15 GiB (6.7%) Init: systemd v: 253 default: graphical
  tool: systemctl Compilers: gcc: 12.2.1 Packages: pm: pacman pkgs: 1192
  libs: 348 tools: octopi,pamac,paru Shell: fish v: 3.6.1 default: Bash
  v: 5.1.16 running-in: gnome-terminal inxi: 3.3.26
Garuda (2.6.16-1):
  System install date:     2023-03-17
  Last full system update: 2023-04-09 ↻
  Is partially upgraded:   No
  Relevant software:       snapper NetworkManager dracut
  Windows dual boot:       Probably (Run as root to verify)
  Failed units:            

Those messages mention r8169, a driver for Realtek network chips.
No idea if that is the cause of crashes though.
Found probes on linux-hardware.org about the M600 reporting various problems with it.
Maybe the r8125-dkms package works.
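
If you try that package, it is worth confirming afterwards which driver is actually bound to the two RTL8125 ports. A small Python sketch, assuming the usual sysfs layout under /sys/bus/pci/devices (a vendor file, a device file, and a driver symlink per device); the function name bound_drivers is invented for this example, and the replacement module is presumably called r8125, which I have not verified on this machine.

```python
import os
from pathlib import Path


def bound_drivers(vendor, device, sysfs_root="/sys/bus/pci/devices"):
    """Map PCI address -> bound driver name (or None) for vendor:device."""
    wanted = (f"0x{vendor.lower()}", f"0x{device.lower()}")
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        try:
            ids = ((dev / "vendor").read_text().strip().lower(),
                   (dev / "device").read_text().strip().lower())
        except OSError:
            continue  # not a readable PCI device entry
        if ids != wanted:
            continue
        link = dev / "driver"
        # The driver entry is a symlink whose last path component is the
        # driver name; it is absent when no driver is bound.
        if link.is_symlink():
            result[dev.name] = os.path.basename(os.readlink(link))
        else:
            result[dev.name] = None
    return result


if __name__ == "__main__":
    # 10ec:8125 is the chip-ID reported for both Realtek ports above.
    for address, driver in bound_drivers("10ec", "8125").items():
        print(address, driver or "(no driver bound)")
```

If both ports still show r8169 after installing the package and rebooting, the new module is not being used, and the log messages would keep naming r8169.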


Thanks a lot! This is the first time I have heard of linux-hardware.org, though, and I can't even find a search button on that forum. Is there a simple introduction to how to use it?

Maybe I have not explained my question clearly, so I will add some details below.

On the specific page, the status of the 8125 hardware is shown as "detected" (the device is detected, but not tested yet), with an alert mark (the device model is known to have problems).

By clicking the alert mark, I expected to find other people's discussions about this hardware, but I guess it just leads to the hardware's index page: Realtek Semiconductor RTL8125 2.5GbE Controller

The page has a link, "Discuss this device on our forum", so I guess there are discussions on the forum. But that is also where I can't find a search button.

By the way, I think installing this package more or less works. I encountered a crash after about 40 minutes of watching video; it usually happens in less than 20 minutes.
