System freezes randomly, NVME issues?

Hello everyone, I need help. My system freezes randomly. It can happen right after boot, even on the login screen, or after some hours, but NEVER while a game is running.

This is my garuda-inxi output:

System:
  Kernel: 6.0.12-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 12.2.0
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
    root=UUID=fdf04799-bd3a-43cc-9201-22338b358559 rw [email protected]
    quiet quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0
    resume=UUID=489310f4-be41-444c-9b27-634c56e27576 loglevel=3 ibt=off
  Desktop: Cinnamon v: 5.6.4 tk: GTK v: 3.24.35 wm: muffin vt: 7 dm: LightDM
    v: 1.32.0 Distro: Garuda Linux base: Arch Linux
Machine:
  Type: Desktop Mobo: ASUSTeK model: ROG STRIX B450-F GAMING II v: Rev 1.xx
    serial: <superuser required> UEFI: American Megatrends v: 4901
    date: 07/25/2022
CPU:
  Info: model: AMD Ryzen 7 5800X bits: 64 type: MT MCP arch: Zen 3 gen: 4
    level: v3 note: check built: 2021-22 process: TSMC n7 (7nm)
    family: 0x19 (25) model-id: 0x21 (33) stepping: 0 microcode: 0xA201016
  Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache:
    L1: 512 KiB desc: d-8x32 KiB; i-8x32 KiB L2: 4 MiB desc: 8x512 KiB
    L3: 32 MiB desc: 1x32 MiB
  Speed (MHz): avg: 2434 high: 3390 min/max: 2200/4850 boost: enabled
    scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 3390 2: 2200
    3: 2200 4: 2200 5: 2854 6: 2835 7: 2195 8: 2193 9: 2196 10: 2200 11: 2200
    12: 2857 13: 2200 14: 2184 15: 2850 16: 2192 bogomips: 121367
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities:
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: mmio_stale_data status: Not affected
  Type: retbleed status: Not affected
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, IBRS_FW,
    STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M]
    vendor: ASUSTeK driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x
    process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4 speed: 16 GT/s
    lanes: 16 ports: active: DP-2 empty: DP-1,DP-3,HDMI-A-1 bus-ID: 0c:00.0
    chip-ID: 1002:73df class-ID: 0300
  Display: x11 server: X.Org v: 21.1.4 driver: X: loaded: amdgpu
    unloaded: modesetting,radeon alternate: fbdev,vesa dri: radeonsi gpu: amdgpu
    display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.00x11.22")
    s-diag: 582mm (22.93")
  Monitor-1: DP-2 mapped: DisplayPort-1 model: LG (GoldStar) MP59G
    serial: <filter> built: 2016 res: 1920x1080 hz: 75 dpi: 102 gamma: 1.2
    size: 480x270mm (18.9x10.63") diag: 690mm (27.2") ratio: 16:9 modes:
    max: 1920x1080 min: 640x480
  API: OpenGL v: 4.6 Mesa 22.2.3 renderer: AMD Radeon RX 6700 XT (navi22
    LLVM 14.0.6 DRM 3.48 6.0.12-zen1-1-zen) direct render: Yes
Audio:
  Device-1: AMD Navi 21/23 HDMI/DP Audio driver: snd_hda_intel v: kernel pcie:
    gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 0c:00.1 chip-ID: 1002:ab28
    class-ID: 0403
  Device-2: AMD Starship/Matisse HD Audio vendor: ASUSTeK
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 0e:00.4 chip-ID: 1022:1487 class-ID: 0403
  Sound API: ALSA v: k6.0.12-zen1-1-zen running: yes
  Sound Server-1: PulseAudio v: 16.1 running: no
  Sound Server-2: PipeWire v: 0.3.61 running: yes
Network:
  Device-1: Intel I211 Gigabit Network vendor: ASUSTeK driver: igb v: kernel
    pcie: gen: 1 speed: 2.5 GT/s lanes: 1 port: e000 bus-ID: 04:00.0
    chip-ID: 8086:1539 class-ID: 0200
  IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Drives:
  Local Storage: total: 1.12 TiB used: 93 GiB (8.1%)
  SMART Message: Required tool smartctl not installed. Check --recommends
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Seagate model: XPG GAMMIX S5
    size: 238.47 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
    lanes: 4 type: SSD serial: <filter> rev: VC0S032V temp: 31.9 C scheme: GPT
  ID-2: /dev/sda maj-min: 8:0 vendor: SanDisk model: SSD PLUS 480GB
    size: 447.14 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    type: SSD serial: <filter> rev: 00RL scheme: GPT
  ID-3: /dev/sdb maj-min: 8:16 vendor: Seagate model: ST500DM002-1BD142
    size: 465.76 GiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
    type: HDD rpm: 7200 serial: <filter> rev: KC48 scheme: MBR
Partition:
  ID-1: / raw-size: 203.8 GiB size: 203.8 GiB (100.00%) used: 93 GiB (45.6%)
    fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 608 KiB (0.2%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
  ID-3: /home raw-size: 203.8 GiB size: 203.8 GiB (100.00%)
    used: 93 GiB (45.6%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-4: /var/log raw-size: 203.8 GiB size: 203.8 GiB (100.00%)
    used: 93 GiB (45.6%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-5: /var/tmp raw-size: 203.8 GiB size: 203.8 GiB (100.00%)
    used: 93 GiB (45.6%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
  ID-1: swap-1 type: partition size: 34.37 GiB used: 0 KiB (0.0%)
    priority: -2 dev: /dev/nvme0n1p3 maj-min: 259:3
  ID-2: swap-2 type: zram size: 31.24 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 39.0 C mobo: 33.0 C gpu: amdgpu temp: 32.0 C
    mem: 30.0 C
  Fan Speeds (RPM): cpu: 658 case-1: 617 case-2: 0 case-3: 662 gpu: amdgpu
    fan: 0
  Power: 12v: 9.92 5v: N/A 3.3v: N/A vbat: 3.18 gpu: amdgpu watts: 7.00
Info:
  Processes: 394 Uptime: 16m wakeups: 0 Memory: 31.24 GiB
  used: 5.19 GiB (16.6%) Init: systemd v: 252 default: graphical
  tool: systemctl Compilers: gcc: 12.2.0 Packages: pm: pacman pkgs: 1226
  libs: 342 tools: paru Shell: fish v: 3.5.1 default: Bash v: 5.1.16
  running-in: gnome-terminal inxi: 3.3.24
Garuda (2.6.10-1):
  System install date:     2022-12-01
  Last full system update: 2022-12-13
  Is partially upgraded:   No
  Relevant software:       NetworkManager
  Windows dual boot:       Probably (Run as root to verify)
  Snapshots:               Snapper
  Failed units:            

I am receiving errors such as:

nvme nvme0: failed to set APST feature (2)

ACPI Error: Aborting method \_SB.PCI0.GPP2.PTXH.RHUB.POT4._PLD due to previous error (AE_AML_UNINITIALIZED_ELEMENT) (20220331/psparse-529)

gkr-pam: unable to locate daemon control file

kvm: support for 'kvm_amd' disabled by bios

Thank you

Welcome.
This is way over my head, so I hope someone will chime in with something more relevant.
Meanwhile, take a look at this: Controller_failure_due_to_broken_APST_support.
Maybe either the kernel parameter or a BIOS update will do.

The BIOS is already updated to the newest version; the problem existed on the old one too.

nvme get-feature /dev/nvme0 -f 0x0c -H gives

Autonomous Power State Transition Enable (APSTE): Disabled

I see there's also a recent bug report for Ubuntu with the error 2, but no useful info there (yet?).
Another shot in the dark: have you tried some other kernel, -lts maybe?

The gkr-pam seems to be harmless: [SOLVED] gkr-pam: unable to locate daemon control file / Newbie Corner / Arch Linux Forums

The kvm thing, is there a way to enable "virtualization technology" (or something like that) in the BIOS?
I think it's unrelated to the APST error though.

Yes, I've tried different kernels and different distros too: Fedora, Nobara, and everything Arch-based all do the same. Before the crash the screen blinks for a second, turns black, and comes back "glitchy". At least here I can switch to another TTY and reboot; I couldn't do that on Nobara, for instance. I can still move the mouse, but that's it. I don't know if it's the NVMe disk causing problems or a GPU driver timeout.
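
One way to narrow down NVMe versus GPU might be to count which subsystem logged errors during the boot that froze. The following is only a rough sketch, not a tested diagnosis: the keyword table and the function names are my own assumptions, and a keyword match says nothing about which error actually caused the freeze. It reads the previous boot's kernel messages with `journalctl -k --boot=-1 -p err`.

```python
import subprocess
from collections import Counter

# Hypothetical keyword table: substring found in a log line -> label.
# These keywords are an illustrative guess, not an official list.
SUBSYSTEMS = {"nvme": "nvme", "amdgpu": "gpu", "acpi": "acpi"}


def classify(line: str) -> str:
    """Return a coarse subsystem label for one kernel log line."""
    lower = line.lower()
    for keyword, label in SUBSYSTEMS.items():
        if keyword in lower:
            return label
    return "other"


def summarize(lines) -> Counter:
    """Count non-empty log lines per subsystem label."""
    return Counter(classify(line) for line in lines if line.strip())


def previous_boot_kernel_errors() -> list:
    """Kernel messages of priority 'err' or worse from the previous boot."""
    result = subprocess.run(
        ["journalctl", "-k", "--boot=-1", "-p", "err", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.splitlines()
```

After rebooting from a freeze, `print(summarize(previous_boot_kernel_errors()))` would show whether the last messages before the reboot lean towards `nvme` or `gpu`. Reading the journal may require root or membership in the `systemd-journal` group.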

I would bet on the NVME, but I'm just guessing.
There's a chance the ACPI error has something to do with the main issue.
According to this (old and not so clear) post, you could try the iommu=disable kernel parameter.
Though, according to this other one, perhaps the error can simply be ignored.
It just rings a bell that the freezes do not happen while playing games.
By the way, did nvme_core.default_ps_max_latency_us=0 make any difference?
Probably not, given (APSTE): Disabled, but who knows.
Apologies, I give up and leave it to someone who knows more.
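
In case it helps to confirm whether nvme_core.default_ps_max_latency_us=0 was really in effect, here is a small sketch that checks the running kernel's command line. The function names are made up for illustration, and the simple whitespace split assumes no quoted values containing spaces. The sysfs path is where the nvme_core module normally exposes this parameter, but treat that as an assumption and check that it exists on your system.

```python
from pathlib import Path

PARAM = "nvme_core.default_ps_max_latency_us"
CMDLINE = Path("/proc/cmdline")
SYSFS = Path("/sys/module/nvme_core/parameters/default_ps_max_latency_us")


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.

    Bare flags such as 'quiet' map to an empty string. Only the first
    '=' separates name from value, so 'root=UUID=abc' keeps 'UUID=abc'.
    """
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def apst_disabled_by_param(cmdline: str) -> bool:
    """True if the command line sets the max APST latency to 0."""
    return parse_cmdline(cmdline).get(PARAM) == "0"


def report() -> None:
    """Print what the running kernel was booted with, if readable."""
    if CMDLINE.exists():
        print("set on command line:", apst_disabled_by_param(CMDLINE.read_text()))
    if SYSFS.exists():
        print("value in effect:", SYSFS.read_text().strip())
```

Calling `report()` after a reboot shows whether the parameter reached the kernel. With the command line from the inxi output above, `apst_disabled_by_param` would return False, since the parameter is not listed there.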

I think the SSD might be the issue too, after doing a web search on the model. It seems to have problems on a multitude of different systems, even the PlayStation 5.
There is a firmware update utility for the SSD that might help, but of course it only runs on Windows: https://www.xpg.com/us/compatibility/587?tab=downloads