Random Freeze or Reboot

Hi guys,

I don't know if this is the correct category for this; if not, please move the topic :)
I sometimes get random freezes and also reboots. I suspect Citrix, because I also use this computer for work. It happens no matter which distribution I use; what they have in common is my PC, Citrix and, mostly, KDE. I configured journalctl so I can see what happens right before a freeze or reboot.
I haven't had a freeze lately, but 30 minutes ago my PC rebooted while I was out of the room. I looked through the logs and found this just before the reboot:

    Feb 27 08:58:00 mike-PC citrix-wfica[32265]: CmdForSurface: failure in GfxSurface API

(repeated approximately 10,000 times)
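Reading back through thousands of identical lines like the `CmdForSurface` message above is easier if consecutive repeats are collapsed into one line with a count. A minimal sketch, assuming the journal is persistent (so the previous boot is still stored) and that `journalctl --boot=-1 --lines=N --output=short-unix --no-pager` is available; the helper names `collapse_repeats` and `previous_boot_tail` are hypothetical, not part of any tool.

```python
import subprocess
from itertools import groupby


def collapse_repeats(lines):
    """Group consecutive journal lines that differ only in their leading timestamp.

    Assumes `--output=short-unix` lines, where the timestamp is the first
    space-separated token; returns a list of (count, message) tuples.
    """
    def message(line):
        # Drop the timestamp so otherwise identical lines compare equal.
        return line.split(" ", 1)[1] if " " in line else line

    return [(sum(1 for _ in group), msg) for msg, group in groupby(lines, key=message)]


def previous_boot_tail(max_lines=2000):
    """Return the last journal lines of the previous boot.

    Assumes a persistent journal; without root or membership in the
    systemd-journal group, some entries may not be visible.
    """
    out = subprocess.run(
        ["journalctl", "--boot=-1", f"--lines={max_lines}",
         "--output=short-unix", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()


if __name__ == "__main__":
    for count, msg in collapse_repeats(previous_boot_tail()):
        prefix = f"[x{count}] " if count > 1 else ""
        print(prefix + msg)
```

Only consecutive repeats are merged, so the order of events just before the crash is preserved.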

In the new boot, these warnings and errors were logged:

    Feb 27 08:58:50 mike-PC kernel: mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 0: baa0000000000135
    Feb 27 08:58:50 mike-PC kernel: fbcon: Taking over console
    Feb 27 08:58:50 mike-PC kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000004 IPID b000000000
    Feb 27 08:58:50 mike-PC kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1677484725 SOCKET 0 APIC 1 microcode 8001138
    Feb 27 08:58:50 mike-PC kernel: mce: [Hardware Error]: Machine check events logged
    Feb 27 08:58:50 mike-PC kernel: mce: [Hardware Error]: CPU 7: Machine Check: 0 Bank 0: baa0000000000135
    Feb 27 08:58:50 mike-PC kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000008 IPID b000000000
    Feb 27 08:58:50 mike-PC kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1677484725 SOCKET 0 APIC b microcode 8001138

    Feb 27 08:58:50 mike-PC kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)

    Feb 27 08:58:50 mike-PC kernel: PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
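The `mce:` lines above are machine-check errors, and the value after `Bank 0:` (`baa0000000000135`) is a status register that can be split into bit fields. A minimal sketch, assuming the architectural MCA_STATUS layout shared by Intel and AMD (flag bits 63 to 57) and the generic memory-hierarchy error-code pattern `000F 0001 RRRR TTLL` in the low 16 bits. The function `decode_status` is hypothetical; it does not decode AMD's bank-specific extended error code or the `IPID` value, so treat it as a rough first read, not a replacement for the kernel's own decoder (EDAC `mce_amd`) or tools such as `rasdaemon`.

```python
# Hypothetical helper: decode the architectural parts of an x86 MCA_STATUS value.
# Only bits 63-57 (common to Intel and AMD) and the generic memory-hierarchy
# error code in bits 15-0 are decoded; AMD's extended error code and the bank
# type encoded in IPID are deliberately ignored.

STATUS_FLAGS = [
    (63, "VAL"),    # status register contents are valid
    (62, "OVER"),   # an earlier error was overwritten (overflow)
    (61, "UC"),     # error was not corrected
    (60, "EN"),     # error reporting was enabled for this error
    (59, "MISCV"),  # MISC register holds extra information
    (58, "ADDRV"),  # ADDR register holds the error address
    (57, "PCC"),    # processor context corrupt: execution cannot safely continue
]

# Sub-fields of the memory-hierarchy error code pattern 000F 0001 RRRR TTLL.
REQUEST = {0x0: "ERR", 0x1: "RD", 0x2: "WR", 0x3: "DRD", 0x4: "DWR",
           0x5: "IRD", 0x6: "PREFETCH", 0x7: "EVICT", 0x8: "SNOOP"}
TRANSACTION = {0b00: "instruction", 0b01: "data", 0b10: "generic"}
LEVEL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "generic"}


def decode_status(status: int) -> dict:
    """Return the set flags and, if it matches, the memory-hierarchy error code."""
    result = {
        "flags": [name for bit, name in STATUS_FLAGS if (status >> bit) & 1],
        "error_code": status & 0xFFFF,
    }
    code = result["error_code"]
    # Bits 15-13 and 11-8 must read 000 0001 for this pattern; bit 12 (F) is ignored.
    if (code & 0xEF00) == 0x0100:
        result["request"] = REQUEST.get((code >> 4) & 0xF, "reserved")
        result["transaction"] = TRANSACTION.get((code >> 2) & 0x3, "reserved")
        result["level"] = LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    # Status value copied from the "Bank 0" lines in the log above.
    print(decode_status(0xBAA0000000000135))
```

For `baa0000000000135` this reports the flags VAL, UC, EN, MISCV and PCC, and decodes the low bits as a data read (DRD) at cache level L1. An uncorrected error with the processor-context-corrupt flag set is the kind of event that forces a reset, which would fit the unexplained reboot, but on its own it does not say whether the CPU itself, its power delivery, or a firmware setting is to blame.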

Can you help me with this? I don't really know whether Citrix is the problem, but the freezes and reboots always seem to happen while I'm working.

Thanks in advance!

Edit: here is my garuda-inxi:

    System:
      Kernel: 6.1.12-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 12.2.1
        parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
        root=UUID=b8e628c2-5bd2-423b-9104-a2fbea09a834 rw rootflags=subvol=@
        quiet quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0
        loglevel=3 ibt=off
      Desktop: KDE Plasma v: 5.27.1 tk: Qt v: 5.15.8 wm: kwin_x11 vt: 1 dm: SDDM
      Distro: Garuda Linux base: Arch Linux
    Machine:
      Type: Desktop Mobo: ASUSTeK model: PRIME B350-PLUS v: Rev X.0x
        serial: <superuser required> UEFI: American Megatrends v: 6042
        date: 04/28/2022
    CPU:
      Info: model: AMD Ryzen 5 1500X bits: 64 type: MT MCP arch: Zen level: v3
        note: check built: 2017-19 process: GF 14nm family: 0x17 (23) model-id: 1
        stepping: 1 microcode: 0x8001138
      Topology: cpus: 1x cores: 4 tpc: 2 threads: 8 smt: enabled cache:
        L1: 384 KiB desc: d-4x32 KiB; i-4x64 KiB L2: 2 MiB desc: 4x512 KiB
        L3: 16 MiB desc: 2x8 MiB
      Speed (MHz): avg: 3240 high: 3595 min/max: 1550/3500 boost: enabled
        scaling: driver: acpi-cpufreq governor: performance cores: 1: 3500 2: 3042
        3: 3595 4: 2985 5: 2908 6: 3593 7: 3314 8: 2990 bogomips: 55893
      Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
      Vulnerabilities: <filter>
    Graphics:
      Device-1: NVIDIA TU106 [GeForce RTX 2060 SUPER] vendor: Micro-Star MSI
        driver: nvidia v: 525.89.02 alternate: nouveau,nvidia_drm non-free: 525.xx+
        status: current (as of 2023-02) arch: Turing code: TUxxx
        process: TSMC 12nm FF built: 2018-22 pcie: gen: 3 speed: 8 GT/s lanes: 16
        bus-ID: 07:00.0 chip-ID: 10de:1f06 class-ID: 0300
      Device-2: Logitech HD Pro Webcam C920 type: USB
        driver: snd-usb-audio,uvcvideo bus-ID: 1-2:2 chip-ID: 046d:082d
        class-ID: 0102 serial: <filter>
      Display: x11 server: X.Org v: 21.1.7 with: Xwayland v: 22.1.8
        compositor: kwin_x11 driver: X: loaded: nvidia unloaded: modesetting
        alternate: fbdev,nouveau,nv,vesa gpu: nvidia display-ID: :0 screens: 1
      Screen-1: 0 s-res: 3840x1080 s-dpi: 101 s-size: 966x272mm (38.03x10.71")
        s-diag: 1004mm (39.51")
      Monitor-1: DP-5 pos: primary,left res: 1920x1080 hz: 60 dpi: 92
        size: 531x299mm (20.91x11.77") diag: 609mm (23.99") modes: N/A
      Monitor-2: HDMI-0 pos: right res: 1920x1080 hz: 60 dpi: 103
        size: 475x267mm (18.7x10.51") diag: 545mm (21.45") modes: N/A
      API: OpenGL v: 4.6.0 NVIDIA 525.89.02 renderer: NVIDIA GeForce RTX 2060
        SUPER/PCIe/SSE2 direct-render: Yes
    Audio:
      Device-1: NVIDIA TU106 High Definition Audio vendor: Micro-Star MSI
        driver: snd_hda_intel v: kernel bus-ID: 1-2:2 pcie: chip-ID: 046d:082d
        class-ID: 0102 gen: 3 speed: 8 GT/s serial: <filter> lanes: 16
        bus-ID: 07:00.1 chip-ID: 10de:10f9 class-ID: 0403
      Device-2: AMD Family 17h HD Audio vendor: ASUSTeK driver: snd_hda_intel
        v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16 bus-ID: 09:00.3
        chip-ID: 1022:1457 class-ID: 0403
      Device-3: Logitech HD Pro Webcam C920 type: USB
        driver: snd-usb-audio,uvcvideo
      Device-4: C-Media Auna Mic CM900 type: USB
        driver: hid-generic,snd-usb-audio,usbhid bus-ID: 5-4:2 chip-ID: 0d8c:0134
        class-ID: 0300
      Sound API: ALSA v: k6.1.12-zen1-1-zen running: yes
      Sound Server-1: PulseAudio v: 16.1 running: no
      Sound Server-2: PipeWire v: 0.3.66 running: yes
    Network:
      Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
        vendor: ASUSTeK PRIME B450M-A driver: r8169 v: kernel pcie: gen: 1
        speed: 2.5 GT/s lanes: 1 port: f000 bus-ID: 03:00.0 chip-ID: 10ec:8168
        class-ID: 0200
      IF: enp3s0 state: up speed: 100 Mbps duplex: full mac: <filter>
    Drives:
      Local Storage: total: 2.96 TiB used: 291.65 GiB (9.6%)
      SMART Message: Unable to run smartctl. Root privileges required.
      ID-1: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 850 EVO 250GB
        size: 232.89 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
        type: SSD serial: <filter> rev: 3B6Q scheme: GPT
      ID-2: /dev/sdb maj-min: 8:16 vendor: Seagate model: ST2000DM006-2DM164
        size: 1.82 TiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
        type: HDD rpm: 7200 serial: <filter> rev: CC26 scheme: GPT
      ID-3: /dev/sdc maj-min: 8:32 vendor: Crucial model: CT1000MX500SSD1
        size: 931.51 GiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
        type: SSD serial: <filter> rev: 045 scheme: GPT
    Partition:
      ID-1: / raw-size: 931.51 GiB size: 931.51 GiB (100.00%)
        used: 291.65 GiB (31.3%) fs: btrfs dev: /dev/sdc1 maj-min: 8:33
      ID-2: /boot/efi raw-size: 512.7 MiB size: 511.7 MiB (99.80%)
        used: 620 KiB (0.1%) fs: vfat dev: /dev/sda5 maj-min: 8:5
      ID-3: /home raw-size: 931.51 GiB size: 931.51 GiB (100.00%)
        used: 291.65 GiB (31.3%) fs: btrfs dev: /dev/sdc1 maj-min: 8:33
      ID-4: /var/log raw-size: 931.51 GiB size: 931.51 GiB (100.00%)
        used: 291.65 GiB (31.3%) fs: btrfs dev: /dev/sdc1 maj-min: 8:33
      ID-5: /var/tmp raw-size: 931.51 GiB size: 931.51 GiB (100.00%)
        used: 291.65 GiB (31.3%) fs: btrfs dev: /dev/sdc1 maj-min: 8:33
    Swap:
      Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
      ID-1: swap-1 type: zram size: 15.53 GiB used: 0 KiB (0.0%) priority: 100
        dev: /dev/zram0
    Sensors:
      System Temperatures: cpu: 40.8 C mobo: N/A gpu: nvidia temp: 50 C
      Fan Speeds (RPM): N/A gpu: nvidia fan: 0%
    Info:
      Processes: 284 Uptime: 51m wakeups: 0 Memory: 15.53 GiB
        used: 4.18 GiB (26.9%) Init: systemd v: 253 default: graphical
        tool: systemctl Compilers: gcc: 12.2.1 Packages: pm: pacman pkgs: 1844
        libs: 546 tools: octopi,paru Shell: fish v: 3.6.0 default: Bash v: 5.1.16
        running-in: konsole inxi: 3.3.25
    Garuda (2.6.15-1):
      System install date:     2023-02-19
      Last full system update: 2023-02-25
      Is partially upgraded:   No
      Relevant software:       snapper NetworkManager mkinitcpio nvidia-dkms
      Windows dual boot:       Probably (Run as root to verify)
      Failed units:

Your garuda-inxi is missing; please edit your post.

Do you have the same problem on Windows?
Test with a terminal window open running top or htop, so that when the error occurs you can see which process is using so much CPU.

If the CPU runs too hot for too long, that's bad.
Is the CPU from 2017?
Are all the fans still working?

No, I don't have this problem on Windows.
I always have conky running to show my processes, and nothing is using much CPU.
With only Citrix running, like now, my CPU temperature is around 38 to 42 °C, all the fans are working, and yes, the CPU is from 2017.

As I said, you need to check what is happening at the moment the error (freeze) occurs.

First-generation Ryzen CPUs can have issues with the C6 C-state. This could be your problem, as it can cause hard system freezes (and even freezes on shutdown, reboot and suspend) that not even REISUB can recover from, because it is a low-level hardware issue. The freeze typically comes after the CPU drops into a low-power state while you aren't using it much. Motherboards usually have a firmware setting to fix this, but its name varies by manufacturer.

Arch Wiki entries with more information:
https://wiki.archlinux.org/title/Ryzen#Soft_lock_freezing
https://wiki.archlinux.org/title/Ryzen#Freeze_on_shutdown,_reboot_and_suspend
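To see which idle states the kernel is actually using before and after changing the firmware setting, the cpuidle sysfs tree can be listed. A minimal read-only sketch, assuming the standard layout `/sys/devices/system/cpu/cpuN/cpuidle/stateM/` with `name` and `disable` files; the function `list_idle_states` is hypothetical, and which of the listed names corresponds to the Ryzen C6 state is not something this sketch determines.

```python
from pathlib import Path


def list_idle_states(cpu_root: str = "/sys/devices/system/cpu") -> dict:
    """Return {cpu_name: [(state_name, disabled), ...]} from the cpuidle sysfs tree."""
    result = {}
    # Only cpuN directories; sort numerically so cpu10 comes after cpu9.
    for cpu_dir in sorted(Path(cpu_root).glob("cpu[0-9]*"), key=lambda p: int(p.name[3:])):
        idle_dir = cpu_dir / "cpuidle"
        if not idle_dir.is_dir():
            continue  # no cpuidle driver exposed for this CPU
        states = []
        for state_dir in sorted(idle_dir.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
            name = (state_dir / "name").read_text().strip()
            disable_file = state_dir / "disable"
            # "1" in the disable file means the state has been turned off at runtime.
            disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
            states.append((name, disabled))
        if states:
            result[cpu_dir.name] = states
    return result


if __name__ == "__main__":
    for cpu, states in list_idle_states().items():
        summary = ", ".join(f"{n}{' (disabled)' if d else ''}" for n, d in states)
        print(f"{cpu}: {summary}")
```

The sketch only reads sysfs, so it needs no root privileges; running it once before and once after the firmware change is one way to check whether the change is visible to the kernel at all.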

OK, thanks. I will try this and see what happens.

It just happened again. I only had Brave open and balena-etcher flashing a USB drive.
This is the last picture of conky before the freeze:

I think it's a hardware problem; I can't find anything relevant in the logs.

I think @Kayo has pointed it out. So it's not the heat problem I thought it was.
