Spontaneously Freezing up

System:
Kernel: 6.6.3-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
clocksource: tsc available: hpet,acpi_pm
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
root=UUID=b490211f-2fe3-40fa-bb09-ede6ca56b7ae rw rootflags=subvol=@
quiet quiet rd.udev.log_priority=3 vt.global_cursor_default=0 loglevel=3
ibt=off
Desktop: KDE Plasma v: 5.27.9 tk: Qt v: 5.15.11 wm: kwin_x11 vt: 2
dm: SDDM Distro: Garuda Linux base: Arch Linux
Machine:
Type: Desktop Mobo: ASUSTeK model: ROG MAXIMUS X HERO (WI-FI AC) v: Rev 1.xx
serial: <superuser required> UEFI-[Legacy]: American Megatrends v: 2503
date: 09/25/2020
CPU:
Info: model: Intel Core i7-8700K bits: 64 type: MT MCP arch: Coffee Lake
gen: core 8 level: v3 note: check built: 2018 process: Intel 14nm family: 6
model-id: 0x9E (158) stepping: 0xA (10) microcode: 0xF4
Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache:
L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 1.5 MiB desc: 6x256 KiB
L3: 12 MiB desc: 1x12 MiB
Speed (MHz): avg: 801 high: 817 min/max: 800/4700:5000 scaling:
driver: intel_pstate governor: powersave cores: 1: 800 2: 800 3: 800 4: 800
5: 817 6: 800 7: 800 8: 800 9: 800 10: 800 11: 800 12: 801 bogomips: 88796
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3
Vulnerabilities: <filter>
Graphics:
Device-1: NVIDIA GP104 [GeForce GTX 1080] vendor: ASUSTeK driver: nvidia
v: 545.29.06 alternate: nouveau,nvidia_drm non-free: 545.xx+ status: current
(as of 2023-10; EOL~2026-12-xx) arch: Pascal code: GP10x
process: TSMC 16nm built: 2016-2021 pcie: gen: 3 speed: 8 GT/s lanes: 16
ports: active: none off: DP-2,HDMI-A-1 empty: DP-1,DVI-D-1,HDMI-A-2
bus-ID: 01:00.0 chip-ID: 10de:1b80 class-ID: 0300
Device-2: Microsoft LifeCam Cinema driver: snd-usb-audio,uvcvideo
type: USB rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-11:4
chip-ID: 045e:075d class-ID: 0102
Display: x11 server: X.Org v: 21.1.9 with: Xwayland v: 23.2.2
compositor: kwin_x11 driver: X: loaded: nvidia
unloaded: modesetting,nouveau,vesa alternate: fbdev,nv
gpu: nvidia,nvidia-nvswitch display-ID: :0 screens: 1
Screen-1: 0 s-res: 7680x2160 s-dpi: 159 s-size: 1227x352mm (48.31x13.86")
s-diag: 1276mm (50.26")
Monitor-1: DP-2 note: disabled pos: right model: Samsung U28E590
serial: <filter> built: 2018 res: 3840x2160 hz: 60 dpi: 161 gamma: 1.2
size: 607x345mm (23.9x13.58") diag: 698mm (27.5") ratio: 16:9 modes:
max: 3840x2160 min: 640x480
Monitor-2: HDMI-A-1 mapped: HDMI-0 note: disabled pos: primary,left
model: Samsung U28E570 serial: <filter> built: 2018 res: 3840x2160 hz: 30
dpi: 160 gamma: 1.2 size: 608x345mm (23.94x13.58") diag: 699mm (27.5")
ratio: 16:9 modes: max: 3840x2160 min: 640x480
API: EGL v: 1.5 hw: drv: nvidia platforms: device: 0 drv: nvidia device: 2
drv: swrast gbm: drv: nvidia surfaceless: drv: nvidia x11: drv: nvidia
inactive: wayland,device-1
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 545.29.06
glx-v: 1.4 direct-render: yes renderer: NVIDIA GeForce GTX 1080/PCIe/SSE2
memory: 7.81 GiB
API: Vulkan v: 1.3.269 layers: 14 device: 0 type: discrete-gpu
name: NVIDIA GeForce GTX 1080 driver: nvidia v: 545.29.06
device-ID: 10de:1b80 surfaces: xcb,xlib device: 1 type: cpu name: llvmpipe
(LLVM 16.0.6 256 bits) driver: mesa llvmpipe v: 23.2.1-arch1.2 (LLVM
16.0.6) device-ID: 10005:0000 surfaces: xcb,xlib
Audio:
Device-1: Intel 200 Series PCH HD Audio vendor: ASUSTeK
driver: snd_hda_intel v: kernel alternate: snd_soc_avs bus-ID: 00:1f.3
chip-ID: 8086:a2f0 class-ID: 0403
Device-2: NVIDIA GP104 High Definition Audio vendor: ASUSTeK
driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16
bus-ID: 01:00.1 chip-ID: 10de:10f0 class-ID: 0403
Device-3: Microsoft LifeCam Cinema driver: snd-usb-audio,uvcvideo
type: USB rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-11:4
chip-ID: 045e:075d class-ID: 0102
API: ALSA v: k6.6.3-zen1-1-zen status: kernel-api with: aoss
type: oss-emulator tools: N/A
Server-1: PipeWire v: 1.0.0 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Intel Ethernet I219-V vendor: ASUSTeK driver: e1000e v: kernel
port: N/A bus-ID: 00:1f.6 chip-ID: 8086:15b8 class-ID: 0200
IF: enp0s31f6 state: up speed: 1000 Mbps duplex: full mac: <filter>
Device-2: Realtek RTL8822BE 802.11a/b/g/n/ac WiFi adapter vendor: ASUSTeK
driver: rtw_8822be v: N/A modules: rtw88_8822be pcie: gen: 1 speed: 2.5 GT/s
lanes: 1 port: d000 bus-ID: 05:00.0 chip-ID: 10ec:b822 class-ID: 0280
IF: wlp5s0 state: down mac: <filter>
Bluetooth:
Device-1: ASUSTek Bluetooth Radio driver: btusb v: 0.8 type: USB rev: 1.1
speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-13:5 chip-ID: 0b05:185c
class-ID: e001 serial: <filter>
Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 4.2
lmp-v: 8 status: discoverable: no pairing: no class-ID: 7c0104
Drives:
Local Storage: total: 5.46 TiB used: 437.71 GiB (7.8%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Samsung model: SSD 980 PRO with
Heatsink 1TB size: 931.51 GiB block-size: physical: 512 B logical: 512 B
speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter> fw-rev: 5B2QGXA7
temp: 31.9 C scheme: MBR
ID-2: /dev/sda maj-min: 8:0 vendor: Western Digital
model: WD5000AAKS-00V1A0 size: 465.76 GiB block-size: physical: 512 B
logical: 512 B speed: 3.0 Gb/s tech: N/A serial: <filter> fw-rev: 1D05
ID-3: /dev/sdb maj-min: 8:16 vendor: Western Digital
model: WD4004FZWX-00GBGB0 size: 3.64 TiB block-size: physical: 4096 B
logical: 512 B speed: 6.0 Gb/s tech: HDD rpm: 7200 serial: <filter>
fw-rev: 0A81 scheme: GPT
ID-4: /dev/sdc maj-min: 8:32 vendor: Samsung model: SSD 850 EVO 500GB
size: 465.76 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
tech: SSD serial: <filter> fw-rev: 3B6Q scheme: GPT
Partition:
ID-1: / raw-size: 931.51 GiB size: 931.51 GiB (100.00%)
used: 119.34 GiB (12.8%) fs: btrfs dev: /dev/nvme0n1p1 maj-min: 259:1
ID-2: /home raw-size: 931.51 GiB size: 931.51 GiB (100.00%)
used: 119.34 GiB (12.8%) fs: btrfs dev: /dev/nvme0n1p1 maj-min: 259:1
ID-3: /var/log raw-size: 931.51 GiB size: 931.51 GiB (100.00%)
used: 119.34 GiB (12.8%) fs: btrfs dev: /dev/nvme0n1p1 maj-min: 259:1
ID-4: /var/tmp raw-size: 931.51 GiB size: 931.51 GiB (100.00%)
used: 119.34 GiB (12.8%) fs: btrfs dev: /dev/nvme0n1p1 maj-min: 259:1
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
ID-1: swap-1 type: zram size: 31.27 GiB used: 0 KiB (0.0%) priority: 100
comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 12 dev: /dev/zram0
Sensors:
System Temperatures: cpu: 36.0 C mobo: N/A gpu: nvidia temp: 55 C
Fan Speeds (rpm): N/A gpu: nvidia fan: 24%
Info:
Processes: 376 Uptime: 3m wakeups: 0 Memory: total: 32 GiB
available: 31.27 GiB used: 4.83 GiB (15.4%) Init: systemd v: 254
default: graphical tool: systemctl Compilers: gcc: 13.2.1 Packages:
pm: pacman pkgs: 1971 libs: 571 tools: octopi,pamac,paru,yay pm: flatpak
pkgs: 0 Shell: fish v: 3.6.1 running-in: yakuake inxi: 3.3.31
Garuda (2.6.19-2):
System install date:     2023-07-07
Last full system update: 2023-12-02
Is partially upgraded:   No
Relevant software:       timeshift(custom) NetworkManager dracut nvidia-dkms
Windows dual boot:       <superuser required>
Failed units:

Over the last few days, my system has been freezing at random.
I’m not sure whether it’s hardware or software. Before I do a reinstall, is there a log I can check after rebooting that might show what was happening when it froze?

I tried both the Zen and LTS kernels over the last week and had freezes on both, so it’s not the kernel.
The BIOS is up to date. The system is slightly overclocked, but it has been for months.

Looks like a reinstall is coming. Besides the freezes, one of my drives suddenly won’t mount:

(An error occurred while accessing ‘D: Drive’, the system responded: The requested operation has failed: Error mounting /dev/sdb2 at /run/media/hipkat/D: Drive: wrong fs type, bad option, bad superblock on /dev/sdb2, missing codepage or helper program, or other error).
Works fine when I switch to the Windows Drive.

Anytime I come out of sleep, the monitors don’t come on and I have to hard reboot the computer.

I know I need to renew my thermal paste. HWInfo on Windows shows CPU temps peaking @ 68° C but averaging about 38°

I’ll wait and see if you have ideas but I’m not a hardcore Linux programmer - although I’m decent with understanding many things.

I have experienced freezes in the past that were the result of a bad hard drive. This occurred even though SMART testing said the drive was fine. You definitely want to run some integrity checks on your drive. The drive may not be experiencing a hardware failure; it may simply have become corrupted. If this is a Windows drive/partition, then it is best to try to correct the errors using the Windows-specific tools. If there is also a Linux filesystem on another partition, then you should consult the Arch Wiki for the procedure to correct any Linux filesystem errors. If your partition table has been corrupted, then you will need to use a Linux or Windows file recovery program to hopefully recover the drive geometry and allow the drive to become readable again.

Data recovery is really out of the scope of the Garuda forum, but you could try using the freeware Linux program testdisk as a recovery option. Windows recovery software options are usually expensive, so be prepared to shell out a fair bit of money if that is your only option.

You can check your journalctl logs, but it is rare that a complete freeze (keyboard, cursor and terminal all dead) will show the actual cause of the freeze. That is why these types of issues can be so hard to diagnose. If the computer is completely locked up, all journal entries stop when the freeze occurs.
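
If you want to look anyway, here is a minimal Python sketch that pulls the tail end of the previous boot's journal, which is where anything logged shortly before the freeze would be. It assumes the journal is stored persistently (so earlier boots are kept) and that `journalctl` is on the PATH. The function names are made up for illustration.

```python
import subprocess


def previous_boot_cmd(lines=200, priority="warning", kernel_only=False):
    """Build a journalctl command for the tail of the previous boot's log."""
    cmd = [
        "journalctl",
        "-b", "-1",        # -1 = the boot before the current one
        "-p", priority,    # this priority and anything more severe
        "-n", str(lines),  # only the last N matching entries
        "--no-pager",
    ]
    if kernel_only:
        cmd.append("-k")   # restrict output to kernel messages
    return cmd


def tail_of_previous_boot(**kwargs):
    """Run the command and return its output as a list of lines."""
    result = subprocess.run(
        previous_boot_cmd(**kwargs),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in tail_of_previous_boot(lines=100):
        print(line)
```

`journalctl --list-boots` shows which boots the journal still holds. If the output above ends with nothing unusual, that is consistent with a hard lockup: the journal stops along with everything else.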

For troubleshooting purposes, you should really revert your system to the factory-recommended clocks (undo any custom overclocking).

The drive that’s not mounting is a storage drive. I checked it with Windows and found no errors.
The Garuda install is on an M.2 drive. I have Win 11 on a SATA SSD. I also have another HDD, which is formatted, and a 4TB external hard drive. Chkdsk detects no errors on any of the drives.

I just think that, with all the update issues over the last few months, something has been corrupted, but in the file system.

I backed up the important stuff in my Home folder so a reformat and reinstall shouldn’t take more than about an hour or so.

The bummer is that I’ve been running this build for about 2 years now :frowning:

I’m going to take your advice and load the default BIOS settings on the next boot and see how that goes.

When I check journalctl --system I do find some random errors, but I’m not sure what they mean.
Here are a few, for example:

Dec 03 20:14:54 greg-garuda kernel: nvidia: loading out-of-tree module taints kernel.
Dec 03 20:14:54 greg-garuda kernel: nvidia: module license 'NVIDIA' taints kernel.
Dec 03 20:14:54 greg-garuda kernel: Disabling lock debugging due to kernel taint
Dec 03 20:14:54 greg-garuda kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Dec 03 20:14:54 greg-garuda kernel: nvidia: module license taints kernel.
Dec 03 20:14:54 greg-garuda kernel: pcieport 0000:00:1c.6: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Dec 03 20:14:54 greg-garuda kernel: pcieport 0000:00:1c.6:   device [8086:a296] error status/mask=00000001/00002000
Dec 03 20:14:54 greg-garuda kernel: pcieport 0000:00:1c.6:    [ 0] RxErr                  (First)
Dec 03 20:14:54 greg-garuda kernel: tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags >
Dec 03 20:14:54 greg-garuda kernel: tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags >

The Nvidia kernel taint messages can be ignored. They simply reference the fact that using the non-free Nvidia drivers makes you some sort of tainted leper. :rofl:
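
To make the remaining messages easier to search for, here is a rough Python sketch that drops the taint notices and groups what is left by the first word of each message. It assumes the default short journalctl line format shown in your paste. The `NOISE` list and the function name are my own choices, not anything standard.

```python
import re
from collections import defaultdict

# Default short journal format: "Mon DD HH:MM:SS host unit: message"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) (?P<unit>[^:]+): (?P<msg>.*)$"
)

# Assumption: only the kernel taint notices are treated as ignorable noise.
NOISE = ("taints kernel", "tainting kernel", "kernel taint")


def group_by_source(lines):
    """Drop taint notices and group the remaining messages by source."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        msg = match["msg"]
        if any(noise in msg for noise in NOISE):
            continue
        # First token of the message, e.g. "pcieport" or "tpm_crb".
        source = msg.split()[0].rstrip(":")
        groups[source].append(msg)
    return dict(groups)


# A few lines from the paste above (the tpm_crb line is trimmed).
SAMPLE = [
    "Dec 03 20:14:54 greg-garuda kernel: nvidia: loading out-of-tree module taints kernel.",
    "Dec 03 20:14:54 greg-garuda kernel: Disabling lock debugging due to kernel taint",
    "Dec 03 20:14:54 greg-garuda kernel: pcieport 0000:00:1c.6: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
    "Dec 03 20:14:54 greg-garuda kernel: pcieport 0000:00:1c.6:    [ 0] RxErr                  (First)",
    "Dec 03 20:14:54 greg-garuda kernel: tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer.",
]

if __name__ == "__main__":
    for source, messages in group_by_source(SAMPLE).items():
        print(source, len(messages))
```

On the sample lines, only the pcieport and tpm_crb groups are left, and those are the two messages worth searching for.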

The pcieport errors look like they can be rectified with this method from the Arch forum:

https://bbs.archlinux.org/viewtopic.php?id=241473

That solution was easily located by searching for the pcieport error message from your log.
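
If you are curious what the numbers in the pcieport line mean, the `error status/mask` values are bit fields. The sketch below decodes them using the short labels the Linux kernel prints for corrected PCIe AER errors (such as `RxErr`). It only applies to lines with `severity=Corrected`, and the function names are just for illustration.

```python
import re

# Bit positions in the PCIe AER Correctable Error Status register,
# labelled with the short names the Linux kernel prints.
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}

STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_bits(value):
    """Return the names of all known bits set in value."""
    return [
        name
        for bit, name in sorted(CORRECTABLE_BITS.items())
        if (value >> bit) & 1
    ]


def decode_aer_line(line):
    """Decode the status/mask pair from a corrected-error log line."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status, mask = (int(group, 16) for group in match.groups())
    return {"reported": decode_bits(status), "masked": decode_bits(mask)}


if __name__ == "__main__":
    line = (
        "pcieport 0000:00:1c.6:   device [8086:a296] "
        "error status/mask=00000001/00002000"
    )
    print(decode_aer_line(line))
```

For your line, the status decodes to a single receiver error (`RxErr`), which matches the `[ 0] RxErr` line the kernel printed right after it, and the mask covers only the advisory non-fatal bit.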

Same deal below, easily located by searching the error message:

https://bbs.archlinux.org/viewtopic.php?id=272497

Thanks TBG.
I haven’t even researched them yet, as it was getting late here, but neither of those seems to be a big deal. TPM is only enabled for the Win11 drive. It doesn’t seem like the USB 3.1 error means much either.

Next time it freezes, I can pull the journal after rebooting and see what the error was right before it froze.
