Issues with BTRFS - unexpected unmounting of ROOT

Hi Garuda community,

I'm opening this post because I couldn't find anything about my issue on the Garuda forum or on the net in general (except some RAID server issues that are beside the point).

I've run into some weird problems with my Garuda. It crashes unexpectedly (very hard, without a kernel panic). It can happen during light desktop tasks (such as writing this post on my Garuda) or while gaming. There is just one game (Satisfactory) where it happens every time, so I can reproduce it.

Hence I reproduced it as often as I could to catch the POST messages during the crash. They say that the UUID can't be mounted, which points to the only SSD I use, and frankly it holds the whole system. Why on earth would my system unmount the file system it is running from? I never had that issue before.

Additional issues (with estimated rates):

  • 3 of 10 cases: the SSD with Garuda doesn't appear in the BIOS boot menu (fixed by restarting)
  • about once per hour: unexpected crashes as described above
  • every time: crashes as described above while playing Satisfactory (Lutris + Steam)

Meanwhile, I have no issues with heavy workloads such as DaVinci Resolve Studio (fully loaded GPU and CPU, plus a ton of cache files; 200 GB of caching at full I/O speed is common).

My measures:

  • I did a forced btrfs-check - no errors found.
  • I did a full scrub - no errors found.
  • I did balancing - no errors/issues found.
  • I did journalctl -b -1 -p3
    As you can see here:

Journalctl

(It just showed me my btrfs-assistant Qt5 issue, but I don't mind that, since I'm using the TTY anyway.)

Mär 30 23:36:30 Clockwork kernel: sp5100-tco sp5100-tco: Watchdog hardware is disabled
Mär 30 23:36:31 Clockwork systemd-coredump[593]: [🡕] Process 586 (plymouth) of user 0 dumped core.

Module linux-vdso.so.1 with build-id 48d1f2a0ba6b9bf33e371e464222a5d7f3ba1da9
Module libgcc_s.so.1 with build-id 5d817452a709ca3a213341555ddcf446ecee37fa
Module ld-linux-x86-64.so.2 with build-id c09c6f50f6bcec73c64a0b4be77eadb8f7202410
Module libm.so.6 with build-id 596b63a006a4386dcab30912d2b54a7a61827b07
Module libudev.so.1 with build-id 7dc938362569112855b6086de066cd6a18d1b978
Module libc.so.6 with build-id 85766e9d8458b16e9c7ce6e07c712c02b8471dbc
Module libply.so.5 with build-id 7872fe68a815672f245767251ea58cc6d807e48a
Module plymouth with build-id 5d41380191d68f361d3307b01cd3592755df9b97
Stack trace of thread 586:
#0  0x00007f98e39bbfd4 ply_list_get_first_node (libply.so.5 + 0x9fd4)
#1  0x00007f98e39ba04b n/a (libply.so.5 + 0x804b)
#2  0x00007f98e39ba696 ply_command_parser_free (libply.so.5 + 0x8696)
#3  0x000055a4bcd628d1 n/a (plymouth + 0x28d1)
#4  0x00007f98e37d5310 __libc_start_call_main (libc.so.6 + 0x2d310)
#5  0x00007f98e37d53c1 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2d3c1)
#6  0x000055a4bcd62fa5 n/a (plymouth + 0x2fa5)
ELF object binary architecture: AMD x86-64
Mär 30 23:36:32 Clockwork systemd-coredump[664]: [🡕] Process 661 (plymouth) of user 0 dumped core.

Module linux-vdso.so.1 with build-id 48d1f2a0ba6b9bf33e371e464222a5d7f3ba1da9
Module libgcc_s.so.1 with build-id 5d817452a709ca3a213341555ddcf446ecee37fa
Module ld-linux-x86-64.so.2 with build-id c09c6f50f6bcec73c64a0b4be77eadb8f7202410
Module libm.so.6 with build-id 596b63a006a4386dcab30912d2b54a7a61827b07
Module libudev.so.1 with build-id 7dc938362569112855b6086de066cd6a18d1b978
Module libc.so.6 with build-id 85766e9d8458b16e9c7ce6e07c712c02b8471dbc
Module libply.so.5 with build-id 7872fe68a815672f245767251ea58cc6d807e48a
Module plymouth with build-id 5d41380191d68f361d3307b01cd3592755df9b97
Stack trace of thread 661:
#0  0x00007ff65a0e80f4 ply_list_node_get_data (libply.so.5 + 0xa0f4)
#1  0x00007ff65a0e6029 n/a (libply.so.5 + 0x8029)
#2  0x00007ff65a0e6696 ply_command_parser_free (libply.so.5 + 0x8696)
#3  0x000055fe5c2768d1 n/a (plymouth + 0x28d1)
#4  0x00007ff659f01310 __libc_start_call_main (libc.so.6 + 0x2d310)
#5  0x00007ff659f013c1 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2d3c1)
#6  0x000055fe5c276fa5 n/a (plymouth + 0x2fa5)
ELF object binary architecture: AMD x86-64
Mär 30 23:37:04 Clockwork btrfs-assistant-bin[4820]: This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, xcb.
Mär 30 23:37:04 Clockwork kernel: ata9.00: exception Emask 0x10 SAct 0xffff00 SErr 0x4c0000 action 0x6 frozen
Mär 30 23:37:04 Clockwork kernel: ata9.00: irq_stat 0x08000000, interface fatal error
Mär 30 23:37:04 Clockwork kernel: ata9: SError: { CommWake 10B8B Handshk }
Mär 30 23:37:04 Clockwork kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Mär 30 23:37:04 Clockwork kernel: ata9.00: cmd 61/28:40:f8:38:5f/00:00:07:00:00/40 tag 8 ncq dma 20480 out
res 40/00:44:f8:38:5f/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
Mär 30 23:37:04 Clockwork kernel: ata9.00: status: { DRDY }
Mär 30 23:37:04 Clockwork kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Mär 30 23:37:04 Clockwork kernel: ata9.00: cmd 61/08:48:38:39:5f/00:00:07:00:00/40 tag 9 ncq dma 4096 out
res 40/00:44:f8:38:5f/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
Mär 30 23:37:04 Clockwork kernel: ata9.00: status: { DRDY }
Mär 30 23:37:04 Clockwork kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Mär 30 23:37:04 Clockwork kernel: ata9.00: cmd 61/28:50:48:39:5f/00:00:07:00:00/40 tag 10 ncq dma 20480 out
res 40/00:44:f8:38:5f/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
Mär 30 23:37:04 Clockwork kernel: ata9.00: status: { DRDY }
Mär 30 23:37:04 Clockwork kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Mär 30 23:37:04 Clockwork kernel: ata9.00: cmd 61/08:58:80:39:5f/00:00:07:00:00/40 tag 11 ncq dma 4096 out
res 40/00:44:f8:38:5f/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
Mär 30 23:37:04 Clockwork kernel: ata9.00: status: { DRDY }
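
For anyone who wants to sift a longer journal dump for entries like the ata9 lines above, here is a minimal Python sketch. It assumes the journal output has been saved as plain text in the same format as the excerpt; `SAMPLE`, `ATA_LINE` and `summarize_ata` are names made up for this example and are not part of any existing tool.

```python
import re
from collections import Counter

# A few lines copied from the journal excerpt above, used as sample input.
SAMPLE = """\
Mär 30 23:37:04 Clockwork btrfs-assistant-bin[4820]: This application failed to start because no Qt platform plugin could be initialized.
Mär 30 23:37:04 Clockwork kernel: ata9.00: exception Emask 0x10 SAct 0xffff00 SErr 0x4c0000 action 0x6 frozen
Mär 30 23:37:04 Clockwork kernel: ata9.00: irq_stat 0x08000000, interface fatal error
Mär 30 23:37:04 Clockwork kernel: ata9.00: failed command: WRITE FPDMA QUEUED
Mär 30 23:37:04 Clockwork kernel: ata9.00: status: { DRDY }
Mär 30 23:37:04 Clockwork kernel: ata9.00: failed command: WRITE FPDMA QUEUED
"""

# Matches kernel lines such as "kernel: ata9.00: <message>" or "kernel: ata9: <message>".
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")


def summarize_ata(text):
    """Count failed ATA commands per port and collect the exception lines."""
    failed = Counter()
    exceptions = []
    for line in text.splitlines():
        match = ATA_LINE.search(line)
        if not match:
            continue  # not a kernel ATA message
        port, message = match.groups()
        if message.startswith("failed command: "):
            command = message[len("failed command: "):]
            failed[(port, command)] += 1
        elif message.startswith("exception "):
            exceptions.append((port, message))
    return failed, exceptions


if __name__ == "__main__":
    failed, exceptions = summarize_ata(SAMPLE)
    for (port, command), count in failed.items():
        print(f"{port}: {count} x {command}")
    for port, message in exceptions:
        print(f"{port}: {message}")
```

The sketch only groups what the kernel already logged; it does not interpret the error codes.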

BTRFS is completely new to me. I mostly used ext4 before. Maybe I'm a n00b here xD

I would also rule out GPU (driver) issues, since I never had any glitch and, as mentioned, I can push the card hard without issues (including in games, except Satisfactory).

My inxi:

System:
Kernel: 5.17.1-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 11.2.0
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
root=UUID=be3ca4bc-7720-459e-8370-bd92450c06b0 rw rootflags=subvol=@
quiet quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0
loglevel=3
Desktop: KDE Plasma v: 5.24.4 tk: Qt v: 5.15.3 info: latte-dock
wm: kwin_x11 vt: 1 dm: SDDM Distro: Garuda Linux base: Arch Linux
Machine:
Type: Desktop Mobo: Micro-Star model: MPG X570 GAMING PLUS (MS-7C37) v: 2.0
serial: <superuser required> UEFI: American Megatrends LLC. v: A.F0
date: 12/16/2021
CPU:
Info: model: AMD Ryzen 7 5800X bits: 64 type: MT MCP arch: Zen 3
family: 0x19 (25) model-id: 0x21 (33) stepping: 0 microcode: 0xA201016
Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache:
L1: 512 KiB desc: d-8x32 KiB; i-8x32 KiB L2: 4 MiB desc: 8x512 KiB
L3: 32 MiB desc: 1x32 MiB
Speed (MHz): avg: 3674 high: 4505 min/max: 2200/4850 boost: enabled
scaling: driver: acpi-cpufreq governor: performance cores: 1: 3662 2: 3682
3: 3607 4: 3599 5: 3599 6: 3621 7: 3641 8: 3599 9: 3641 10: 3673 11: 3609
12: 3613 13: 3588 14: 3552 15: 4505 16: 3598 bogomips: 121600
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities:
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: spec_store_bypass
mitigation: Speculative Store Bypass disabled via prctl
Type: spectre_v1
mitigation: usercopy/swapgs barriers and __user pointer sanitization
Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, IBRS_FW,
STIBP: always-on, RSB filling
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: AMD Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] vendor: ASUSTeK
driver: amdgpu v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16 ports:
active: DP-2,HDMI-A-1 empty: DP-1,DP-3 bus-ID: 2f:00.0 chip-ID: 1002:73bf
class-ID: 0300
Display: x11 server: X.Org v: 1.21.1.3 compositor: kwin_x11 driver: X:
loaded: amdgpu unloaded: modesetting,radeon alternate: fbdev,vesa
gpu: amdgpu display-ID: :0 screens: 1
Screen-1: 0 s-res: 3840x1080 s-dpi: 96 s-size: 1016x285mm (40.00x11.22")
s-diag: 1055mm (41.54")
Monitor-1: DP-2 mapped: DisplayPort-1 pos: primary,left model: Asus VS248
serial: <filter> built: 2012 res: 1920x1080 hz: 60 dpi: 92 gamma: 1.2
size: 531x299mm (20.91x11.77") diag: 609mm (24") ratio: 16:9 modes:
max: 1920x1080 min: 720x400
Monitor-2: HDMI-A-1 mapped: HDMI-A-0 pos: primary,right model: Asus VS248
serial: <filter> built: 2012 res: 1920x1080 hz: 60 dpi: 92 gamma: 1.2
size: 531x299mm (20.91x11.77") diag: 609mm (24") ratio: 16:9 modes:
max: 1920x1080 min: 720x400
OpenGL: renderer: AMD Radeon RX 6800 XT (sienna_cichlid LLVM 13.0.1 DRM
3.44 5.17.1-zen1-1-zen)
v: 4.6 Mesa 22.0.0 direct render: Yes
Audio:
Device-1: AMD Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT]
driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
bus-ID: 2f:00.1 chip-ID: 1002:ab28 class-ID: 0403
Device-2: AMD Starship/Matisse HD Audio vendor: Micro-Star MSI
driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
bus-ID: 31:00.4 chip-ID: 1022:1487 class-ID: 0403
Sound Server-1: ALSA v: k5.17.1-zen1-1-zen running: yes
Sound Server-2: PulseAudio v: 15.0 running: no
Sound Server-3: PipeWire v: 0.3.49 running: yes
Network:
Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
vendor: Micro-Star MSI X570-A PRO driver: r8169 v: kernel pcie: gen: 1
speed: 2.5 GT/s lanes: 1 port: d000 bus-ID: 27:00.0 chip-ID: 10ec:8168
class-ID: 0200
IF: enp39s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Drives:
Local Storage: total: 1.36 TiB used: 605.67 GiB (43.3%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Crucial model: CT1000P5SSD8
size: 931.51 GiB block-size: physical: 512 B logical: 512 B
speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter> rev: P4CR311
temp: 34.9 C scheme: GPT
ID-2: /dev/sda maj-min: 8:0 vendor: Crucial model: CT500MX500SSD1
size: 465.76 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
type: SSD serial: <filter> rev: 023 scheme: GPT
Partition:
ID-1: / raw-size: 465.46 GiB size: 465.46 GiB (100.00%)
used: 86.35 GiB (18.6%) fs: btrfs dev: /dev/sda2 maj-min: 8:2
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 576 KiB (0.2%) fs: vfat dev: /dev/sda1 maj-min: 8:1
ID-3: /home raw-size: 465.46 GiB size: 465.46 GiB (100.00%)
used: 86.35 GiB (18.6%) fs: btrfs dev: /dev/sda2 maj-min: 8:2
ID-4: /var/log raw-size: 465.46 GiB size: 465.46 GiB (100.00%)
used: 86.35 GiB (18.6%) fs: btrfs dev: /dev/sda2 maj-min: 8:2
ID-5: /var/tmp raw-size: 465.46 GiB size: 465.46 GiB (100.00%)
used: 86.35 GiB (18.6%) fs: btrfs dev: /dev/sda2 maj-min: 8:2
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
ID-1: swap-1 type: zram size: 62.78 GiB used: 2.5 MiB (0.0%)
priority: 100 dev: /dev/zram0
Sensors:
System Temperatures: cpu: 48.5 C mobo: N/A gpu: amdgpu temp: 33.0 C
mem: 32.0 C
Fan Speeds (RPM): N/A gpu: amdgpu fan: 0
Info:
Processes: 348 Uptime: 39m wakeups: 0 Memory: 62.78 GiB
used: 3.75 GiB (6.0%) Init: systemd v: 250 tool: systemctl Compilers:
gcc: 11.2.0 clang: 13.0.1 Packages: pacman: 1770 lib: 516 Shell: fish
v: 3.3.1 default: Bash v: 5.1.16 running-in: konsole inxi: 3.3.14
Garuda (2.5.6-2):
System install date:     2022-03-24
Last full system update: 2022-03-30
Is partially upgraded:   No
Relevant software:       NetworkManager
Windows dual boot:       Probably (Run as root to verify)
Snapshots:               Snapper
Failed units:            bluetooth-autoconnect.service

I'd be happy about any advice or hint.

Atb and thanks in advance!

Bruce :shark:

1 Like

I can't help you but I just want to say you have to be one of the best new users to this forum in terms of actually trying shit yourself and researching etc. Did enjoy reading your issue in detail. I hope someone more knowledgeable than me can help you and welcome to the community!

4 Likes

Hello Grimy1928 - thanks for your kind reply.

Don't you worry I got flamed with RTFM posts back in 2013 :rofl:

2 Likes

Outta curiosity, did this start happening straight after the installation or after some sort of update? Might be worth restoring an older snapshot if this only just started happening. Other than that, I don't have a bloody clue :wink:

1 Like

It has been an issue since the first boot after installation. I hoped a pacman -Syu would fix it :pensive:

1 Like

Ah, that sucks then. Could be hardware related, like potentially bad RAM? I assume this is the first time you've had the problem though, and that you have been on other Linux distributions before this as opposed to Windows? The only suggestion I can make is to type update in the terminal, as this is a custom script that makes updating easier and better than the default method in Arch.

1 Like

And just in case it's some weird config causing it, you could open Garuda Assistant and try resetting some configs or even select the reinstall all packages option. Sounds like you've got some experimenting to do before you narrow down the cause!

1 Like

My RAM is just one year old and runs without issues on other distros and Win10Pro.
I don't think there is any HW issue and I also deactivated any OC settings (like PBO in BIOS).
I'm currently dual-booting to Win10Pro and have no issues there. (even OCed).

Well it might not sound very technical but my belly tells me there is something minor which I've simply overlooked. You know one of those cases where you're facepalming after getting the idea or the reason :rofl:

1 Like

I thought about it. I'm not sure if it bricks my config. Especially b/c of DaVinci Resolve. :thinking:

But an interesting option. Thank you :slight_smile:

1 Like

Well if it's not a default installation with some custom configs then that could definitely be a potential cause of issues. But yeah maybe best to leave that until later as no-one likes messing with their configs all over again

1 Like

Did you try a different kernel btw, like LTS?

1 Like

Ahh, nothing special otoh. Just getting the AMDGPU-PRO drivers to unleash all the OpenCL capabilities of the card (re: DaVinci Resolve on Arch Linux). But I had the issue before that, hence I doubt that's the reason.
I'm using the dr460nized KDE, the first distro where I didn't feel the need to customize anything, ngl xD

1 Like

Yep, no difference.
And if I recall correctly, there was a kernel update one or two days ago, which didn't make a difference either.

1 Like

Eh...did you try maybe installing MacOS?:joy:

1 Like

Hell no! :rofl: :rofl: :rofl: :rofl:

1 Like

Anyway @BluishHumility is writing an essay it seems so I'm sure he'll be able to help better than me haha

1 Like

Yeah, I just noticed that I could see ppl typing.
I'm curious ^^

But I also enjoy your company :smile:

1 Like

You would be the first :joy:

1 Like

I wouldn’t rule out hardware.

Did you build the machine yourself?

Are there any old or well-used components in the build?

It might be good to start working through TBG’s checklist to see if you can put together more troubleshooting info to comb through: Troubleshooting System Stutter, Lags, Freezes, and Hangs - #11 by tbg

Here is a bit more material to plough through at your leisure: Troubleshooting System Stutter, Lags, Freezes, and Hangs

:rofl: This is a relatively tame contribution from me I’m afraid; TBG wrote the essay.

6 Likes

Might be worth also considering what makes Garuda unique which could also cause issues for some in rare cases. Like could nohang potentially be misbehaving?

1 Like