Dumped into Emergency Shell After Update: Failed to Start Switch Root & BTRFS Errors

For reference, other threads I've checked:
Garuda crashing on boot to emergency shell after update - solved by switching to LTS kernel, not applicable in this case

BTRFS error : system boot in emergency mode - hardware problem? Snapshots also didn't work for this person

[Solved - kinda] Can't log in : BTRFS: error in btrfs_run_delayed_refs:2124: errno:-5 IO failure - ran out of space (?), new SSD in order, and reinstalled. However, my SSD still has 194.1GB free. It's 5 years old though :thinking:

Also had a look at the Arch bug tracker but couldn't seem to find anything of relevance D:


Please excuse me if some of the logs have typos. I typed most of them by hand because I can't copy them onto my thumb drive or even the main SSD itself :sob:
This post is also a bit all over the place because there's just so much info, but I have no idea what's actually relevant. Sorry if a lot of them end up being red herrings (misleading)!

This update has really screwed up my system, and booting different kernels doesn't solve the issue. Thankfully snapshots still work. Strangely though, there weren't any weird errors that appeared during the update.
The Zen fallback initramfs makes it the furthest in the boot process, but then it leaves me with a mouse cursor on a black screen, and getty can't start on the other TTYs (not sure how normal this is).

What happens during the boot process can sometimes be a bit inconsistent.
Sometimes on the Zen kernel, a start job for Mountpoints Configured in the Real Root will start. This looks like:

[ *** ] A start job is running for Mountpoints Configured in the Real Root (31s / no limit)
[   32.361849] BTRFS : error (device nvme0n1p2: state A) in __btrfs_free_extent: 3053: errno=5 IO failure
[FAILED] Failed to start Mountpoints Configured in the Real Root.
See 'systemctl status initrd-parse-etc.service' for details.

Looking at initrd-parse-etc.service shows:

Aug 21 05:08:51 jas-xps159560 systemd[1]: initrd-parse-etc.service - Mountpoints Configured in the Real Root...
Aug 21 05:08:51 jas-xps159560 systemd-sysroot-fstab-check[359]: Failed to open /sysroot/etc/fstab: Input/Output error
Aug 21 05:08:51 jas-xps159560 systemd[1]: initrd-parse-etc.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 05:08:51 jas-xps159560 systemd[1]: initrd-parse-etc.service: Failed with result 'exit-code'.
Aug 21 05:08:51 jas-xps159560 systemd[1]: Failed to start Mountpoints Configured in the Real Root.
Aug 21 05:08:51 jas-xps159560 systemd[1]: initrd-parse-etc.service: Triggering OnFailure= dependencies.

During the boot process, /sysroot mounts successfully but everything seems to start going wrong at Starting Plymouth switch root service.
On some boots (particularly on LTS kernel it seems?), it will say
Warning: /dev/disk/by-uuid/e34775a9-3a44-4cad-b4a1-7a4688cda7b4 does not exist. When this happens, my SSD can't be located on /dev :')
I also can't seem to mount my thumb drive either - it always appears as /dev/sda during normal operation, but in the emergency shell, it never seems to appear when I do ls /dev. :thinking:

The emergency shell tells me See "systemctl status initrd-switch-root.service" for details. Here's the output of that (typed by hand on my phone, so if there are any errors please feel free to ask about them):

initrd-switch-root.service - Switch Root
    Loaded: loaded (/usr/lib/systemd/system/initrd-switch-root.service; static)
    Active: failed (Result: exit-code) since Mon 2023-08-21 03:51:34 UTC; 8min ago
    Process: 489 ExecStart=systemctl --no-block switch-root (code=exited, status=1/FAILURE)
    Main PID: 489 (code=exited, status=1/FAILURE)
    CPU: 7ms

Aug 21 03:51:34 jas-xps159560 systemd[1]: Starting Switch Root...
Aug 21 03:51:34 jas-xps159560 systemctl[489]: Failed to switch root: Failed to determine whether root path '/sysroot' contains an OS tree: Input/Output error
Aug 21 03:51:34 jas-xps159560 systemd[1]: initrd-switch-root.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 03:51:34 jas-xps159560 systemd[1]: initrd-switch-root.service: Failed with result 'exit-code'.
Aug 21 03:51:34 jas-xps159560 systemd[1]: Failed to start Switch Root.
Aug 21 03:51:34 jas-xps159560 systemd[1]: initrd-switch-root.service: Triggering OnFailure= dependencies.

The irritating thing is that when I chroot into the broken system, no journals from the previous boot(s) are kept (journalctl -b -1 and other offsets show nothing).
The emergency shell also gave me a "/run/initramfs/rdsosreport.txt" file, but I've been having trouble with mounting the SSD (nvme0n1p2) to save it to /boot. Here is the output of dmesg when I try to do so:

[   34.890678] nvme nvme0: controller is down; will reset: CSTS=0xffffffff. PCI_STATUS=0xffff
[   34.890691] nvme nvme0: Does your device have a faulty power saving mode enabled?
[   34.890696] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
[   34.910751] nvme 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   34.911141] nvme nvme0: Disabling device after reset failure: -19
[   34.016699] nvme0n1: detected capacity change from 1000215216 to 0
[   34.916747] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr0, rd1. flush 0, corrupt 0, gen 0
[   34.916867] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr0, rd2. flush 0, corrupt 0, gen 0
[   34.917030] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr0, rd3. flush 0, corrupt 0, gen 0
[   34.917123] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr0, rd4. flush 0, corrupt 0, gen 0
[   34.917221] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr0, rd5. flush 0, corrupt 0, gen 0
[   34.917306] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr0, rd6. flush 0, corrupt 0, gen 0
[   34.917376] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr0, rd7. flush 0, corrupt 0, gen 0
[   34.917401] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr0, rd8. flush 0, corrupt 0, gen 0
[   34.917454] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr0, rd9. flush 0, corrupt 0, gen 0
[   34.917478] BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr0, rd10. flush 0, corrupt 0, gen 0
[   111.212313] Buffer I/O error on dev nvme0n1p2, logical block 0, async page read

rdsosreport.txt says much the same thing from what I've read. If you know how to get it off the system so that it can be posted here, please do let me know :slight_smile: FYI, I did try to send it to termbin with nc termbin.com 9999, but that doesn't work in the emergency shell (nc doesn't exist). Because /run is not persistent, rdsosreport.txt disappears after a reboot.
lsblk also doesn't work.


Other useful tidbits:
Output of smartctl -a: Garuda's PrivateBin

/etc/fstab from chroot:

# 'blkid' to print the universally unique identifier for a device; this may
# be used with UUID= as a more robust way to name devices that works even if
# disks are added and removed. See fstab(5).
#
# <file system>             <mount point>  <type>  <options>  <dump>  <pass>
UUID=9DC6-0B0F                            /boot/efi      vfat    defaults,noatime 0 2
UUID=e34775a9-3a44-4cad-b4a1-7a4688cda7b4 /              btrfs   subvol=/@,defaults,noatime,compress=zstd 0 0
UUID=e34775a9-3a44-4cad-b4a1-7a4688cda7b4 /home          btrfs   subvol=/@home,defaults,noatime,compress=zstd 0 0
UUID=e34775a9-3a44-4cad-b4a1-7a4688cda7b4 /root          btrfs   subvol=/@root,defaults,noatime,compress=zstd 0 0
UUID=e34775a9-3a44-4cad-b4a1-7a4688cda7b4 /srv           btrfs   subvol=/@srv,defaults,noatime,compress=zstd 0 0
UUID=e34775a9-3a44-4cad-b4a1-7a4688cda7b4 /var/cache     btrfs   subvol=/@cache,defaults,noatime,compress=zstd 0 0
UUID=e34775a9-3a44-4cad-b4a1-7a4688cda7b4 /var/log       btrfs   subvol=/@log,defaults,noatime,compress=zstd 0 0
UUID=e34775a9-3a44-4cad-b4a1-7a4688cda7b4 /var/tmp       btrfs   subvol=/@tmp,defaults,noatime,compress=zstd 0 0
UUID=fca2f0b8-fe12-4de7-b5ff-de54fd42afbb swap           swap    defaults   0 0
tmpfs                                     /tmp           tmpfs   defaults,noatime,mode=1777 0 0

garuda-inxi from chroot:

Arch: x86_64 bits: 64 compiler: gcc v: 13.1.1 clocksource: tsc
    available: hpet,acpi_pm parameters: BOOT_IMAGE=/boot/vmlinuz-x86_64 lang=en_US keytable=us tz=UTC
    misobasedir=garuda root=miso:LABEL=GARUDA_HYPRLAND_RAPTOR quiet systemd.show_status=1 ibt=off
    driver=nonfree nouveau.modeset=0 i915.modeset=1 radeon.modeset=1
  Console: N/A Distro: Garuda Linux base: Arch Linux
Machine:
  Type: Laptop System: Dell product: XPS 15 9560 v: N/A serial: <filter> Chassis: type: 10 serial: <filter>
  Mobo: Dell model: 05FFDN v: A00 serial: <filter> UEFI: Dell v: 1.24.0 date: 08/10/2021
Battery:
  ID-1: BAT0 charge: 64.3 Wh (77.7%) condition: 82.8/97.0 Wh (85.4%) volts: 12.0 min: 11.4
    model: LGC-LGC8.33 DELL 5XJ28 type: Li-ion serial: <filter> status: not charging
CPU:
  Info: model: Intel Core i7-7700HQ socket: U3E1 bits: 64 type: MT MCP arch: Kaby Lake gen: core 7
    level: v3 note: check built: 2018 process: Intel 14nm family: 6 model-id: 0x9E (158) stepping: 9
    microcode: 0xEA
  Topology: cpus: 1x cores: 4 tpc: 2 threads: 8 smt: enabled cache: L1: 256 KiB
    desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB L3: 6 MiB desc: 1x6 MiB
  Speed (MHz): avg: 2300 high: 2800 min/max: 800/3800 base/boost: 2700/2800 scaling:
    driver: intel_pstate governor: powersave volts: 0.8 V ext-clock: 100 MHz cores: 1: 2800 2: 800 3: 2800
    4: 2800 5: 800 6: 2800 7: 2800 8: 2800 bogomips: 44798
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities: <filter>
Graphics:
  Device-1: Intel HD Graphics 630 vendor: Dell driver: i915 v: kernel arch: Gen-9.5 process: Intel 14nm
    built: 2016-20 ports: active: eDP-1 empty: DP-1, DP-2, HDMI-A-1, HDMI-A-2 bus-ID: 00:02.0
    chip-ID: 8086:591b class-ID: 0300
  Device-2: NVIDIA GP107M [GeForce GTX 1050 Mobile] vendor: Dell driver: N/A non-free: 535.xx+
    status: current (as of 2023-08) arch: Pascal code: GP10x process: TSMC 16nm built: 2016-21 pcie: gen: 3
    speed: 8 GT/s lanes: 16 bus-ID: 01:00.0 chip-ID: 10de:1c8d class-ID: 0302
  Device-3: Sunplus Innovation Integrated_Webcam_HD driver: uvcvideo type: USB rev: 2.0
    speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-12:6 chip-ID: 1bcf:2b95 class-ID: 0e02
  Display: server: X.org v: 1.21.1.8 driver: gpu: i915 display-ID: :1
  Monitor-1: eDP-1 model: Sharp 0x1476 built: 2016 res: 3840x2160 dpi: 282 gamma: 1.2
    size: 346x194mm (13.62x7.64") diag: 397mm (15.6") ratio: 16:9 modes: 3840x2160
  API: OpenGL Message: GL data unavailable for root.
Audio:
  Device-1: Intel CM238 HD Audio vendor: Dell driver: snd_hda_intel v: kernel bus-ID: 00:1f.3
    chip-ID: 8086:a171 class-ID: 0403
  API: ALSA v: k6.4.1-zen2-1-zen status: kernel-api tools: alsactl,alsamixer,amixer
  Server-1: sndiod v: N/A status: off tools: aucat,midicat,sndioctl
  Server-2: PipeWire v: 0.3.77 status: off with: 1: pipewire-pulse status: off 2: wireplumber status: off
    3: pipewire-alsa type: plugin 4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter vendor: Rivet Networks Killer
    Wireless-n/a/ac 1535 driver: ath10k_pci v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1
    bus-ID: 02:00.0 chip-ID: 168c:003e class-ID: 0280 temp: 38.0 C
  IF: wlp2s0 state: up mac: <filter>
Bluetooth:
  Device-1: Qualcomm Atheros QCA61x4 Bluetooth 4.0 driver: btusb v: 0.8 type: USB rev: 2.0 speed: 12 Mb/s
    lanes: 1 mode: 1.1 bus-ID: 1-4:3 chip-ID: 0cf3:e300 class-ID: e001
  Report: btmgmt ID: hci0 rfk-id: 0 state: down bt-service: N/A rfk-block: hardware: no software: no
    address: N/A
Drives:
  Local Storage: total: 591.5 GiB used: 264.19 GiB (44.7%)
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Toshiba model: KXG50ZNV512G NVMe 512GB size: 476.94 GiB
    block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 tech: SSD serial: <filter>
    fw-rev: AADA4106 temp: 31.9 C
  SMART: yes health: PASSED on: 2y 62d 9h cycles: 4,792 read-units: 92,843,639 [47.5 TB]
    written-units: 85,453,366 [43.7 TB]
  ID-2: /dev/sda maj-min: 8:0 vendor: SanDisk model: Ultra USB 3.0 size: 114.56 GiB block-size:
    physical: 512 B logical: 512 B type: USB rev: 3.0 spd: 5 Gb/s lanes: 1 mode: 3.2 gen-1x1 tech: N/A
    serial: <filter> fw-rev: 1.00
  SMART Message: Unknown USB bridge. Flash drive/Unsupported enclosure?
Partition:
  ID-1: / raw-size: 459.62 GiB size: 459.62 GiB (100.00%) used: 264.19 GiB (57.5%) fs: btrfs
    block-size: 4096 B dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
  ID-1: swap-1 type: zram size: 15.47 GiB used: 0 KiB (0.0%) priority: 100 comp: zstd
    avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 8 dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 48.0 C pch: 47.5 C mobo: 46.0 C
  Fan Speeds (rpm): cpu: 0 fan-2: 0
Info:
  Processes: 0 Uptime: 21m wakeups: 7228 Memory: total: 16 GiB available: 15.47 GiB used: 1.75 GiB (11.3%)
  igpu: 64 MiB Init: systemd v: 254 default: graphical tool: systemctl Compilers: gcc: 13.2.1 clang: 15.0.7
  Packages: 1792 pm: pacman pkgs: 1783 libs: 504 tools: paru pm: flatpak pkgs: 9 Client: Unknown Client:
  kworker/7:1-events inxi: 3.3.29
Garuda (2.6.16-1):
  System install date:     2023-08-21
  Last full system update: 2023-08-21
  Is partially upgraded:   No
  Relevant software:       snapper NetworkManager connman dracut nvidia-dkms
  Windows dual boot:       No/Undetected
Running in chroot, ignoring command 'list-units'
  Failed units:            

garuda-inxi after restoring last working snapshot:

System:
  Kernel: 6.4.10-zen2-1-zen arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen root=UUID=e34775a9-3a44-4cad-b4a1-7a4688cda7b4
    rw rootflags=subvol=@ rd.udev.log_priority=3 vt.global_cursor_default=0
    resume=UUID=fca2f0b8-fe12-4de7-b5ff-de54fd42afbb loglevel=3 nvidia_drm.modeset=1 ibt=off
  Desktop: Qtile v: 0.22.1 wm: LG3D vt: 2 dm: SDDM Distro: Garuda Linux base: Arch Linux
Machine:
  Type: Laptop System: Dell product: XPS 15 9560 v: N/A serial: <superuser required> Chassis:
    type: 10 serial: <superuser required>
  Mobo: Dell model: 05FFDN v: A00 serial: <superuser required> UEFI: Dell v: 1.24.0
    date: 08/10/2021
Battery:
  ID-1: BAT0 charge: 64.3 Wh (77.7%) condition: 82.8/97.0 Wh (85.4%) volts: 11.9 min: 11.4
    model: LGC-LGC8.33 DELL 5XJ28 type: Li-ion serial: <filter> status: not charging
CPU:
  Info: model: Intel Core i7-7700HQ bits: 64 type: MT MCP arch: Kaby Lake gen: core 7 level: v3
    note: check built: 2018 process: Intel 14nm family: 6 model-id: 0x9E (158) stepping: 9
    microcode: 0xF4
  Topology: cpus: 1x cores: 4 tpc: 2 threads: 8 smt: enabled cache: L1: 256 KiB
    desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB L3: 6 MiB desc: 1x6 MiB
  Speed (MHz): avg: 2799 high: 2801 min/max: 800/3800 scaling: driver: intel_pstate
    governor: powersave cores: 1: 2799 2: 2800 3: 2800 4: 2799 5: 2799 6: 2801 7: 2800 8: 2800
    bogomips: 44798
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities: <filter>
Graphics:
  Device-1: Intel HD Graphics 630 vendor: Dell driver: i915 v: kernel arch: Gen-9.5
    process: Intel 14nm built: 2016-20 ports: active: eDP-1 empty: DP-1, DP-2, HDMI-A-1, HDMI-A-2
    bus-ID: 00:02.0 chip-ID: 8086:591b class-ID: 0300
  Device-2: NVIDIA GP107M [GeForce GTX 1050 Mobile] vendor: Dell driver: nvidia v: 535.98
    alternate: nouveau,nvidia_drm non-free: 535.xx+ status: current (as of 2023-07) arch: Pascal
    code: GP10x process: TSMC 16nm built: 2016-21 pcie: gen: 1 speed: 2.5 GT/s lanes: 16 link-max:
    gen: 3 speed: 8 GT/s bus-ID: 01:00.0 chip-ID: 10de:1c8d class-ID: 0302
  Device-3: Sunplus Innovation Integrated_Webcam_HD driver: uvcvideo type: USB rev: 2.0
    speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-12:6 chip-ID: 1bcf:2b95 class-ID: 0e02
  Display: x11 server: X.Org v: 21.1.8 compositor: Picom v: git-c4107 driver: X:
    loaded: modesetting,nvidia unloaded: nouveau alternate: fbdev,intel,nv,vesa dri: iris gpu: i915
    display-ID: :0 screens: 1
  Screen-1: 0 s-res: 3840x2160 s-dpi: 185 s-size: 527x296mm (20.75x11.65") s-diag: 604mm (23.8")
  Monitor-1: eDP-1 model: Sharp 0x1476 built: 2016 res: 3840x2160 hz: 60 dpi: 282 gamma: 1.2
    size: 346x194mm (13.62x7.64") diag: 397mm (15.6") ratio: 16:9 modes: 3840x2160
  API: OpenGL v: 4.6 Mesa 23.1.5 renderer: Mesa Intel HD Graphics 630 (KBL GT2)
    direct-render: Yes
Audio:
  Device-1: Intel CM238 HD Audio vendor: Dell driver: snd_hda_intel v: kernel
    alternate: snd_soc_avs bus-ID: 00:1f.3 chip-ID: 8086:a171 class-ID: 0403
  API: ALSA v: k6.4.10-zen2-1-zen status: kernel-api tools: alsactl,alsamixer,amixer
  Server-1: sndiod v: N/A status: off tools: aucat,midicat,sndioctl
  Server-2: PipeWire v: 0.3.77 status: active with: 1: pipewire-pulse status: active
    2: wireplumber status: active 3: pipewire-alsa type: plugin 4: pw-jack type: plugin
    tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter vendor: Rivet Networks
    Killer Wireless-n/a/ac 1535 driver: ath10k_pci v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1
    bus-ID: 02:00.0 chip-ID: 168c:003e class-ID: 0280 temp: 44.0 C
  IF: wlp2s0 state: up mac: <filter>
  IF-ID-1: virbr0 state: down mac: <filter>
Bluetooth:
  Device-1: Qualcomm Atheros QCA61x4 Bluetooth 4.0 driver: btusb v: 0.8 type: USB rev: 2.0
    speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-4:3 chip-ID: 0cf3:e300 class-ID: e001
  Report: bt-adapter note: tool can't run ID: hci0 rfk-id: 0 state: down bt-service: disabled
    rfk-block: hardware: no software: no address: N/A
Drives:
  Local Storage: total: 476.94 GiB used: 700.21 GiB (146.8%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Toshiba model: KXG50ZNV512G NVMe 512GB
    size: 476.94 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 tech: SSD
    serial: <filter> fw-rev: AADA4106 temp: 34.9 C scheme: GPT
Partition:
  ID-1: / raw-size: 459.62 GiB size: 459.62 GiB (100.00%) used: 264.21 GiB (57.5%) fs: btrfs
    dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%) used: 588 KiB (0.2%) fs: vfat
    dev: /dev/nvme0n1p1 maj-min: 259:1
  ID-3: /home raw-size: 459.62 GiB size: 459.62 GiB (100.00%) used: 264.21 GiB (57.5%) fs: btrfs
    dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-4: /var/log raw-size: 459.62 GiB size: 459.62 GiB (100.00%) used: 264.21 GiB (57.5%)
    fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-5: /var/tmp raw-size: 459.62 GiB size: 459.62 GiB (100.00%) used: 264.21 GiB (57.5%)
    fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
  ID-1: swap-1 type: zram size: 15.47 GiB used: 0 KiB (0.0%) priority: 100 dev: /dev/zram0
  ID-2: swap-2 type: partition size: 17.02 GiB used: 0 KiB (0.0%) priority: -2
    dev: /dev/nvme0n1p3 maj-min: 259:3
Sensors:
  System Temperatures: cpu: 51.0 C pch: 58.5 C mobo: 44.0 C
  Fan Speeds (RPM): cpu: 2474 fan-2: 2488
Info:
  Processes: 291 Uptime: 24m wakeups: 677 Memory: total: 16 GiB available: 15.47 GiB
  used: 6.35 GiB (41.0%) Init: systemd v: 254 default: graphical tool: systemctl Compilers:
  gcc: 13.2.1 clang: 15.0.7 Packages: 1791 pm: pacman pkgs: 1782 libs: 504 tools: paru pm: flatpak
  pkgs: 9 Client: shell wrapper v: 5.1.16-release inxi: 3.3.28
Garuda (2.6.16-1):
  System install date:     2023-03-19
  Last full system update: 2023-08-18
  Is partially upgraded:   Yes
  Relevant software:       snapper NetworkManager dracut nvidia-dkms
  Windows dual boot:       Probably (Run as root to verify)
  Failed units:            power.service systemd-vconsole-setup.service 

So yes I'm stuck with a cursed system until I can update :upside_down_face:

Update log from the problematic update: Garuda's PrivateBin
The nintendo-dkms module stuff doesn't make a dent, although I plan to remove it in future updates since it's a deprecated module (which my smooth brain didn't realise at the time of installing it lol).

Any help, pointers or advice appreciated! This is the first time I've ever dealt with an issue like this so I'm not even sure where to begin looking. :')
I've done a bit of searching on this IO error and a lot of what seems to come up are SSDs going kaput, or NVIDIA problems of some sort :thinking:
idk maybe I should commit Windows user and reinstall my system lol

Update: found this
https://lore.kernel.org/lkml/[email protected]/T/

I’ll take a deeper look once I return home. For now, I hope it’s useful to anyone willing to help troubleshoot this problem :slight_smile:


Alright so I had a look and also found this, where the same person experiencing the issue says what drivers to blacklist: 217802 – regression NVME failure in 6.4.11 : 6.4.10 works fine.

Just FYI in case of interest to anyone.

I can confirm that blacklisting the drivers (rtsx_pci_sdmmc and rtsx_pci) and rebuilding the initramfs, then rebooting, works fine for both 6.4.11 and 6.5-rc6.

So this is a kernel regression issue that probably plagues certain Samsung SSDs, as it sounds like this person also has a Dell laptop of some sort.
I’m still puzzled how it affects my installation of the LTS kernel though. :thinking: Maybe I’ll wait for the next kernel version before I update again. (woohoo to the continuous running of my current cursed setup! :DDD )
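
For anyone who wants to try the blacklist route from that bug report, here is a minimal sketch of what it could look like. The file name blacklist-rtsx.conf and the helper name write_rtsx_blacklist are my own choices, not something from the report, and the rebuild command in the comment assumes a dracut-based setup like mine:

```shell
#!/bin/sh
# Sketch: write a modprobe blacklist for the two Realtek card reader
# drivers named in the bug report (rtsx_pci_sdmmc and rtsx_pci).
# "blacklist-rtsx.conf" is an arbitrary file name chosen for this example.
write_rtsx_blacklist() {
    conf_dir="${1:-/etc/modprobe.d}"   # defaults to the system modprobe.d directory
    mkdir -p "$conf_dir" || return 1
    # printf repeats the format once per argument, giving one line per module
    printf 'blacklist %s\n' rtsx_pci_sdmmc rtsx_pci > "$conf_dir/blacklist-rtsx.conf"
}

# Possible use as root on the installed system (assumes dracut):
#   write_rtsx_blacklist
#   dracut --regenerate-all --force
```

The initramfs rebuild matters because the blacklist otherwise only applies after the real root is mounted. Note that this disables the SD card reader.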

Whaaa…?! :exploding_head:

Just a quick sanity check: have you booted to your BIOS menu to confirm the SATA controller is set to AHCI mode?

This looks like a suggestion. Have you tried applying these kernel parameters?

Hmm, this is very interesting. I wonder if this kernel bug could be a relevant clue for some of these other recent threads:


It was exactly what I thought when reading this thread. It would actually be a quite plausible explanation for the increased rate of this kind of issues.


Those who are old enough will remember that back in the late 90s, people ran multi-HDD systems hooked up to a SCSI card in a PCI slot for higher data throughput. Those setups needed special drivers to install and run the OS, and they always had issues because OSes were not designed to boot from a SCSI expansion card in a PCI slot.

These new NVMe SSD setups are the same thing on steroids: 'stopgap tech' until they bring out the next-gen SATA-4 or 5 boards.
I do not recommend them as an OS install drive.


Just remembered… I thought about using OCR to get the text out of the terminal the next time something like this happened for like 2 seconds, right before it actually happened. :rofl: Next time to save my eyes and wrists, I’ll make sure to use OCR lol

Yep, double-checked :slight_smile:

Lmao I feel like an idiot… I just tried it out and it boots! :partying_face: To make sure it wasn’t just a one-off, I tried it again and can confirm that adding those kernel parameters is the solution in this case.

At least it made for a great discussion topic :smiley:
I wonder if adding these kernel params would help solve the other cases.
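
In case it helps with those other cases, here is a rough sketch of making the two parameters from the dmesg hint permanent on a GRUB setup. It assumes the usual /etc/default/grub layout with a quoted GRUB_CMDLINE_LINUX_DEFAULT line; add_kernel_params is just a helper name made up for this example:

```shell
#!/bin/sh
# Sketch: append kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT in a
# grub defaults file, skipping any parameter that is already on that line.
add_kernel_params() {
    grub_file="$1"; shift
    for param in "$@"; do
        # Only look at the GRUB_CMDLINE_LINUX_DEFAULT line, as a fixed string
        if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" | grep -qF -- "$param"; then
            continue
        fi
        # Insert " <param>" just before the closing quote of the value
        sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=[\"']\)\(.*\)\([\"']\)\$|\1\2 $param\3|" \
            "$grub_file" > "$grub_file.tmp" && mv "$grub_file.tmp" "$grub_file"
    done
}

# Possible use as root, followed by regenerating the GRUB config:
#   add_kernel_params /etc/default/grub nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
#   grub-mkconfig -o /boot/grub/grub.cfg
```

For a one-off test, the same parameters can be typed onto the linux line by pressing e at the GRUB menu, which avoids touching any files.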

This is very interesting - I am using a 5 year old laptop (well technically the model is 6 years old). I’m not sure if this type of setup is still all the rage among laptop manufacturers these days but wouldn’t be surprised if it still is. :joy:
I’ll keep this in mind the next time I look around for hardware :smiley:


After seeing the following thread get resolved by the same solution marked here:

I wonder if filing a bug report as suggested in the original dmesg is a good idea. Maybe they wouldn’t be so welcoming of people using Arch derivatives though… :thinking:

Double-checking the links I sent in the second post in this topic, the OP has a very slightly different issue where the kernel params suggested by the dmesg didn’t work. In addition, the same person posted on the Arch forums about the piece of hardware that causes this:
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
https://bbs.archlinux.org/viewtopic.php?id=288095
^ Also contains details about blacklisting the drivers if you’re interested and the kernel params didn’t work for you.

My laptop has this (doesn’t show in garuda-inxi but does show in virt-manager) and that might be why I was dumped into an emergency shell as opposed to the other threads here where people get a hard freeze on loading initial ramdisk.


I guess you are talking about this one:
[ 34.890696] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
I think reporting certainly is a good idea, maybe someone else has already done it though :thinking:
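
Before reporting, it may be worth confirming that the parameters are really active on the boot that works. A small sketch of one way to check; missing_params is a hypothetical helper name, and it only does a plain whole-word comparison against the command line:

```shell
#!/bin/sh
# Sketch: print every requested kernel parameter that does NOT appear
# as a whole word in the given kernel command line.
missing_params() {
    cmdline="$1"; shift
    for param in "$@"; do
        # Pad with spaces so only complete, space-separated words match
        case " $cmdline " in
            *" $param "*) ;;                  # present, print nothing
            *) printf '%s\n' "$param" ;;      # absent, report it
        esac
    done
}

# Possible use on the running system:
#   missing_params "$(cat /proc/cmdline)" nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
```

Empty output means both parameters were passed to the running kernel.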

Pretty interesting correlation you found! :eyes:

