System goes to a black screen, becomes unresponsive and requires a hard reset

Hi all, I am having issues with my system.

The system crashes completely, goes to a black screen, and I need to hard reset it.
The behaviour is consistent when gaming: within 10-30 minutes, the system just dies.
Sometimes it shows a "failed to unmount" error after the screen goes black, but that is not
consistent.
I also get a "systemd freezing execution" message (2).
I tried both the standard and LTS kernels; it doesn't make a difference, same behaviour.
I also tried both Wayland and X11, same behaviour.
Aside from gaming, the issue is not present at all; the system works normally for all daily tasks.
So I tried recreating the issue on a second machine.
That one was installed from the LTS ISO, and I let a game run overnight; the issue was not there.
Both machines are AMD platforms; the only difference is that the other PC has an Nvidia GPU.
One thing of note: I am trying to use a disk as a Steam library (ext4).
I don't know if that has anything to do with it.

I also found a couple of potential solutions online, though none helped,
disabling the compositor for example.
Others claim it's due to swap:
NAME TYPE SIZE USED PRIO
/dev/zram0 partition 61,9G 0B 100
Though this shouldn't be a problem?

journalctl doesn't show the system crashing, so I don't know how to work out which
error messages are relevant and which aren't.
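
For reference, one rough way to narrow the journal down is to look only at the previous boot and keep the storage and GPU lines. This is only a sketch: `filter_crash_hints` is a made-up helper name and the pattern list is a guess at what tends to matter for a hard freeze, not anything official. Keep in mind that after a hard freeze the last messages may never have been written to disk.

```shell
# Keep only kernel log lines that often matter for a hard freeze:
# SATA/AHCI/NVMe, filesystem and amdgpu messages, plus kernel call traces.
# The pattern list is an assumption; adjust it to your hardware.
filter_crash_hints() {
    grep -E -i 'ata[0-9]+(\.[0-9]+)?:|ahci|nvme|I/O error|EXT4-fs|BTRFS|amdgpu|Call Trace'
}

# Usage (not run here):
#   journalctl -k -b -1 --no-pager | filter_crash_hints   # kernel log, previous boot
#   journalctl -b -1 -p err --no-pager                     # everything at priority err or worse
```

If the filtered output is empty, the interesting lines were probably lost with the crash, and watching the log live during a gaming session is the next thing to try.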

Something of note is that there are Plasma, Qt, "device not in /dev" and a bunch of other errors
in journalctl.

Switching to a TTY doesn't help either; same black screen.

I am not against a reinstall and have already prepared the data, as I plan to move onto the
NVMe drive. I removed Windows from the machine, as I found it had somehow put a recovery partition
on every drive in the system, which did cause issues.

I would appreciate advice, as it seems I'm either doing something wrong or I broke the installation
somewhere along the way.

The journalctl output is too large to post here. If needed, do let me know how to get it to you.
It's already saved to my desktop.

EDIT: Found the pastebin on the site, kinda blind here.

Kernel: 6.6.51-1-lts arch: x86_64 bits: 64 compiler: gcc v: 14.2.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-lts
root=UUID=d57827c8-c39e-4654-9f19-faf4afa73494 rw rootflags=subvol=@
quiet quiet rd.udev.log_priority=3 vt.global_cursor_default=0 loglevel=3
Desktop: KDE Plasma v: 6.1.5 tk: Qt v: N/A info: frameworks v: 6.6.0
wm: kwin_wayland vt: 1 dm: SDDM Distro: Garuda base: Arch Linux
Machine:
Type: Desktop System: ASRock product: B650E Taichi Lite v: N/A
serial: <superuser required>
Mobo: ASRock model: B650E Taichi Lite serial: <superuser required>
uuid: <superuser required> UEFI: American Megatrends LLC. v: 2.08.AS01
date: 02/01/2024
Battery:
ID-1: hidpp_battery_0 charge: 53% condition: N/A volts: 3.8 min: N/A
model: Logitech G903 LIGHTSPEED Wireless Gaming Mouse w/ HERO type: N/A
serial: <filter> status: discharging
CPU:
Info: model: AMD Ryzen 9 7950X3D bits: 64 type: MT MCP arch: Zen 4 gen: 5
level: v4 note: check built: 2022+ process: TSMC n5 (5nm) family: 0x19 (25)
model-id: 0x61 (97) stepping: 2 microcode: 0xA601206
Topology: cpus: 1x cores: 16 tpc: 2 threads: 32 smt: enabled cache:
L1: 1024 KiB desc: d-16x32 KiB; i-16x32 KiB L2: 16 MiB desc: 16x1024 KiB
L3: 128 MiB desc: 1x32 MiB, 1x96 MiB
Speed (MHz): avg: 1132 high: 4750 min/max: 400/5759 scaling:
driver: amd-pstate-epp governor: powersave cores: 1: 400 2: 400 3: 3302
4: 400 5: 400 6: 400 7: 400 8: 400 9: 400 10: 3273 11: 400 12: 3966
13: 400 14: 3166 15: 3166 16: 400 17: 400 18: 4627 19: 400 20: 400 21: 400
22: 400 23: 400 24: 400 25: 400 26: 400 27: 400 28: 400 29: 400 30: 400
31: 400 32: 4750 bogomips: 268916
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities: <filter>
Graphics:
Device-1: AMD Navi 31 [Radeon RX 7900 XT/7900 XTX/7900M]
vendor: ASUSTeK TUF Gaming driver: amdgpu v: kernel arch: RDNA-3
code: Navi-3x process: TSMC n5 (5nm) built: 2022+ pcie: gen: 4
speed: 16 GT/s lanes: 16 ports: active: HDMI-A-1 empty: DP-1,DP-2,DP-3
bus-ID: 03:00.0 chip-ID: 1002:744c class-ID: 0300
Device-2: AMD Raphael driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x
process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4 speed: 16 GT/s lanes: 16
ports: active: none empty: DP-4, DP-5, DP-6, HDMI-A-2 bus-ID: 4f:00.0
chip-ID: 1002:164e class-ID: 0300 temp: 50.0 C
Display: wayland server: X.org v: 1.21.1.13 with: Xwayland v: 24.1.2
compositor: kwin_wayland driver: X: loaded: amdgpu
unloaded: modesetting,radeon alternate: fbdev,vesa dri: radeonsi
gpu: amdgpu,amdgpu display-ID: 0
Monitor-1: HDMI-A-1 res: 3072x1728 size: N/A modes: N/A
API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
device: 1 drv: radeonsi device: 2 drv: swrast gbm: drv: kms_swrast
surfaceless: drv: radeonsi wayland: drv: radeonsi x11: drv: radeonsi
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.2.2-arch1.1
glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 7900 XTX (radeonsi
navi31 LLVM 18.1.8 DRM 3.54 6.6.51-1-lts) device-ID: 1002:744c
memory: 23.44 GiB unified: no display-ID: :1.0
API: Vulkan v: 1.3.295 layers: 7 device: 0 type: discrete-gpu name: AMD
Radeon RX 7900 XTX (RADV NAVI31) driver: mesa radv v: 24.2.2-arch1.1
device-ID: 1002:744c surfaces: xcb,xlib,wayland device: 1
type: integrated-gpu name: AMD Radeon Graphics (RADV RAPHAEL_MENDOCINO)
driver: mesa radv v: 24.2.2-arch1.1 device-ID: 1002:164e
surfaces: xcb,xlib,wayland device: 2 type: cpu name: llvmpipe (LLVM
18.1.8 256 bits) driver: mesa llvmpipe v: 24.2.2-arch1.1 (LLVM 18.1.8)
device-ID: 10005:0000 surfaces: xcb,xlib,wayland
Audio:
Device-1: AMD Navi 31 HDMI/DP Audio driver: snd_hda_intel v: kernel pcie:
gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 03:00.1 chip-ID: 1002:ab30
class-ID: 0403
Device-2: AMD Rembrandt Radeon High Definition Audio driver: snd_hda_intel
v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 4f:00.1
chip-ID: 1002:1640 class-ID: 0403
Device-3: AMD Family 17h/19h HD Audio driver: snd_hda_intel v: kernel
pcie: gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 4f:00.6 chip-ID: 1022:15e3
class-ID: 0403
Device-4: Generic USB Audio driver: hid-generic,snd-usb-audio,usbhid
type: USB rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 3-8:4
chip-ID: 26ce:0a06 class-ID: 0300
Device-5: Logitech PRO X Wireless Gaming Headset
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 1.1 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 5-1.4.2:4 chip-ID: 046d:0aba class-ID: 0300
API: ALSA v: k6.6.51-1-lts status: kernel-api tools: N/A
Server-1: PipeWire v: 1.2.3 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Intel Wi-Fi 6E AX210/AX1675 2x2 [Typhoon Peak]
vendor: Rivet Networks Killer driver: iwlwifi v: kernel pcie: gen: 2
speed: 5 GT/s lanes: 1 bus-ID: 49:00.0 chip-ID: 8086:2725 class-ID: 0280
IF: wlp73s0 state: down mac: <filter>
Device-2: Realtek Killer E3000 2.5GbE vendor: ASRock driver: r8169
v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 port: a000 bus-ID: 4a:00.0
chip-ID: 10ec:3000 class-ID: 0200
IF: enp74s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Info: services: NetworkManager, systemd-timesyncd, wpa_supplicant
Bluetooth:
Device-1: Intel AX210 Bluetooth driver: btusb v: 0.8 type: USB rev: 2.0
speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 3-7:3 chip-ID: 8087:0032
class-ID: e001
Report: btmgmt ID: hci0 rfk-id: 0 state: down bt-service: enabled,running
rfk-block: hardware: no software: no address: <filter> bt-v: 5.3 lmp-v: 12
status: discoverable: no pairing: no
Drives:
Local Storage: total: 10.9 TiB used: 1.65 TiB (15.1%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: HP model: SSD EX900 500GB
size: 465.76 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
lanes: 4 tech: SSD serial: <filter> fw-rev: S0614B0 temp: 46.9 C
scheme: GPT
ID-2: /dev/nvme1n1 maj-min: 259:1 vendor: Samsung model: SSD 980 1TB
size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
lanes: 4 tech: SSD serial: <filter> fw-rev: 3B4QFXO7 temp: 44.9 C
scheme: GPT
ID-3: /dev/sda maj-min: 8:0 vendor: TeamGroup model: T-FORCE T253TY004T
size: 3.73 TiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
tech: SSD serial: <filter> fw-rev: 6364 scheme: GPT
ID-4: /dev/sdb maj-min: 8:16 vendor: TeamGroup model: T2532TB
size: 1.86 TiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
tech: SSD serial: <filter> fw-rev: 18D8 scheme: GPT
ID-5: /dev/sdc maj-min: 8:32 vendor: TeamGroup model: T2534TB
size: 3.73 TiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
tech: SSD serial: <filter> fw-rev: 60.0 scheme: GPT
ID-6: /dev/sdd maj-min: 8:48 vendor: Kingston model: SUV400S37240G
size: 223.57 GiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
tech: SSD serial: <filter> fw-rev: 87RA scheme: GPT
Partition:
ID-1: / raw-size: 223.27 GiB size: 223.27 GiB (100.00%)
used: 30.17 GiB (13.5%) fs: btrfs dev: /dev/sdd2 maj-min: 8:50
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 26.3 MiB (8.8%) fs: vfat dev: /dev/sdd1 maj-min: 8:49
ID-3: /home raw-size: 223.27 GiB size: 223.27 GiB (100.00%)
used: 30.17 GiB (13.5%) fs: btrfs dev: /dev/sdd2 maj-min: 8:50
ID-4: /var/log raw-size: 223.27 GiB size: 223.27 GiB (100.00%)
used: 30.17 GiB (13.5%) fs: btrfs dev: /dev/sdd2 maj-min: 8:50
ID-5: /var/tmp raw-size: 223.27 GiB size: 223.27 GiB (100.00%)
used: 30.17 GiB (13.5%) fs: btrfs dev: /dev/sdd2 maj-min: 8:50
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
ID-1: swap-1 type: zram size: 61.92 GiB used: 0 KiB (0.0%) priority: 100
comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 32 dev: /dev/zram0
Sensors:
System Temperatures: cpu: 55.6 C mobo: N/A
Fan Speeds (rpm): N/A
GPU: device: amdgpu temp: 50.0 C device: amdgpu temp: 58.0 C mem: 66.0 C
fan: 768 watts: 69.00
Info:
Memory: total: 64 GiB note: est. available: 61.92 GiB used: 6.38 GiB (10.3%)
Processes: 586 Power: uptime: 11m states: freeze,mem,disk suspend: deep
avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
suspend, test_resume image: 24.75 GiB services: org_kde_powerdevil,
power-profiles-daemon, upowerd Init: systemd v: 256 default: graphical
tool: systemctl
Packages: pm: pacman pkgs: 1445 libs: 427 tools: octopi,paru Compilers:
gcc: 14.2.1 Shell: garuda-inxi default: Bash v: 5.2.32 running-in: konsole
inxi: 3.3.35
Garuda (2.6.26-1):
System install date:     2024-06-25
Last full system update: 2024-09-17
Is partially upgraded:   No
Relevant software:       snapper NetworkManager dracut
Windows dual boot:       Probably (Run as root to verify)
Failed units:

So I was using the Sunshine server on the PC, and I turned it off; for about two hours now, no issues.
Will confirm if this was the cause.

EDIT: Nope, crashed again.

Have you tried switching Desktop Environments? I had freezing issues on KDE but when I switched to Xfce the freezing stopped.

That I didn't.
I recreated the issue again,
and it shows another problem: it states the journal is aborted
and that something in ext4 failed.
So I'm first trying to run the games from the root drive. If this also does the same thing, I'll try switching.

It could give more problems.

So, KDE works fine, but games crash?

Seems like it.
It does that on all games, native or otherwise.

Regarding the script: if a hard reset causes issues, I believe the damage is already done.
I presume a full system wipe is the best course of action now?

I'm beginning to believe the AM5 platform does not like Linux at all. I never had these issues with the last AM4 platform.

Note: I'm currently running the game off the root drive with the ext4 drives unmounted.

I ask to be very sure :) I moved it from Issues & Assistance to Games & Emulators, since it is not a Garuda problem.


I need more coffee.

Crashed again.
This time, though, I lost the boot option for that drive in the BIOS XD

Not in my experience. Working from day one on an Asus B650E board (same chipset as yours), AMD 7800X3D, Radeon 7900 XTX.

Please update the BIOS/UEFI on your board, it might help: ASRock > B650E Taichi Lite

Your last info might indicate issues with the board or with the drive.
Do you have any such issues on Windows?

The BIOS is up to date, or should be at least.

So, I nuked the whole system in the meantime, to start with a clean slate.
Something I found is a lot of AHCI errors, but they affect 2 of the 3 storage drives.
After some time, those drives simply disappear from the system.

The AHCI errors included:

EXT4-fs (sdb1): I/O error while writing superblock
Buffer I/O error on dev sdb1, logical block 249593856, lost sync page write
EXT4-fs (sdb1): I/O error while writing superblock

One drive remains, though.

I thought perhaps my SATA controller went kaputt, but then none of the SATA
drives would work, I believe.

I'll check for a BIOS update; maybe something was released recently.

It still might be the case, if 2 out of 3 drives go bye-bye. You can buy/use an external USB drive enclosure to test whether the drives or the SATA controller are faulty.

Hi, so I updated the BIOS; something had been released in the meantime.
I reformatted both the 1/3 and 2/3 disks.
I also moved the SATA cables around on the motherboard.

Both drives unmount and continue to exhibit the same behaviour, even after reformatting and recreating the filesystem.

Though now the filesystem is remounted read-only, according to dmesg:

[  800.884194] ata4.00: exception Emask 0x52 SAct 0x378 SErr 0xffffffff action 0x6 frozen
[  800.884199] ata4: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
[  800.884202] ata4.00: failed command: WRITE FPDMA QUEUED
[  800.884203] ata4.00: cmd 61/40:18:00:49:40/05:00:4c:01:00/40 tag 3 ncq dma 688128 out
res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[  800.884207] ata4.00: status: { DRDY }
[  800.884208] ata4.00: failed command: WRITE FPDMA QUEUED
[  800.884209] ata4.00: cmd 61/c0:20:40:4e:40/02:00:4c:01:00/40 tag 4 ncq dma 360448 out
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[  800.884212] ata4.00: status: { DRDY }
[  800.884212] ata4.00: failed command: WRITE FPDMA QUEUED
[  800.884213] ata4.00: cmd 61/40:28:00:51:40/05:00:4c:01:00/40 tag 5 ncq dma 688128 out
res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[  800.884216] ata4.00: status: { DRDY }
[  800.884217] ata4.00: failed command: WRITE FPDMA QUEUED
[  800.884217] ata4.00: cmd 61/c0:30:40:56:40/02:00:4c:01:00/40 tag 6 ncq dma 360448 out
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[  800.884220] ata4.00: status: { DRDY }
[  800.884221] ata4.00: failed command: WRITE FPDMA QUEUED
[  800.884221] ata4.00: cmd 61/08:40:30:12:00/00:00:00:00:00/40 tag 8 ncq dma 4096 out
res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[  800.884224] ata4.00: status: { DRDY }
[  800.884225] ata4.00: failed command: WRITE FPDMA QUEUED
[  800.884226] ata4.00: cmd 61/28:48:38:12:00/00:00:00:00:00/40 tag 9 ncq dma 20480 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
[  800.884228] ata4.00: status: { DRDY }
[  800.884230] ata4: hard resetting link
[  800.884238] ahci 0000:4a:00.0: AHCI controller unavailable!
[  801.924168] ata4: failed to resume link (SControl FFFFFFFF)
[  801.924187] ata4: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
[  806.996053] ata4: hard resetting link
[  806.996060] ahci 0000:4a:00.0: AHCI controller unavailable!
[  808.036035] ata4: failed to resume link (SControl FFFFFFFF)
[  808.036053] ata4: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
[  808.036062] ata4: limiting SATA link speed to <unknown>
[  813.139920] ata4: hard resetting link
[  813.139928] ahci 0000:4a:00.0: AHCI controller unavailable!
[  814.179904] ata4: failed to resume link (SControl FFFFFFFF)
[  814.179923] ata4: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
[  814.179931] ata4.00: disable device
[  814.179962] ahci 0000:4a:00.0: AHCI controller unavailable!
[  814.179972] sd 3:0:0:0: [sdc] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=49s
[  814.179976] sd 3:0:0:0: [sdc] tag#3 Sense Key : Not Ready [current]
[  814.179978] sd 3:0:0:0: [sdc] tag#3 Add. Sense: Logical unit not ready, hard reset required
[  814.179980] sd 3:0:0:0: [sdc] tag#3 CDB: Write(16) 8a 00 00 00 00 01 4c 40 49 00 00 00 05 40 00 00
[  814.179982] I/O error, dev sdc, sector 5574248704 op 0x1:(WRITE) flags 0x4000 phys_seg 168 prio class 2
[  814.179995] sd 3:0:0:0: [sdc] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=49s
[  814.179997] sd 3:0:0:0: [sdc] tag#4 Sense Key : Not Ready [current]
[  814.179999] sd 3:0:0:0: [sdc] tag#4 Add. Sense: Logical unit not ready, hard reset required
[  814.180000] sd 3:0:0:0: [sdc] tag#4 CDB: Write(16) 8a 00 00 00 00 01 4c 40 4e 40 00 00 02 c0 00 00
[  814.180001] I/O error, dev sdc, sector 5574250048 op 0x1:(WRITE) flags 0x0 phys_seg 88 prio class 2
[  814.180007] sd 3:0:0:0: [sdc] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=49s
[  814.180009] sd 3:0:0:0: [sdc] tag#5 Sense Key : Not Ready [current]
[  814.180011] sd 3:0:0:0: [sdc] tag#5 Add. Sense: Logical unit not ready, hard reset required
[  814.180012] sd 3:0:0:0: [sdc] tag#5 CDB: Write(16) 8a 00 00 00 00 01 4c 40 51 00 00 00 05 40 00 00
[  814.180013] I/O error, dev sdc, sector 5574250752 op 0x1:(WRITE) flags 0x4800 phys_seg 168 prio class 2
[  814.180018] sd 3:0:0:0: [sdc] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=49s
[  814.180020] sd 3:0:0:0: [sdc] tag#6 Sense Key : Not Ready [current]
[  814.180021] sd 3:0:0:0: [sdc] tag#6 Add. Sense: Logical unit not ready, hard reset required
[  814.180023] sd 3:0:0:0: [sdc] tag#6 CDB: Write(16) 8a 00 00 00 00 01 4c 40 56 40 00 00 02 c0 00 00
[  814.180024] I/O error, dev sdc, sector 5574252096 op 0x1:(WRITE) flags 0x800 phys_seg 88 prio class 2
[  814.180030] sd 3:0:0:0: [sdc] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=43s
[  814.180032] sd 3:0:0:0: [sdc] tag#8 Sense Key : Not Ready [current]
[  814.180033] sd 3:0:0:0: [sdc] tag#8 Add. Sense: Logical unit not ready, hard reset required
[  814.180035] sd 3:0:0:0: [sdc] tag#8 CDB: Write(16) 8a 00 00 00 00 00 00 00 12 30 00 00 00 08 00 00
[  814.180036] I/O error, dev sdc, sector 4656 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
[  814.180039] Buffer I/O error on dev sdc1, logical block 326, lost async page write
[  814.180049] sd 3:0:0:0: [sdc] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=43s
[  814.180051] sd 3:0:0:0: [sdc] tag#9 Sense Key : Not Ready [current]
[  814.180052] sd 3:0:0:0: [sdc] tag#9 Add. Sense: Logical unit not ready, hard reset required
[  814.180054] sd 3:0:0:0: [sdc] tag#9 CDB: Write(16) 8a 00 00 00 00 00 00 00 12 38 00 00 00 28 00 00
[  814.180055] I/O error, dev sdc, sector 4664 op 0x1:(WRITE) flags 0x103000 phys_seg 5 prio class 2
[  814.180057] Buffer I/O error on dev sdc1, logical block 327, lost async page write
[  814.180059] Buffer I/O error on dev sdc1, logical block 328, lost async page write
[  814.180061] Buffer I/O error on dev sdc1, logical block 329, lost async page write
[  814.180062] Buffer I/O error on dev sdc1, logical block 330, lost async page write
[  814.180064] Buffer I/O error on dev sdc1, logical block 331, lost async page write
[  814.180071] ata4: EH complete
[  814.180082] ata4.00: detaching (SCSI 3:0:0:0)
[  814.180093] device offline error, dev sdc, sector 3997483496 op 0x1:(WRITE) flags 0x9800 phys_seg 3 prio class 2
[  814.180103] Aborting journal on device sdc1-8.
[  814.180108] device offline error, dev sdc, sector 3997435904 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[  814.180112] device offline error, dev sdc, sector 3997435904 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[  814.180114] Buffer I/O error on dev sdc1, logical block 499679232, lost sync page write
[  814.180117] JBD2: I/O error when updating journal superblock for sdc1-8.
[  814.180868] device offline error, dev sdc, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[  814.249625] EXT4-fs error (device sdc1): ext4_journal_check_start:84: comm CJobMgr::m_Work: Detected aborted journal
[  814.249642] Buffer I/O error on dev sdc1, logical block 0, lost sync page write
[  814.249645] EXT4-fs (sdc1): I/O error while writing superblock
[  814.249646] EXT4-fs (sdc1): Remounting filesystem read-only
[  814.346931] sd 3:0:0:0: [sdc] Synchronizing SCSI cache
[  814.346953] sd 3:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  916.736893] EXT4-fs (sdc1): unmounting filesystem 750498b8-c3e1-4ec5-9c05-e5940bf40ef8.
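
To see at a glance which SATA port the failures come from, kernel log text like the above can be summarised per port. A minimal sketch, assuming the only lines of interest are the `failed command` lines that libata prints; `count_ata_failures` is a hypothetical helper name.

```shell
# Summarise libata errors per port from kernel log text on stdin.
# Prints "<count> <port>" for every ataN.MM that logged "failed command".
count_ata_failures() {
    grep -E -o 'ata[0-9]+\.[0-9]+: failed command' \
        | cut -d: -f1 \
        | sort \
        | uniq -c \
        | awk '{print $1, $2}'
}

# Usage (as root, not run here):
#   dmesg | count_ata_failures
```

To map a port back to a device name, `ls -l /sys/block/sd*` should show the `ataN` component in each drive's sysfs path on most AHCI setups.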

Have you performed the long SMART tests on your drives? Be aware that passing the SMART tests does not guarantee the drive is problem-free.
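
A rough sketch of how the long tests could be started and checked with `smartctl` from smartmontools, assuming the affected drives are `/dev/sdb` and `/dev/sdc` (adjust to your own device names). The function names and the `SMARTCTL` override variable are made up for this example; only the `smartctl -t long` and `smartctl -l selftest` calls are the real tool.

```shell
# SMARTCTL can be overridden, for example with a stub for a dry run.
SMARTCTL="${SMARTCTL:-smartctl}"

# Start an extended (long) self-test on each device given as an argument.
start_long_tests() {
    local dev
    for dev in "$@"; do
        "$SMARTCTL" -t long "$dev"
    done
}

# Print the self-test log of each device; run this once the tests have finished.
show_test_results() {
    local dev
    for dev in "$@"; do
        "$SMARTCTL" -l selftest "$dev"
    done
}

# Usage (as root, not run here):
#   start_long_tests /dev/sdb /dev/sdc
#   show_test_results /dev/sdb /dev/sdc
```

The test runs inside the drive itself and can take hours on large disks, so the results are only worth reading after the estimated completion time that `smartctl` prints.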

I would stress test your drives by transferring very large files and tons of small files. I’m talking like a terabyte of mixed small and large data files in a continuous transfer queue. Use the standard KDE dolphin copy/paste method, not rsync or other terminal methods.

Watch your logs in a live log session to see the errors this may generate. Sometimes using a kernel with a different I/O scheduler can help. You can also manually change your scheduler that you use. I’ve worked around problems with I/O errors in the past by changing the kernel and scheduler in use.

You could also try swapping an internal drive that has I/O errors to an external caddy (and vice versa). If the errors disappear when swapped then the controller could likely be the cause of the issues.

Hi. Interestingly enough, I moved the entire Steam library to the functioning storage drive, 1.5 terabytes in all, and the connection seemed stable enough.
The transfer went through.

I can try copying as well.

The scheduler topic is new to me, so I need to learn a bit more about it and check how to do it first.

I'm currently using the LTS kernel; I could try swapping to the zen4 kernel.

I'll run the long SMART test when I restart the PC later in the day.

See section 2.4 Input/output schedulers

https://wiki.archlinux.org/title/Improving_performance#Storage_devices

Thank you :)

I read through it a bit, and it does go a bit over my head.

Nevertheless, if I understood correctly:

cat /sys/block/nvme0n1/queue/scheduler
File: /sys/block/nvme0n1/queue/scheduler
none mq-deadline [kyber] bfq

Kyber is in use?
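
For what it's worth, the entry in square brackets in that file is the scheduler currently active for the device, so in the output above that would be kyber. A small sketch for pulling the active entry out, where `active_scheduler` is a hypothetical helper name:

```shell
# Print the active I/O scheduler (the bracketed entry) from the text of a
# /sys/block/<dev>/queue/scheduler file supplied on stdin.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Usage (not run here):
#   active_scheduler < /sys/block/nvme0n1/queue/scheduler
#   for f in /sys/block/*/queue/scheduler; do
#       printf '%s: %s\n' "$f" "$(active_scheduler < "$f")"
#   done
```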

According to the article:

The process to change I/O scheduler, depending on whether the disk is rotating or not can be automated and persist across reboots. For example the udev rules below set the scheduler to bfq for rotational drives, bfq for SSD/eMMC drives and none for NVMe drives:

Should I change the scheduler on this root drive to none,
and on the SSD storage drives to bfq?

Or is mixing not recommended?

The part below seems to recommend a manual setup for each type of disk:

/etc/udev/rules.d/60-ioschedulers.rules

# HDD
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"

# SSD
ACTION=="add|change", KERNEL=="sd[a-z]*|mmcblk[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="bfq"

# NVMe SSD
ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"

Note: I listed the directory /etc/udev/rules.d/ and it's empty; I presume I need to create a file?

Apologies if I'm annoying; I'm trying to understand rather than follow blindly.

Yes, you need to create the udev rule to have the changes stick.

You can change it manually temporarily:

To change the active I/O scheduler to bfq for device sda, use (as root):

echo bfq > /sys/block/sda/queue/scheduler
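
One catch with the line above: it only works from a real root shell. `sudo echo bfq > ...` fails because the redirection is done by your unprivileged shell, so `echo bfq | sudo tee /sys/block/sda/queue/scheduler` is the usual workaround. A small wrapper sketch, where `set_scheduler` and the `SYSFS_BLOCK` variable are made up for this example (the variable exists only so the function can be tried against a scratch directory):

```shell
# SYSFS_BLOCK can be pointed at a scratch directory for testing.
SYSFS_BLOCK="${SYSFS_BLOCK:-/sys/block}"

# set_scheduler <device> <scheduler>, for example: set_scheduler sda bfq
set_scheduler() {
    local file="$SYSFS_BLOCK/$1/queue/scheduler"
    if [ ! -e "$file" ]; then
        echo "no scheduler file for $1" >&2
        return 1
    fi
    # Needs root on the real sysfs; the kernel rejects scheduler names
    # that are not offered for the device.
    printf '%s\n' "$2" > "$file"
}
```

The change lasts until the next reboot or until the device is re-added, which is why the udev rule is needed to make it stick.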

So I created the rule file 60-ioschedulers.rules.

In my case the syntax should be:

# SSD
ACTION=="add|change", KERNEL=="sd[a-z]*|mmcblk[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="bfq"

# NVMe SSD
ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"

I'm confused about the parentheses after ACTION and KERNEL. Is it fine to leave it as is, or should I specify add or change?

And is the KERNEL== in this one a wildcard, or does it need the specific drive?
I am asking because I know the | symbol usually means "either or".
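
On the wildcard question: udev matches these values with shell-style glob patterns, where `*` matches any run of characters, `[a-z]` matches one character from the range, and `|` separates alternatives. So `ACTION=="add|change"` matches either event and can stay as it is, and the KERNEL pattern covers every matching drive without naming one. The quotes in the rules file need to be plain ASCII `"` characters, not the curly ones a forum or browser may paste. The sketch below mimics the SSD rule's KERNEL match with the shell's own `case` globbing, assuming the wiki's form of the pattern (`sd[a-z]*|mmcblk[0-9]*`); `matches_ssd_rule` is a hypothetical name and the shell's matcher is only a close stand-in for udev's.

```shell
# Rough stand-in for the KERNEL== match of the "SSD" rule, using the shell's
# glob matching. This is not udev itself, just the same style of pattern.
matches_ssd_rule() {
    case "$1" in
        sd[a-z]*|mmcblk[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example checks (not run here):
#   matches_ssd_rule sda     && echo "sda matches"
#   matches_ssd_rule nvme0n1 || echo "nvme0n1 does not match"

# After editing the rules file, reload and re-apply as root (not run here):
#   udevadm control --reload
#   udevadm trigger --subsystem-match=block --action=change
```

Afterwards, reading `/sys/block/<device>/queue/scheduler` again should show the new scheduler in brackets if the rule applied.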

Sorry, it’s well after 2am here and I need some sleep.

Also, be sure your drive’s firmware is up to date.

No problem, thank you for your time. Please get some rest; I'll try to figure this out in the meantime.
