Disk IO reaches 100%, causing system hangs

I'm running Garuda on a relatively high-end system, and I am having incredibly frustrating system hangs. I'm booted off of a 970 Pro, and I've had constant hangs over the last several days from random 100% IO usage.

It worked fine until I reinstalled a few days ago, and now it's unusable.

[Screenshot on Imgur]

When stuff like this happens, everything crawls to a stop.

This shouldn't be happening with an NVMe SSD. Anyone have any tips?

I've tried disabling swap, enabling swap, reinstalling, moving to an older kernel, a newer kernel, an older version of btrfs-progs, mounting the NVMe drive in another slot, using an external enclosure, etc. Nothing has changed.

Usually it is btrfs-cleaner or my browser (Vivaldi or Firefox) that causes the massive IO spikes.
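
One way to confirm in text form (rather than a screenshot) which processes are behind the spikes is to read the per-process counters in /proc/<pid>/io. The following is only a minimal sketch: the function name top_writers and its optional directory argument are made up for illustration, and write_bytes is cumulative since each process started, so it shows totals rather than the current rate.

```shell
#!/bin/sh
# top_writers (hypothetical helper): list the ten processes with the highest
# cumulative write_bytes, as reported in /proc/<pid>/io.
# The optional argument overrides /proc and exists only to make testing easy.
top_writers() {
    base="${1:-/proc}"
    for io in "$base"/[0-9]*/io; do
        # Skip processes whose io file we are not allowed to read
        [ -r "$io" ] || continue
        dir=${io%/io}
        # Match the exact field name so cancelled_write_bytes is not picked up
        bytes=$(awk '$1 == "write_bytes:" { print $2 }' "$io" 2>/dev/null)
        [ -n "$bytes" ] || continue
        name=$(cat "$dir/comm" 2>/dev/null)
        # Output format: <bytes written> <pid> <command name>
        printf '%s %s %s\n' "$bytes" "${dir##*/}" "${name:-?}"
    done | sort -rn | head -n 10
}

top_writers
```

Run as a normal user this only covers your own processes; it would need to run as root to include kernel threads such as btrfs-cleaner.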


I know in the past Steam has had issues with kernel tracing and Vulkan or something.
Have you tried disabling kernel tracing?
Check if it's enabled with:
grep . /sys/kernel/debug/tracing/{tracing_on,events/enable}

I would also check syslog or /proc/kmsg

/sys/kernel/debug/tracing/events/enable:0

Alright, time to see how to disable that!

Here is mine.
[image]

It seems funny that it was working fine the other day though. Has Steam updated recently? You mentioned everything worked until you reinstalled. Maybe something happened. Did you try another reinstall?


I would also try journalctl -p 3 -xb to see if there are any disk errors or whatnot.


Test a few different kernels. The ones I would suggest are linux-zen, linux, linux-mainline, and linux-lts.

You may also want to change your I/O scheduler in use.

Check if you have any error messages related to your scheduler.
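
For reference, the scheduler currently in use can be read from sysfs, where the kernel marks the active one with square brackets. This is a rough sketch only; the function name list_schedulers and its optional directory argument are illustrative and not part of any tool.

```shell
#!/bin/sh
# list_schedulers (hypothetical helper): print the active I/O scheduler
# for every block device that exposes queue/scheduler in sysfs.
# The optional argument overrides /sys/block and exists only for testing.
list_schedulers() {
    base="${1:-/sys/block}"
    for f in "$base"/*/queue/scheduler; do
        [ -r "$f" ] || continue
        # Strip the base directory and everything after the device name
        dev=${f#"$base"/}
        dev=${dev%%/*}
        # The active scheduler is the bracketed entry, e.g. "[none] mq-deadline kyber bfq"
        active=$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$f")
        printf '%s: %s\n' "$dev" "${active:-unknown}"
    done
}

list_schedulers
```

To switch schedulers until the next reboot, write one of the listed names back to the same file as root, for example `echo mq-deadline | sudo tee /sys/block/nvme0n1/queue/scheduler` (nvme0n1 is an assumed device name here; use whatever the listing shows).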


I am on an SSD with a BMQ kernel, and my system briefly hangs during major pacman updates...

Maybe try the default kernel for troubleshooting's sake instead of a bleeding-edge kernel with a new BitMap Queue scheduler.

My 1st and most important tip is

DO NOT USE IMAGES FOR TEXT

Is your CPU usage high? Why do you think I/O cannot be high with an NVMe SSD? Can you provide a link with info suggesting this? I would appreciate learning more about it.

Steam and Vivaldi relate to internet/network.
Disable your internet connection and see what happens.

Also,

How time-consuming is it to check the logs?

And I wonder if you’ve ever read the wiki on posting issues.

No logs == No problem

I’ve already tried different kernels and schedulers.

There are no disk errors, but there are errors from bootup.

Feb 10 16:05:17 indy systemd-udevd[546]: host1: /etc/udev/rules.d/50-sata.rules:3 Failed to write ATTR{/sys/devices/pci0000:00/0000:00:08.1/0000:0b:00.3/usb6/6-4/6-4:1.0/host1/scsi_host/host1/link_power_management_poli>
Feb 10 16:05:17 indy systemd-udevd[554]: host2: /etc/udev/rules.d/50-sata.rules:3 Failed to write ATTR{/sys/devices/pci0000:00/0000:00:08.1/0000:0b:00.3/usb5/5-3/5-3:1.0/host2/scsi_host/host2/link_power_management_poli>
Feb 10 16:05:17 indy systemd-udevd[557]: host0: /etc/udev/rules.d/50-sata.rules:3 Failed to write ATTR{/sys/devices/pci0000:00/0000:00:01.2/0000:02:00.0/0000:03:08.0/0000:06:00.3/usb4/4-4/4-4:1.0/host0/scsi_host/host0/>
Feb 10 16:05:17 indy systemd[1]: Failed to activate swap /swapfile.

That is fair, however during that time I cannot copy the text, as my system is locked up.

No, it is not. This happens at idle, just browsing the web.

This happens when downloading games and browsing the web. I mean, disconnecting from the internet would definitely solve the problem, but only because it stops me from performing those basic tasks.

I did check logs like dmesg and journalctl, and I did not see anything that appeared relevant.

Hang on - setting the wrong power management setting can severely nerf SSD performance.

This rule seems to be for a USB device, but have you checked the contents of /etc/udev/rules.d/50-sata.rules?
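
A quick way to look at both the rule and what is actually in effect is sketched below. The helper name show_lpm_policies and its optional directory argument are made up for illustration; it simply prints link_power_management_policy for every SCSI host that exposes it. Hosts that belong to USB storage typically do not have this attribute, which would be consistent with the "Failed to write ATTR" messages above, though that is an assumption rather than something the log proves.

```shell
#!/bin/sh
# show_lpm_policies (hypothetical helper): print the current SATA link power
# management policy for each SCSI host that exposes the attribute.
# The optional argument overrides /sys/class/scsi_host and exists only for testing.
show_lpm_policies() {
    base="${1:-/sys/class/scsi_host}"
    for f in "$base"/host*/link_power_management_policy; do
        # Hosts without the attribute never match a readable file and are skipped
        [ -r "$f" ] || continue
        host=${f%/link_power_management_policy}
        printf '%s: %s\n' "${host##*/}" "$(cat "$f")"
    done
}

# Print the udev rule that the boot log complains about, if it exists
if [ -r /etc/udev/rules.d/50-sata.rules ]; then
    cat /etc/udev/rules.d/50-sata.rules
fi

show_lpm_policies
```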

Could be an issue with Vivaldi, try a different browser.

Was your network driver updated? While very rare, I have seen WiFi driver bugs in the past that induced 100% CPU usage.

Please post a full:

inxi -Fxxxza

There you are. You’ve found the problem.
Follow the trail.

You can try turning off hardware acceleration in your browsers.

I've tried a different browser, same problem.

The problem is that this is new. Disabling hardware acceleration won't do anything to stop them from locking up the disk, either.

           parameters: intel_pstate=passive BOOT_IMAGE=/@/boot/vmlinuz-linux-tkg-bmq 
           root=UUID=474db5d5-5d93-4f07-a3aa-77d5c9a52327 rw rootflags=subvol=@ rd.udev.log_priority=3 
           vt.global_cursor_default=0 systemd.unified_cgroup_hierarchy=1 loglevel=3 mitigations=off sysrq_always_enabled=1 
           Desktop: GNOME 3.38.3 tk: GTK 3.24.24 wm: gnome-shell dm: GDM 3.38.2.1 Distro: Garuda Linux 
Machine:   Type: Desktop Mobo: ASUSTeK model: ROG STRIX X570-I GAMING v: Rev X.0x serial: <filter> UEFI: American Megatrends 
           v: 3406 date: 02/04/2021 
Battery:   ID-1: hidpp_battery_0 charge: N/A condition: N/A volts: 4.2/N/A model: Logitech G903 Wired/Wireless Gaming Mouse 
           type: N/A serial: <filter> status: Full 
           Device-1: apple_mfi_fastcharge model: N/A serial: N/A charge: N/A status: N/A 
CPU:       Info: 12-Core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP arch: Zen 2 family: 17 (23) model-id: 71 (113) 
           stepping: N/A microcode: 8701021 L2 cache: 6 MiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 182589 
           Speed: 4125 MHz min/max: 2200/3800 MHz boost: enabled Core speeds (MHz): 1: 4125 2: 4123 3: 4123 4: 4122 5: 4094 
           6: 4099 7: 4127 8: 4113 9: 4086 10: 4096 11: 4113 12: 4107 13: 4122 14: 4109 15: 4095 16: 4101 17: 4113 18: 4125 
           19: 4125 20: 4125 21: 4125 22: 4125 23: 4125 24: 4123 
           Vulnerabilities: Type: itlb_multihit status: Not affected 
           Type: l1tf status: Not affected 
           Type: mds status: Not affected 
           Type: meltdown status: Not affected 
           Type: spec_store_bypass status: Vulnerable 
           Type: spectre_v1 status: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers 
           Type: spectre_v2 status: Vulnerable, IBPB: disabled, STIBP: disabled 
           Type: srbds status: Not affected 
           Type: tsx_async_abort status: Not affected 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] 
           vendor: Sapphire Limited driver: amdgpu v: kernel bus ID: 09:00.0 chip ID: 1002:731f class ID: 0300 
           Display: x11 server: X.Org 1.20.10 compositor: gnome-shell driver: loaded: amdgpu,ati unloaded: modesetting,radeon 
           alternate: fbdev,vesa display ID: :1 screens: 1 
           Screen-1: 0 s-res: 3440x1440 s-dpi: 96 s-size: 910x381mm (35.8x15.0") s-diag: 987mm (38.8") 
           Monitor-1: DisplayPort-2 res: 3440x1440 hz: 144 dpi: 109 size: 800x335mm (31.5x13.2") diag: 867mm (34.1") 
           OpenGL: renderer: AMD Radeon RX 5700 XT (NAVI10 DRM 3.40.0 5.10.14-119-tkg-bmq LLVM 11.0.1) v: 4.6 Mesa 20.3.4 
           direct render: Yes 
Audio:     Device-1: Advanced Micro Devices [AMD/ATI] Navi 10 HDMI Audio driver: snd_hda_intel v: kernel bus ID: 09:00.1 
           chip ID: 1002:ab38 class ID: 0403 
           Device-2: Advanced Micro Devices [AMD] Starship/Matisse HD Audio vendor: ASUSTeK driver: snd_hda_intel v: kernel 
           bus ID: 0b:00.4 chip ID: 1022:1487 class ID: 0403 
           Device-3: Logitech Logitech BRIO type: USB driver: hid-generic,snd-usb-audio,usbhid,uvcvideo bus ID: 6-1.3:5 
           chip ID: 046d:085e class ID: 0300 serial: <filter> 
           Device-4: Blue Microphones Yeti Stereo Microphone type: USB driver: hid-generic,snd-usb-audio,usbhid 
           bus ID: 5-1.2.3:10 chip ID: b58e:9e84 class ID: 0300 serial: <filter> 
           Sound Server: ALSA v: k5.10.14-119-tkg-bmq 
Network:   Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel bus ID: 04:00.0 chip ID: 8086:2723 class ID: 0280 
           IF: wlp4s0 state: down mac: <filter> 
           Device-2: Intel I211 Gigabit Network vendor: ASUSTeK driver: igb v: kernel port: f000 bus ID: 05:00.0 
           chip ID: 8086:1539 class ID: 0200 
           IF: enp5s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
           IF-ID-1: enp11s0f3u2c4i2 state: down mac: <filter> 
Bluetooth: Device-1: Intel AX200 Bluetooth type: USB driver: btusb v: 0.8 bus ID: 3-6:3 chip ID: 8087:0029 class ID: e001 
           Message: Required tool hciconfig not installed. Check --recommends

Are you using network protocols such as SMB or NFS? I have seen I/O problems created by SMB in the past as well. Are you mounting any network shares?
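
If you are not sure, the mounted network shares can be listed from /proc/mounts. This is a small sketch, assuming the shares would show up with one of the filesystem types nfs, nfs4, cifs, or smb3; the function name list_network_mounts and its optional file argument are only illustrative.

```shell
#!/bin/sh
# list_network_mounts (hypothetical helper): print mounted NFS and SMB/CIFS shares.
# The optional argument overrides /proc/mounts and exists only for testing.
list_network_mounts() {
    mounts="${1:-/proc/mounts}"
    [ -r "$mounts" ] || return 0
    # /proc/mounts fields: device, mount point, filesystem type, options, ...
    awk '$3 ~ /^(nfs|nfs4|cifs|smb3)$/ { print $3, $1, "on", $2 }' "$mounts"
}

list_network_mounts
```

No output means no share of those types is currently mounted.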