Disk IO reaches 100%, causing system hangs

What if you change the btrfs mount parameters in /etc/fstab? Please test it; for me this solved the problem! (I went from commit=120 0 2 to commit=15 0 2 for all btrfs partitions.) Have a nice day!

OP has already specified that:

The BTRFS built-in default is commit=30 so there should be no reason to arbitrarily change it to anything else.
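
If you want to see which commit interval a mount is actually using, here is a minimal sketch. The helper name commit_interval is made up for illustration, and it falls back to the default of 30 mentioned above when no commit= option is present. As far as I know, btrfs only lists commit= in the mount options when it differs from the default.

```shell
# Hypothetical helper: print the commit= value from a mount-options string,
# or 30 (the btrfs default cited above) when the option is absent.
commit_interval() {
    opts="$1"
    val=$(printf '%s\n' "$opts" | tr ',' '\n' | sed -n 's/^commit=//p' | head -n 1)
    echo "${val:-30}"
}

# Example with a made-up options string; prints 120
commit_interval "rw,noatime,compress=zstd,commit=120,subvol=/@"

# On a live system, pass in the real options of the root mount:
# commit_interval "$(findmnt -no OPTIONS /)"
```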

The other obvious thing is to try a different kernel. TKG kernels seem to have very narrow performance benefits whereas linux and linux-zen are designed to work well on a wide range of hardware.

Yeah, I've been sensing a performance regression in the 5.10 TKG kernel. I notice it in general everyday usage but haven't done any thorough testing. I switched to the zen kernel and have been happy with it.

This ^^^ is a common issue reported on numerous threads complaining of freezing when using the btrfs file system with quotas enabled.

I cannot say for certain if disabling btrfs quotas will help in your specific case. However, if you do not provide feedback when solutions are suggested how are we to know?

Disabling quotas does have a negative impact on timeshift, as it disables timeshift’s ability to gauge how much space is being used for snapshots. Given the choice between system freezes and reducing timeshift’s features, I’ll leave it to you to decide which might be more important.
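
For anyone who wants to try it, here is a rough sketch of checking and disabling quotas. It assumes the btrfs filesystem is mounted at / and that you have sudo rights. The function names are placeholders I made up, and nothing runs until you call them.

```shell
# Sketch only: assumes a btrfs filesystem mounted at / and working sudo.
# The function names are hypothetical; nothing executes until they are called.

quota_status() {
    # 'btrfs qgroup show' fails when quotas are not enabled on the filesystem
    if sudo btrfs qgroup show "${1:-/}" >/dev/null 2>&1; then
        echo "quotas enabled"
    else
        echo "quotas disabled (or not a btrfs mount)"
    fi
}

disable_quotas() {
    # Turns quota accounting off for the filesystem containing the given path
    sudo btrfs quota disable "${1:-/}"
}
```

Calling quota_status before and after disable_quotas should show whether the change took effect.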

Please provide feedback to all suggestions put to you, as this benefits all other Garuda users having similar issues.

Unfortunately, I'm experiencing lag when I/O is heavy, i.e. when updating the system (or even with a simple rm -rf of a big directory). Under the same load, pure Arch Linux didn't give me any trouble.

System:    Kernel: 5.10.15-120-tkg-bmq x86_64 bits: 64 compiler: gcc v: 10.2.1 
           parameters: intel_pstate=passive BOOT_IMAGE=/@/boot/vmlinuz-linux-tkg-bmq 
           root=UUID=d565994a-fb0a-42e8-a38f-11d09e59d4f5 rw rootflags=subvol=@ quiet splash rd.udev.log_priority=3 
           vt.global_cursor_default=0 systemd.unified_cgroup_hierarchy=1 resume=UUID=c3ec5c29-680c-478a-b363-b6b36661d94d 
           loglevel=3 
           Desktop: KDE Plasma 5.21.0 tk: Qt 5.15.2 info: latte-dock wm: kwin_x11 dm: SDDM Distro: Garuda Linux 
Machine:   Type: Laptop System: Dell product: Latitude 5490 v: N/A serial: <filter> Chassis: type: 10 serial: <filter> 
           Mobo: Dell model: 0DH2HV v: A00 serial: <filter> UEFI: Dell v: 1.5.0 date: 08/27/2018 
Battery:   ID-1: BAT0 charge: 55.4 Wh condition: 55.4/68.0 Wh (81%) volts: 8.6/7.6 model: SMP DELL GD1JP65 type: Li-poly 
           serial: <filter> status: Full 
CPU:       Info: Quad Core model: Intel Core i5-8350U bits: 64 type: MT MCP arch: Kaby Lake note: check family: 6 
           model-id: 8E (142) stepping: A (10) microcode: E0 L2 cache: 6 MiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 30424 
           Speed: 1985 MHz min/max: 400/3600 MHz Core speeds (MHz): 1: 1985 2: 2881 3: 3601 4: 3070 5: 3121 6: 3602 7: 3600 
           8: 3602 
           Vulnerabilities: Type: itlb_multihit status: KVM: VMX disabled 
           Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable 
           Type: mds mitigation: Clear CPU buffers; SMT vulnerable 
           Type: meltdown mitigation: PTI 
           Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl and seccomp 
           Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization 
           Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling 
           Type: srbds mitigation: Microcode 
           Type: tsx_async_abort mitigation: Clear CPU buffers; SMT vulnerable 
Graphics:  Device-1: Intel UHD Graphics 620 vendor: Dell driver: i915 v: kernel bus ID: 00:02.0 chip ID: 8086:5917 
           class ID: 0300 
           Device-2: Microdia Integrated_Webcam_HD type: USB driver: uvcvideo bus ID: 1-5:3 chip ID: 0c45:6717 class ID: 0e02 
           Display: x11 server: X.Org 1.20.10 compositor: kwin_x11 driver: loaded: intel unloaded: modesetting 
           alternate: fbdev,vesa display ID: :0 screens: 1 
           Screen-1: 0 s-res: 3840x1080 s-dpi: 96 s-size: 1013x285mm (39.9x11.2") s-diag: 1052mm (41.4") 
           Monitor-1: eDP1 res: 1920x1080 hz: 60 dpi: 157 size: 310x170mm (12.2x6.7") diag: 354mm (13.9") 
           Monitor-2: DP1-2 res: 1920x1080 hz: 60 dpi: 92 size: 530x300mm (20.9x11.8") diag: 609mm (24") 
           OpenGL: renderer: Mesa Intel UHD Graphics 620 (KBL GT2) v: 4.6 Mesa 20.3.4 direct render: Yes 
Audio:     Device-1: Intel Sunrise Point-LP HD Audio vendor: Dell driver: snd_hda_intel v: kernel alternate: snd_soc_skl 
           bus ID: 00:1f.3 chip ID: 8086:9d71 class ID: 0403 
           Device-2: Realtek USB Audio type: USB driver: snd-usb-audio bus ID: 1-1.5:8 chip ID: 0bda:4014 class ID: 0102 
           serial: <filter> 
           Sound Server: ALSA v: k5.10.15-120-tkg-bmq 
Network:   Device-1: Intel Ethernet I219-LM vendor: Dell driver: e1000e v: kernel port: f040 bus ID: 00:1f.6 
           chip ID: 8086:15d7 class ID: 0200 
           IF: enp0s31f6 state: down mac: <filter> 
           Device-2: Intel Wireless 8265 / 8275 driver: iwlwifi v: kernel port: f040 bus ID: 02:00.0 chip ID: 8086:24fd 
           class ID: 0280 
           IF: wlp2s0 state: up mac: <filter> 
           IF-ID-1: wwp0s20f0u2i12 state: down mac: <filter> 
Bluetooth: Device-1: Intel Bluetooth wireless interface type: USB driver: btusb v: 0.8 bus ID: 1-7:5 chip ID: 8087:0a2b 
           class ID: e001 
           Message: Required tool hciconfig not installed. Check --recommends 
RAID:      Hardware-1: Intel 82801 Mobile SATA Controller [RAID mode] driver: ahci v: 3.0 port: f060 bus ID: 00:17.0 
           chip ID: 8086.282a rev: 21 
Drives:    Local Storage: total: 465.76 GiB used: 40.42 GiB (8.7%) 
           SMART Message: Unable to run smartctl. Root privileges required. 
           ID-1: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 860 EVO 500GB size: 465.76 GiB block size: physical: 512 B 
           logical: 512 B speed: 6.0 Gb/s rotation: SSD serial: <filter> rev: 1B6Q scheme: GPT 
Partition: ID-1: / raw size: 448.42 GiB size: 448.42 GiB (100.00%) used: 40.42 GiB (9.0%) fs: btrfs dev: /dev/sda2 
           maj-min: 8:2 
           ID-2: /boot/efi raw size: 300 MiB size: 299.4 MiB (99.80%) used: 560 KiB (0.2%) fs: vfat dev: /dev/sda1 
           maj-min: 8:1 
           ID-3: /home raw size: 448.42 GiB size: 448.42 GiB (100.00%) used: 40.42 GiB (9.0%) fs: btrfs dev: /dev/sda2 
           maj-min: 8:2 
           ID-4: /var/log raw size: 448.42 GiB size: 448.42 GiB (100.00%) used: 40.42 GiB (9.0%) fs: btrfs dev: /dev/sda2 
           maj-min: 8:2 
           ID-5: /var/tmp raw size: 448.42 GiB size: 448.42 GiB (100.00%) used: 40.42 GiB (9.0%) fs: btrfs dev: /dev/sda2 
           maj-min: 8:2 
Swap:      Kernel: swappiness: 10 (default 60) cache pressure: 75 (default 100) 
           ID-1: swap-1 type: partition size: 17.04 GiB used: 0 KiB (0.0%) priority: -2 dev: /dev/sda3 maj-min: 8:3 
           ID-2: swap-2 type: zram size: 1.94 GiB used: 1024 KiB (0.1%) priority: 32767 dev: /dev/zram0 
           ID-3: swap-3 type: zram size: 1.94 GiB used: 752 KiB (0.0%) priority: 32767 dev: /dev/zram1 
           ID-4: swap-4 type: zram size: 1.94 GiB used: 820 KiB (0.0%) priority: 32767 dev: /dev/zram2 
           ID-5: swap-5 type: zram size: 1.94 GiB used: 944 KiB (0.0%) priority: 32767 dev: /dev/zram3 
           ID-6: swap-6 type: zram size: 1.94 GiB used: 900 KiB (0.0%) priority: 32767 dev: /dev/zram4 
           ID-7: swap-7 type: zram size: 1.94 GiB used: 840 KiB (0.0%) priority: 32767 dev: /dev/zram5 
           ID-8: swap-8 type: zram size: 1.94 GiB used: 708 KiB (0.0%) priority: 32767 dev: /dev/zram6 
           ID-9: swap-9 type: zram size: 1.94 GiB used: 668 KiB (0.0%) priority: 32767 dev: /dev/zram7 
Sensors:   System Temperatures: cpu: 53.0 C mobo: 47.0 C sodimm: SODIMM C 
           Fan Speeds (RPM): cpu: 2974 
Info:      Processes: 299 Uptime: 6h 05m wakeups: 3 Memory: 15.5 GiB used: 5.08 GiB (32.8%) Init: systemd v: 247 Compilers: 
           gcc: 10.2.0 clang: 11.0.1 Packages: 1414 apt: 1 pacman: 1413 lib: 337 Shell: fish v: 3.1.2 running in: konsole 
           inxi: 3.3.01 

Do you have any advice?

The SMART part, run with root privileges, is:

Drives:    Local Storage: total: 465.76 GiB used: 40.42 GiB (8.7%) 
           ID-1: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 860 EVO 500GB family: based SSDs size: 465.76 GiB 
           block size: physical: 512 B logical: 512 B sata: 3.1 speed: 6.0 Gb/s rotation: SSD serial: <filter> rev: 1B6Q 
           temp: 33 C scheme: GPT 
           SMART: yes state: enabled health: PASSED on: 141d 4h cycles: 1426 written: 4.71 TiB 

Try a different kernel, e.g. linux or linux-zen.

It’s a USB external drive that your system is installed on.

Run in a terminal

journalctl -f --no-hostname

and start Vivaldi to browse.
When I/O goes up to 99%, stop the terminal command (Ctrl-C) and copy/paste the log at bin.garuda.org (corrected link: bin.garudalinux.org).

That link is broken.

Yes, I know! But the problem also happens when it’s mounted internally.

Sorry. Try http://bin.garudalinux.org/

This was during a period of high IO usage, when just doing a package install caused a system slowdown. btrfs-cleaner and systemd-journald were causing the most issues according to Glances.

Once it hangs during web browsing again I'll send the logs.

https://bin.garudalinux.org/?9cdb701b9607d76b#4JYQB8EDQnu77FrYM5adU7ki1tbrqfdJNnBNkyTr9VaL

Have you checked your journal logs to see how much room they are taking up?
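
In case it helps, a quick way to check is sketched below. The 200M figure in the commented-out line is an arbitrary example, not a recommendation.

```shell
# Report how much disk space the systemd journal currently uses (read-only).
if command -v journalctl >/dev/null 2>&1; then
    journalctl --disk-usage
fi

# To trim old entries, something like this should work (needs root).
# 200M is an arbitrary example size:
# sudo journalctl --vacuum-size=200M
```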

This sounds oddly familiar for some reason?

Also the last time I installed glances, glances itself had a sizable memory leak. That was a while back and I have no idea if that’s been resolved.

Glances hasn’t had a memory leak, at least for me.

I haven’t checked my journal logs either. I probably should.

I’d love to disable quotas, but every time I take a snapshot, it gets re-enabled.

It’s annoying as hell.

You have to change a timeshift setting for it to remain disabled. I posted a link on that topic a little while ago on this very thread.

Change the following timeshift setting to false:

btrfs_use_qgroup

This is all detailed on the link I posted already.
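
If you would rather make the change from a terminal, a minimal sketch follows. It assumes the setting is stored as a quoted string in timeshift's JSON config. The config path (/etc/timeshift/timeshift.json or /etc/timeshift.json, depending on version) and the helper name set_qgroup_false are my assumptions, so the demo runs on a throwaway file.

```shell
# Hypothetical helper: flip btrfs_use_qgroup from "true" to "false" in a
# timeshift-style JSON file. Assumes the value is stored as a quoted string.
set_qgroup_false() {
    sed -i 's/\("btrfs_use_qgroup"[[:space:]]*:[[:space:]]*\)"true"/\1"false"/' "$1"
}

# Demo on a throwaway file rather than the real config:
demo=$(mktemp)
printf '{\n  "btrfs_mode" : "true",\n  "btrfs_use_qgroup" : "true"\n}\n' > "$demo"
set_qgroup_false "$demo"
grep btrfs_use_qgroup "$demo"

# On a real system (back the file up first; the path is an assumption):
# sudo sh -c 'cp /etc/timeshift/timeshift.json /etc/timeshift/timeshift.json.bak'
```

Only the btrfs_use_qgroup line is touched; other settings such as btrfs_mode stay as they were.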

Just tested by deleting all my snapshots and creating a new one. Quotas remained disabled in my test. Not sure if that will change when timeshift is updated, but it appears to stick in my case.

timeshift is re-enabling btrfs quota? btrfs snapshot delete causes processes doing IO to hang for 1-5 minutes #697

I do not schedule automatic snapshots, so I'm not too concerned about any complications with quotas disabled. I may just leave things this way for a while to see how it goes.
