Btrfs-cleaner and updatedb running at the same time, causing high system load and massive lags due to swapping

After running Garuda on my laptop for approx. 2 years with no problems, and without changing my NVMe, the partition table, or the file systems, I have lately been experiencing sudden, massive system load spikes and lags, to the point that even the cursor freezes when I move the mouse. It smells like swapping, tastes like swapping, and probably is swapping. Each episode lasts a few minutes (~1-5 min), then resolves itself and the system goes back to normal.

My investigation shows that btrfs-cleaner and updatedb seem to get started in a way that leads to them running concurrently. This causes high load and memory usage, which causes kswapd to kick in, which in turn causes massive system load and an overall laggy system.

top - 09:38:38 up 17:38,  4 users,  load average: 8.28, 4.04, 1.90
Tasks: 473 total,   8 running, 465 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us, 29.1 sy,  0.0 ni, 69.4 id,  0.3 wa,  0.3 hi,  0.1 si,  0.0 st
MiB Mem :  15730.2 total,     69.8 free,   9184.7 used,   8281.0 buff/cache
MiB Swap:  33037.5 total,  29876.2 free,   3161.2 used.   6545.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   3781 baccenf+  20   0   22.9g 343464 132184 R 165.8   2.1   5:57.66 Isolated Web Co
    164 root      20   0       0      0      0 R  99.7   0.0   3:30.53 kswapd0
  32751 root      39  19    6396   4220   3840 R  99.0   0.0   3:10.82 updatedb
    515 root      20   0       0      0      0 R  93.7   0.0   2:46.53 btrfs-cleaner
   1097 root      20   0 1096880 108512  81464 R  85.0   0.7   9:19.66 Xorg

System:
 Kernel: 6.10.3-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 14.2.1
   clocksource: tsc avail: acpi_pm
   parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
   root=UUID=0e48fd2e-4aaa-4609-8fb8-c477f0f931b0 rw rootflags=subvol=@
   quiet
   cryptdevice=UUID=76870f8a-2a08-4382-8f37-97eaf4fb75fb:luks-76870f8a-2a08-4382-8f37-97eaf4fb75fb
   root=/dev/mapper/luks-76870f8a-2a08-4382-8f37-97eaf4fb75fb quiet
   rd.udev.log_priority=3 vt.global_cursor_default=0
   resume=/dev/mapper/luks-cd774d59-4fcd-4ad6-94cc-5b22d8858918 loglevel=3
   ibt=off
 Desktop: i3 v: 4.23 with: i3bar tools: avail: i3lock,xautolock,xtrlock
   dm: LightDM v: 1.32.0 Distro: Garuda base: Arch Linux
Machine:
 Type: Laptop System: TUXEDO product: TUXEDO InfinityBook Pro Gen7 (MK1)
   v: Standard serial: <filter>
 Mobo: NB02 model: PHxARX1_PHxAQF1 v: Standard serial: <filter>
   part-nu: IBP1XI07MK1 uuid: adf12380-0cbc-11ed-9a44-4bf7f0011b00
   UEFI: American Megatrends LLC. v: N.1.05A07 date: 11/07/2022
Battery:
 ID-1: BAT0 charge: 66.6 Wh (100.0%) condition: 66.6/99.2 Wh (67.2%)
   volts: 16.7 min: 15.5 model: standard type: Li-ion serial: <filter>
   status: full
CPU:
 Info: model: 12th Gen Intel Core i7-12700H socket: U3E1 bits: 64
   type: MST AMCP arch: Alder Lake gen: core 12 level: v3 note: check
   built: 2021+ process: Intel 7 (10nm ESF) family: 6 model-id: 0x9A (154)
   stepping: 3 microcode: 0x433
 Topology: cpus: 1x cores: 14 mt: 6 tpc: 2 st: 8 threads: 20 smt: enabled
   cache: L1: 1.2 MiB desc: d-8x32 KiB, 6x48 KiB; i-6x32 KiB, 8x64 KiB
   L2: 11.5 MiB desc: 6x1.2 MiB, 2x2 MiB L3: 24 MiB desc: 1x24 MiB
 Speed (MHz): avg: 780 high: 2544 min/max: 400/4600:4700:3500
   base/boost: 2178/4700 scaling: driver: intel_pstate governor: powersave
   volts: 0.8 V ext-clock: 100 MHz cores: 1: 748 2: 400 3: 2544 4: 400
   5: 1171 6: 400 7: 400 8: 400 9: 1152 10: 400 11: 400 12: 400 13: 1402
   14: 1502 15: 400 16: 1232 17: 400 18: 400 19: 1064 20: 400
   bogomips: 107520
 Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
 Vulnerabilities: <filter>
Graphics:
 Device-1: Intel Alder Lake-P GT2 [Iris Xe Graphics]
   vendor: Tongfang Hongkong driver: i915 v: kernel alternate: xe
   arch: Gen-12.2 process: Intel 10nm built: 2021-22+ ports: active: eDP-1
   empty: DP-1, DP-2, DP-3, DP-4, HDMI-A-1 bus-ID: 00:02.0 chip-ID: 8086:46a6
   class-ID: 0300
 Device-2: Chicony FHD Webcam driver: uvcvideo type: USB rev: 2.0
   speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 3-6:4 chip-ID: 04f2:b75c
   class-ID: 0e02 serial: <filter>
 Display: unspecified server: X.Org v: 21.1.13 compositor: Picom
   v: git-89c2c driver: X: loaded: modesetting alternate: fbdev,intel,vesa
   dri: iris gpu: i915 display-ID: :0 screens: 1
 Screen-1: 0 s-res: 2880x1800 s-dpi: 96 s-size: 762x476mm (30.00x18.74")
   s-diag: 898mm (35.37")
 Monitor-1: eDP-1 model-id: CSO 0x1402 built: 2020 res: 2880x1800 hz: 90
   dpi: 242 gamma: 1.2 size: 302x188mm (11.89x7.4") diag: 356mm (14")
   ratio: 16:10 modes: 2880x1800
 API: EGL v: 1.5 hw: drv: intel iris platforms: device: 0 drv: iris
   device: 1 drv: swrast gbm: drv: iris surfaceless: drv: iris x11: drv: iris
   inactive: wayland
 API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 24.1.5-arch1.1
   glx-v: 1.4 direct-render: yes renderer: Mesa Intel Graphics (ADL GT2)
   device-ID: 8086:46a6 memory: 7.5 GiB unified: yes
 API: Vulkan v: 1.3.279 layers: 3 device: 0 type: integrated-gpu name: Intel
   Graphics (ADL GT2) driver: mesa intel v: 24.1.5-arch1.1
   device-ID: 8086:46a6 surfaces: xcb,xlib device: 1 type: cpu name: llvmpipe
   (LLVM 18.1.8 256 bits) driver: mesa llvmpipe v: 24.1.5-arch1.1 (LLVM
   18.1.8) device-ID: 10005:0000 surfaces: xcb,xlib
Audio:
 Device-1: Intel Alder Lake PCH-P High Definition Audio
   vendor: Tongfang Hongkong driver: snd_hda_intel v: kernel
   alternate: snd_soc_avs,snd_sof_pci_intel_tgl bus-ID: 00:1f.3
   chip-ID: 8086:51c8 class-ID: 0403
 API: ALSA v: k6.10.3-zen1-1-zen status: kernel-api tools: N/A
 Server-1: sndiod v: N/A status: off tools: aucat,midicat,sndioctl
 Server-2: PipeWire v: 1.2.2 status: n/a (root, process) with:
   1: pipewire-pulse status: active 2: wireplumber status: active
   3: pipewire-alsa type: plugin 4: pw-jack type: plugin
   tools: pactl,pw-cat,pw-cli,wpctl
Network:
 Device-1: Intel Alder Lake-P PCH CNVi WiFi driver: iwlwifi v: kernel
   bus-ID: 00:14.3 chip-ID: 8086:51f0 class-ID: 0280
 IF: wlo1 state: up mac: <filter>
 IF-ID-1: tun0 state: unknown speed: 10000 Mbps duplex: full mac: N/A
 Info: services: NetworkManager, systemd-timesyncd, wpa_supplicant
Bluetooth:
 Device-1: Intel AX201 Bluetooth driver: btusb v: 0.8 type: USB rev: 2.0
   speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 3-10:5 chip-ID: 8087:0026
   class-ID: e001
 Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.2
   lmp-v: 11 status: discoverable: no pairing: no class-ID: 6c010c
Drives:
 Local Storage: total: 465.76 GiB used: 309.84 GiB (66.5%)
 SMART Message: Required tool smartctl not installed. Check --recommends
 ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Samsung model: SSD 980 500GB
   size: 465.76 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
   lanes: 4 tech: SSD serial: <filter> fw-rev: 3B4QFXO7 temp: 32.9 C
   scheme: GPT
Partition:
 ID-1: / raw-size: 448.56 GiB size: 448.56 GiB (100.00%)
   used: 309.84 GiB (69.1%) fs: btrfs block-size: 4096 B dev: /dev/dm-0
   maj-min: 254:0 mapped: luks-76870f8a-2a08-4382-8f37-97eaf4fb75fb
 ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
   used: 728 KiB (0.2%) fs: vfat block-size: 512 B dev: /dev/nvme0n1p1
   maj-min: 259:1
 ID-3: /home raw-size: 448.56 GiB size: 448.56 GiB (100.00%)
   used: 309.84 GiB (69.1%) fs: btrfs block-size: 4096 B dev: /dev/dm-0
   maj-min: 254:0 mapped: luks-76870f8a-2a08-4382-8f37-97eaf4fb75fb
 ID-4: /var/log raw-size: 448.56 GiB size: 448.56 GiB (100.00%)
   used: 309.84 GiB (69.1%) fs: btrfs block-size: 4096 B dev: /dev/dm-0
   maj-min: 254:0 mapped: luks-76870f8a-2a08-4382-8f37-97eaf4fb75fb
 ID-5: /var/tmp raw-size: 448.56 GiB size: 448.56 GiB (100.00%)
   used: 309.84 GiB (69.1%) fs: btrfs block-size: 4096 B dev: /dev/dm-0
   maj-min: 254:0 mapped: luks-76870f8a-2a08-4382-8f37-97eaf4fb75fb
Swap:
 Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
 ID-1: swap-1 type: zram size: 15.36 GiB used: 5.46 GiB (35.5%)
   priority: 100 comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 20
   dev: /dev/zram0
 ID-2: swap-2 type: partition size: 16.9 GiB used: 0 KiB (0.0%)
   priority: -2 dev: /dev/dm-1 maj-min: 254:1
   mapped: luks-cd774d59-4fcd-4ad6-94cc-5b22d8858918
Sensors:
 System Temperatures: cpu: 55.0 C mobo: N/A
 Fan Speeds (rpm): N/A
Info:
 Memory: total: 16 GiB available: 15.36 GiB used: 7.53 GiB (49.0%)
   igpu: 60 MiB
 Processes: 463 Power: uptime: 18h 19m states: freeze,mem,disk
   suspend: s2idle avail: deep wakeups: 2 hibernate: platform avail: shutdown,
   reboot, suspend, test_resume image: 6.1 GiB services: upowerd
   Init: systemd v: 256 default: graphical tool: systemctl
 Packages: pm: pacman pkgs: 2143 libs: 507 tools: pacseek,pamac,paru
   Compilers: clang: 18.1.8 gcc: 14.2.1 Shell: garuda-inxi (sudo) default: Zsh
   v: 5.9 running-in: zellij inxi: 3.3.35
Garuda (2.6.26-1):
 System install date:     2023-02-01
 Last full system update: 2024-08-06
 Is partially upgraded:   No
 Relevant software:       snapper NetworkManager mkinitcpio
 Windows dual boot:       No/Undetected
 Failed units:            session-c1.scope

top - 09:58:54 up 1 day, 17:59,  4 users,  load average: 10.58, 7.66, 4.18
Tasks: 499 total,  10 running, 489 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us, 51.0 sy,  0.2 ni, 48.0 id,  0.1 wa,  0.2 hi,  0.1 si,  0.0 st
MiB Mem :  15730.2 total,     69.7 free,   7506.2 used,   9416.1 buff/cache
MiB Swap:  33037.5 total,  25006.5 free,   8031.0 used.   8224.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2965 baccenf+  20   0   27.9g 354072 164072 S 233.9   2.2  81:30.44 firefox
   3781 baccenf+  20   0   23.2g 302864 105980 R 102.3   1.9  25:48.17 Isolated Web Co
  11909 baccenf+  20   0 1137.7g 260484  77736 R 101.7   1.6  30:49.41 electron
    515 root      20   0       0      0      0 R 100.0   0.0  15:40.87 btrfs-cleaner
    164 root      20   0       0      0      0 R  99.7   0.0  19:34.63 kswapd0
    960 root      20   0  561072  12804  11044 R  91.4   0.1   2:04.85 NetworkManager
   1097 root      20   0 1149536  54100  30412 R  91.0   0.3  30:23.34 Xorg

again and again and again…

top - 17:43:20 up  3:46,  2 users,  load average: 5.25, 2.07, 1.20
Tasks: 468 total,   5 running, 463 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.3 us, 23.1 sy,  0.0 ni, 72.6 id,  1.3 wa,  0.5 hi,  0.2 si,  0.0 st
MiB Mem :  15730.2 total,     80.9 free,   8471.6 used,   9019.8 buff/cache
MiB Swap:  33037.5 total,  29151.0 free,   3886.5 used.   7258.6 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    164 root      20   0       0      0      0 R  99.0   0.0   1:29.64 kswapd0
  46629 root      39  19    6396   4828   4000 R  96.0   0.0   1:15.17 updatedb
    496 root      20   0       0      0      0 R  85.1   0.0   0:49.60 btrfs-cleaner
   1091 root      20   0  810360  76944  42536 R  55.0   0.5  13:07.83 Xorg
   3245 baccenf+  20   0   27.8g 552484 280028 S  37.7   3.4  39:19.92 firefox
   2142 baccenf+   9 -11  193500  85908   8108 S  12.6   0.5  20:51.64 pipewire-pulse
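
A minimal sketch for checking whether the two processes really overlap, rather than relying on catching them in top by hand. It assumes the process names are exactly `updatedb` and `btrfs-cleaner`, as they appear in the top output above. The function name `both_running` and the log path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if both updatedb and btrfs-cleaner appear in a list
# of process names (one per line on stdin).
both_running() {
    awk '
        $1 == "updatedb"      { u = 1 }
        $1 == "btrfs-cleaner" { b = 1 }
        END { exit !(u && b) }   # exit status 0 only if both were seen
    '
}

# Usage (hypothetical log path): sample every 10 seconds and record a
# timestamp whenever both processes exist at the same moment.
#   while sleep 10; do
#       ps -eo comm= | both_running && date >> "$HOME/overlap.log"
#   done
```

Comparing the logged timestamps with the times of the lag episodes would show whether the overlap actually coincides with them. Note that this only detects that both processes exist, not that both are busy.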

Please do not post here in the forum the way you would in chat platforms such as Telegram.
As long as no one has replied to your post, you can edit your post.
Empty posts without new information are considered impolite in forums.
They give the impression that you want to push your request in front of others.
Just wait until it is your turn.

If you do not receive an answer, it may either be due to your rude behavior or there is no one in the forum who has a solution to your problem.

2 Likes

If that were true, then we would have massive user reports about the same issue.
If your investigation’s assumption is correct, you may disable updatedb.timer, or modify the calendar trigger to once a week. IIRC updatedb is active on first boot, whenever this is (not every day, I think, but it doesn’t matter that much).

From your feedback, I don’t see anything that would explain the lag. The lag you are experiencing may be coming from something else.
I would suggest more investigation: check the journal logs and widen the search.
Keep us up to date, though.
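
The timer change suggested above could look roughly like the sketch below. The unit name `updatedb.timer` is taken from this post and is an assumption about the installed locate package, so check `systemctl list-timers` for the actual name first. The helper `timer_override` is hypothetical; it only prints a systemd drop-in, where the empty `OnCalendar=` line clears the schedule shipped with the unit before the new one is set.

```shell
#!/bin/sh
# Hypothetical helper: print a drop-in that replaces a timer's schedule.
# $1 = OnCalendar expression, e.g. "weekly"
timer_override() {
    printf '[Timer]\nOnCalendar=\nOnCalendar=%s\n' "$1"
}

# Usage (needs root; the unit name is an assumption, verify it first):
#   unit=updatedb.timer
#   sudo mkdir -p "/etc/systemd/system/$unit.d"
#   timer_override weekly | sudo tee "/etc/systemd/system/$unit.d/override.conf"
#   sudo systemctl daemon-reload
#
# Or switch the timer off entirely while testing:
#   sudo systemctl disable --now "$unit"
```

`sudo systemctl edit updatedb.timer` opens an editor on the same kind of drop-in, if doing it interactively is preferred.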

3 Likes

Hi guys,

I don’t know if this is related to the issue in the thread or not. I found your thread on Google.

I have a laptop with a BTRFS root filesystem (300 GB) and snapshots managed by snapper (currently about 30 snapshots). This setup has been working without issues for about 5 years.

I periodically back up this filesystem to an external NVMe drive. Today, I started this backup as usual and was very surprised to see a freeze with kswapd and btrfs-cleaner.

I haven’t had much time to debug this yet, but I discovered it’s not related to the NVMe drive because it happens even when I use btrfs send to /dev/null.

I tried changing the kernel to the LTS version (6.6) and the issue disappeared.

So, could you please check whether you can reproduce this issue on previous kernel versions? It looks like something is broken in 6.10.
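
A small sketch for checking which kernel series a machine is running before comparing notes. It only matches the 6.10 series named in this post; which exact releases are affected is not established in this thread. The function name `in_610_series` is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if a kernel release string belongs to the 6.10 series.
# $1 = release string as printed by `uname -r`
in_610_series() {
    case "$1" in
        6.10|6.10.*|6.10-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   in_610_series "$(uname -r)" && echo "running a 6.10 kernel"
#
# On Arch-based systems the LTS kernel is packaged as linux-lts:
#   sudo pacman -S linux-lts linux-lts-headers
# The boot entry then has to be selected or regenerated for the bootloader in use.
```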

UPD:

I suspect that this kernel regression might be related:

2 Likes

I have no idea whether the failed unit listed above (session-c1.scope) is responsible for your issue, but I would think it is a rather important systemd component.

https://wiki.archlinux.org/title/Systemd/User

The user session lives entirely inside a systemd scope

1 Like

I apologize. I thought of them more as updates to the thread and wasn’t trying to be pushy. I will update the original post from here on.

Perhaps it is in fact related to the btrfs regression mentioned above? I do have quite a few heavyweight applications running (such as Firefox, Thunderbird, Element Desktop) that very likely hold a lot of open file handles while all of this is happening.

I’ve double-checked btrfs-assistant, although I had never opened it or changed any configuration in it before. Everything is already scheduled as weekly, yet I am now seeing these load problems multiple times per day.

Why would I see a btrfs-cleaner process running with 100% CPU utilization several times per day, if it is scheduled as weekly? What else could be triggering it?


It now looks like this phenomenon doesn’t happen until I run systemctl suspend; xtrlock, close my lid, and later open the lid to wake up my laptop.

Yesterday I cold-rebooted the laptop after my last edit and have refrained from suspending it since. So far, the phenomenon hasn’t occurred.

Did you read this response?

Try changing to the LTS kernel according to that suggestion to resolve your issue.

3 Likes

Looking good so far. No problems after 24+ hours of uptime on 6.6.