Linux 5.16's autodefrag BTRFS regression

TNE · 31 January 2022 15:52

Linux kernel 5.16 currently features a BTRFS regression, which can lead to high CPU usage and unnecessary SSD wear.
More details can be found in this Reddit post on r/selfhosted:

We have released a hotfix for this issue, which Garuda System Maintenance will prompt to install. You can also manually apply this hotfix by running the update command in your terminal of choice.

This fix is distributed as a hotfix instead of a traditional package update (which would be the usual route) because the regression can have a permanent impact on SSD health, and we want to roll the fix out as fast as possible.

Make sure to reboot your system afterwards or run sudo mount -a -o remount to apply the /etc/fstab changes.
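To check whether any BTRFS filesystem is still mounted with autodefrag after the reboot or remount, something like the following may help. It is only a minimal sketch: the function name `btrfs_autodefrag_mounts` is made up for illustration, and it assumes the usual `/proc/mounts` layout (device, mount point, filesystem type, options).

```shell
# Print the mount point of every btrfs filesystem whose mount options
# still contain "autodefrag". Reads a table in /proc/mounts format;
# the optional first argument overrides the file to read.
# (Hypothetical helper name, not part of any Garuda tool.)
btrfs_autodefrag_mounts() {
    awk '$3 == "btrfs" && $4 ~ /(^|,)autodefrag(,|$)/ { print $2 }' "${1:-/proc/mounts}"
}

# Usage: run "btrfs_autodefrag_mounts" with no arguments.
# No output means no btrfs mount currently has autodefrag active.
```

The match is anchored on commas so that `noautodefrag` does not count as a hit.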

FGD · 31 January 2022 15:56

Thank god Garuda Team is there to take care of us.

Carrot · 31 January 2022 15:57

Thanks, devs, for caring about us.

SGS · 31 January 2022 16:08

Works, thanks so much.

 ╭─sgs@yoga9 in ~/Videos via  took 10ms
 ╰─λ mount | grep autodefrag | grep btrfs
/dev/nvme0n1p5 on / type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=256,subvol=/@)
/dev/nvme0n1p5 on /root type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=258,subvol=/@root)
/dev/nvme0n1p5 on /srv type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=259,subvol=/@srv)
/dev/nvme0n1p5 on /home type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=257,subvol=/@home)
/dev/nvme0n1p5 on /var/cache type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=260,subvol=/@cache)
/dev/nvme0n1p5 on /var/log type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=261,subvol=/@log)
/dev/nvme0n1p5 on /var/tmp type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=262,subvol=/@tmp)

 ╭─sgs@yoga9 in ~/Videos via  took 10ms
 ╰─λ sudo mount -a -o remount
[sudo] password for sgs:
mount: /proc/sys/fs/binfmt_misc: mount point not mounted or bad option.
mount: /sys/fs/cgroup/net_cls: mount point is busy.
/bin/sh: line 1: gvfsd-fuse: command not found
/bin/sh: line 1: portal: command not found

 ╭─sgs@yoga9 in ~/Videos via  took 2s
[🔴] × mount | grep autodefrag | grep btrfs

 ╭─sgs@yoga9 in ~/Videos via  took 10ms
[🔴] ×

Mr.Ryzen · 31 January 2022 16:40

Just a question: kernel 5.16.3 shipped 9 btrfs fixes. Was this not already fixed by those patches?
https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.16.3

jonathon · 31 January 2022 17:45

Short answer, no.

Maynne · 31 January 2022 18:12

These kinds of quirks with BTRFS are the very reason I only use ext4 for my drives on my Frankenstein Garuda (BTRFS is really prone to real-world data loss; I know because I experienced it badly). It requires a lot of maintenance and lots of background processes just to run efficiently. There are also maintenance tasks that need to be done on a regular basis (though Garuda made that easier with BTRFS-assistant). Ext4, by contrast, is basically maintenance-free.

MinePro120 · 31 January 2022 18:15

I've been on 5.16 (zen) since it came out basically, never had any issues (smartctl confirms that as well, I check it regularly). How could that be?

alexjp · 31 January 2022 18:19

This only happens if you have autodefrag enabled explicitly.

alexjp · 31 January 2022 18:20

And yet … not so long ago: It's Looking Like The EXT4 Corruption Issue On Linux 4.19 Is Caused By BLK-MQ - Phoronix

This makes no sense… today it was btrfs, before that bcache, sometimes it's ext4, other times xfs… whatever, Linux is coded by humans, and humans make mistakes!

Maynne · 31 January 2022 18:22

I have not encountered any data loss with ext4. It is like the energizer bunny that keeps on going. Solid performance, solid reliability for me for more than a decade now. I moved to Linux from Windows 2000 in 2010. That was when Microsoft discontinued Windows 2000.

TNE · 31 January 2022 18:23

That has been the default on Garuda for as long as I can remember, so it should apply.

TNE · 31 January 2022 18:30

Anecdotally, BTRFS has never caused data loss for me either. Anecdotal evidence goes both ways.

FGD · 31 January 2022 18:58

And thank you for that!

freebird54 · 31 January 2022 19:09

I had already heard of this issue and removed autodefrag from my /etc/fstab. I was asked about a hotfix: would it have done anything different? I have also run an update since. Is there anything I need to think about?

BTW - because I run ZFS on the data side, my kernel was still 5.15.13 (now 5.16.3) - so nothing would have happened anyway, right?

owner of just enough knowledge to be dangerous!

SGS · 31 January 2022 19:11

Is this enough?

UUID=B3B3-D44D                            /boot/efi      vfat    umask=0077 0 2
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 / btrfs subvol=/@,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 1 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /home btrfs subvol=/@home,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /root btrfs subvol=/@root,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /srv btrfs subvol=/@srv,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /var/cache btrfs subvol=/@cache,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /var/log btrfs subvol=/@log,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /var/tmp btrfs subvol=/@tmp,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)

TNE · 31 January 2022 19:13

No, the hotfix only does something if you actually had autodefrag enabled. I also made sure it only applies to BTRFS. You can take a look at the exact hotfix here: garuda-hotfixes · master · Garuda Linux / Packages / Stable PKGBUILDs / garuda-hotfixes · GitLab
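For anyone who wants to see the idea without reading the package, here is a rough illustration of that kind of edit. It is not the actual hotfix code (that lives in the GitLab repository mentioned above), only a sketch under my own assumptions: the function name `disable_autodefrag` is hypothetical, and it works as a filter on stdin/stdout so it never touches /etc/fstab by itself.

```shell
# Rewrite fstab-style lines read from stdin: on non-comment lines that
# describe a btrfs filesystem, turn the "autodefrag" mount option into
# "noautodefrag". All other lines pass through unchanged.
# (Hypothetical helper; an illustration, not the real garuda-hotfixes code.)
disable_autodefrag() {
    sed -E '/^[^#]*[[:space:]]btrfs[[:space:]]/ s/([[:space:],])autodefrag([[:space:],]|$)/\1noautodefrag\2/'
}

# Usage (preview only, nothing is written back):
#   disable_autodefrag < /etc/fstab
```

Because the option must be preceded by a comma or whitespace, entries that already say `noautodefrag` are left alone, and lines for other filesystems such as vfat are never modified.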

freebird54 · 31 January 2022 19:52

I suspected that would be the case. Anyway, I had already removed it, as Garuda is the reason (the ONLY reason) I run btrfs, and I am trusting ATM that the default is no. Thanks a lot, and I love the quick response... I didn't know you had a hotfix 'procedure' in place. KOOL!

Thanks for the quick response... back into my usual sleepy daze...

Tian · 1 February 2022 14:48

Small question: does this bug affect HDDs too, or is it only for SSDs?

jonathon · 1 February 2022 15:05

Not sure. autodefrag should affect HDDs too, but some people are reporting it only happens with autodefrag,ssd.

If you were seeing it on an HDD you'd notice a lot of drive churn.