Linux 5.16's autodefrag BTRFS regression

Linux kernel 5.16 currently features a BTRFS regression, which can lead to high CPU usage and unnecessary SSD wear.
More details can be found in this Reddit post on r/selfhosted:

We have released a hotfix for this issue, which Garuda System Maintenance will prompt to install. You can also manually apply this hotfix by running the update command in your terminal of choice.

This fix is distributed as a hotfix rather than as a traditional package update (as it usually would be) because the regression can permanently affect SSD health, and we want to roll it out as fast as possible.

Make sure to reboot your system afterwards or run sudo mount -a -o remount to apply the /etc/fstab changes.
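If you want to confirm the result afterwards, something along these lines may help. It is only a rough sketch: the function name btrfs_autodefrag_mounts is made up for this example, and it assumes the usual `DEVICE on DIR type FSTYPE (OPTIONS)` output of mount with no spaces in the mount point.

```shell
# Hypothetical helper (not part of any Garuda package).
# Reads `mount`-style lines from stdin and prints the mount point of every
# btrfs filesystem whose option list contains `autodefrag`.
btrfs_autodefrag_mounts() {
    # $3 = mount point, $5 = filesystem type, $6 = "(opt1,opt2,...)"
    # The surrounding "(" / "," / ")" keep `noautodefrag` from matching.
    awk '$5 == "btrfs" && $6 ~ /[(,]autodefrag[,)]/ { print $3 }'
}

# Usage: mount | btrfs_autodefrag_mounts
```

No output should mean that no mounted BTRFS filesystem is currently using autodefrag.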

32 Likes

Thank god Garuda Team is there to take care of us. :slight_smile:

7 Likes

Thanks, devs, for caring about us.

1 Like

Works, thanks so much :slight_smile:

 ╭─[email protected] in ~/Videos via  took 10ms
 ╰─λ mount | grep autodefrag | grep btrfs
/dev/nvme0n1p5 on / type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=256,subvol=/@)
/dev/nvme0n1p5 on /root type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=258,subvol=/@root)
/dev/nvme0n1p5 on /srv type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=259,subvol=/@srv)
/dev/nvme0n1p5 on /home type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=257,subvol=/@home)
/dev/nvme0n1p5 on /var/cache type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=260,subvol=/@cache)
/dev/nvme0n1p5 on /var/log type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=261,subvol=/@log)
/dev/nvme0n1p5 on /var/tmp type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache=v2,autodefrag,subvolid=262,subvol=/@tmp)

 ╭─[email protected] in ~/Videos via  took 10ms
 ╰─λ sudo mount -a -o remount
[sudo] password for sgs:    
mount: /proc/sys/fs/binfmt_misc: mount point not mounted or bad option.
mount: /sys/fs/cgroup/net_cls: mount point is busy.
/bin/sh: line 1: gvfsd-fuse: command not found
/bin/sh: line 1: portal: command not found

 ╭─[email protected] in ~/Videos via  took 2s
[🔴] × mount | grep autodefrag | grep btrfs

 ╭─[email protected] in ~/Videos via  took 10ms
[🔴] × 
2 Likes

Just a question: kernel 5.16.3 included 9 btrfs fixes. Was this not already fixed by those patches?
https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.16.3

Short answer, no.

2 Likes

These kinds of quirks with BTRFS are the very reason I only use ext4 for the drives in my Frankenstein Garuda (BTRFS is really prone to real-world data loss; I know because I experienced it badly). It requires a lot of maintenance and lots of background processes just to run efficiently. There are also maintenance tasks that need to be done on a regular basis (though Garuda made that easier with BTRFS-assistant). ext4, by contrast, is basically maintenance-free.

I've been on 5.16 (zen) since it came out basically, never had any issues (smartctl confirms that as well, I check it regularly). How could that be?

This only happens if you have autodefrag enabled implicitly

And yet... not so long ago: It's Looking Like The EXT4 Corruption Issue On Linux 4.19 Is Caused By BLK-MQ - Phoronix

This makes no sense... today it was btrfs, before that it was bcache, sometimes it's ext4, other times xfs... whatever, Linux is coded by humans, and humans make mistakes!

4 Likes

I have not encountered any data loss with ext4. It is like the energizer bunny that keeps on going. Solid performance, solid reliability for me for more than a decade now. I moved to Linux from Windows 2000 in 2010. That was when Microsoft discontinued Windows 2000.

That's the default on Garuda for as long as I can remember, so, it should apply.

3 Likes

Anecdotally, BTRFS has never caused data loss for me either. Anecdotal evidence goes both ways :slight_smile:

5 Likes

And thank you for that! :slight_smile:

I had already heard of this issue and removed autodefrag from my /etc/fstab. I was asked about a hotfix - would this have done something different? I have run an update since as well - do I have anything to think about?

BTW - because I run ZFS on the data side, my kernel was still 5.15.13 (now 5.16.3) - so nothing would have happened anyway, right?

owner of just enough knowledge to be dangerous!
2 Likes

Is enough :slight_smile:

UUID=B3B3-D44D                            /boot/efi      vfat    umask=0077 0 2
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 / btrfs subvol=/@,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 1 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /home btrfs subvol=/@home,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /root btrfs subvol=/@root,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /srv btrfs subvol=/@srv,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /var/cache btrfs subvol=/@cache,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /var/log btrfs subvol=/@log,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)
UUID=35075ddb-bcf7-47aa-a43a-e759a082af57 /var/tmp btrfs subvol=/@tmp,defaults,noatime,space_cache,noautodefrag,compress=zstd 0 2 #Modified_by_garuda-hotfixes(1)
3 Likes

No, the hotfix only does something if you actually had autodefrag enabled. I also made sure it only applies to BTRFS too. You can take a look at the exact hotfix here: garuda-hotfixes · master · Garuda Linux / Packages / Stable PKGBUILDs / garuda-hotfixes · GitLab
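For anyone curious what that kind of change amounts to, here is a rough sketch. To be clear, this is not the actual garuda-hotfixes script (follow the link above for that). It is an illustration under the assumption that the job is "on btrfs entries in fstab, turn autodefrag into noautodefrag and leave everything else alone". The function name disable_autodefrag is made up for the example.

```shell
# Illustrative only: NOT the real garuda-hotfixes code.
# Reads fstab-style lines on stdin and writes the result to stdout;
# it never modifies /etc/fstab itself.
disable_autodefrag() {
    awk '
        # Comments and non-btrfs entries pass through untouched.
        /^[[:space:]]*#/ || $3 != "btrfs" { print; next }
        {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            changed = 0
            out = ""
            for (i = 1; i <= n; i++) {
                # Exact match only, so an existing noautodefrag is left as is.
                if (opts[i] == "autodefrag") { opts[i] = "noautodefrag"; changed = 1 }
                out = out (i > 1 ? "," : "") opts[i]
            }
            # Assigning $4 rebuilds the line with single spaces between fields,
            # so only touch entries that actually needed the change.
            if (changed) $4 = out
            print
        }'
}

# Usage (prints a preview, changes nothing): disable_autodefrag < /etc/fstab
```

An entry that never had autodefrag comes out exactly as it went in, which matches the "only does something if you actually had autodefrag enabled" behaviour described above.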

9 Likes

I suspected such would be the case :grin: Anyway, I had already removed it, as Garuda is the reason (ONLY reason) I run btrfs - and I am trusting ATM that the default is no. Thanks a lot - and I love the quick response... I didn't know you had a hotfix 'procedure' in place - KOOL!

Thanks for the quick response... back into my usual sleepy daze...

5 Likes

Small question: does this bug affect HDDs too, or only SSDs?

Not sure. autodefrag should affect HDDs too, but some people are reporting that it only happens with autodefrag,ssd.

If you were seeing it on an HDD you'd notice a lot of drive churn.

5 Likes