Btrfs balance crashed, now no free space on disk

zany130 · 9 January 2024 23:25

So this is kinda part three of “Only able to boot snapshots. Restoring does not fix the issue”.

I noticed I had less than 100GB of free space so I thought maybe a btrfs balance might help, plus I hadn’t done one yet since I reinstalled.

So I started the balance process through btrfs-assistant, and after a while my system crashed to the login screen (I have auto-login enabled, so this was kinda weird).

Now I supposedly have only 1MB of free space on the disk and apps are crashing left and right.

Any ideas? I’m thinking I should just switch to another FS, but I like snapper…

df
Filesystem      1K-blocks       Used Available Use% Mounted on
/dev/nvme0n1p2  976450784  919055072    251904 100% /
devtmpfs             4096          0      4096   0% /dev
tmpfs            16388136     284728  16103408   2% /dev/shm
efivarfs              128         34        90  28% /sys/firmware/efi/efivars
tmpfs             6555256       2464   6552792   1% /run
/dev/nvme0n1p2  976450784  919055072    251904 100% /home
/dev/nvme0n1p2  976450784  919055072    251904 100% /root
/dev/nvme0n1p2  976450784  919055072    251904 100% /srv
/dev/nvme0n1p2  976450784  919055072    251904 100% /var/cache
/dev/nvme0n1p2  976450784  919055072    251904 100% /var/log
/dev/nvme0n1p2  976450784  919055072    251904 100% /var/tmp
tmpfs            16388136      71312  16316824   1% /tmp
/dev/nvme0n1p1     306584      38296    268288  13% /boot/efi
tmpfs             3277624    2893112    384512  89% /run/user/1000
/dev/sda2      1921666736 1660050240 163924444  92% /mnt/GAMES
pCloud.fs      2147483648 1721078816 426404832  81% /home/zany130/pCloudDrive

garuda-inxi
System:
  Kernel: 6.7.0-1-cachyos arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
    clocksource: tsc available: hpet,acpi_pm
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-linux
    root=UUID=8a69f047-feee-46cf-81a1-d299eb173883 rw rootflags=subvol=@
    rd.udev.log_priority=3 vt.global_cursor_default=0 loglevel=3
    sysrq_always_enabled=1 amdgpu.ppfeaturemask=0xffffffff tsc=reliable
    clocksource=tsc nowatchdog nmi_watchdog=0
    initrd=@\boot\initramfs-linux-cachyos.img
  Desktop: KDE Plasma v: 5.27.10 tk: Qt v: 5.15.12 wm: kwin_wayland vt: 1
    dm: SDDM Distro: Garuda Linux base: Arch Linux
Machine:
  Type: Desktop Mobo: ASRock model: X470 Taichi serial: <superuser required>
    UEFI: American Megatrends v: P5.10 date: 10/20/2022
Battery:
  Device-1: hidpp_battery_0 model: Logitech Wireless Mouse MX Master 3
    serial: <filter> charge: 100% (should be ignored) rechargeable: yes
    status: discharging
CPU:
  Info: model: AMD Ryzen 5 5600X bits: 64 type: MT MCP arch: Zen 3+ gen: 4
    level: v3 note: check built: 2022 process: TSMC n6 (7nm) family: 0x19 (25)
    model-id: 0x21 (33) stepping: 2 microcode: 0xA20120A
  Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache:
    L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 3 MiB desc: 6x512 KiB
    L3: 32 MiB desc: 1x32 MiB
  Speed (MHz): avg: 3914 high: 4625 min/max: 2200/4686 boost: enabled
    scaling: driver: acpi-cpufreq governor: performance cores: 1: 4625 2: 4079
    3: 3656 4: 3990 5: 3746 6: 3699 7: 4625 8: 3745 9: 3696 10: 3698 11: 3718
    12: 3700 bogomips: 88807
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities: <filter>
Graphics:
  Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT]
    vendor: Gigabyte driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x
    process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4 speed: 16 GT/s
    lanes: 16 ports: active: DP-1,DP-2,HDMI-A-2 empty: HDMI-A-1
    bus-ID: 10:00.0 chip-ID: 1002:73df class-ID: 0300
  Display: wayland server: X.org v: 1.21.1.10 with: Xwayland v: 23.2.3
    compositor: kwin_wayland driver: X: loaded: amdgpu
    unloaded: modesetting,radeon alternate: fbdev,vesa dri: radeonsi
    gpu: amdgpu d-rect: 4980x2513 display-ID: 0
  Monitor-1: DP-1 pos: bottom-r res: 2048x864 size: N/A modes: N/A
  Monitor-2: DP-2 pos: primary,top-left res: 1396x785 size: N/A modes: N/A
  Monitor-3: HDMI-A-2 pos: middle-c res: 1536x864 size: N/A modes: N/A
  API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
    device: 1 drv: swrast gbm: drv: kms_swrast surfaceless: drv: radeonsi
    wayland: drv: radeonsi x11: drv: radeonsi
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 23.3.2-arch1.2
    glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 6700 XT (radeonsi
    navi22 LLVM 16.0.6 DRM 3.56 6.7.0-1-cachyos) device-ID: 1002:73df
    memory: 11.72 GiB unified: no display-ID: :0.0
  API: Vulkan v: 1.3.274 layers: 21 device: 0 type: discrete-gpu name: AMD
    Radeon RX 6700 XT (RADV NAVI22) driver: mesa radv v: 23.3.2-arch1.2
    device-ID: 1002:73df surfaces: xcb,xlib,wayland device: 1 type: cpu
    name: llvmpipe (LLVM 16.0.6 256 bits) driver: mesa llvmpipe
    v: 23.3.2-arch1.2 (LLVM 16.0.6) device-ID: 10005:0000
    surfaces: xcb,xlib,wayland
Audio:
  Device-1: AMD Navi 21/23 HDMI/DP Audio driver: snd_hda_intel v: kernel pcie:
    gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 10:00.1 chip-ID: 1002:ab28
    class-ID: 0403
  Device-2: Sony INZONE H9 / H7
    driver: cdc_acm,hid-generic,snd-usb-audio,usbhid type: USB rev: 2.0
    speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 5-3:3 chip-ID: 054c:0e53
    class-ID: 0a00
  API: ALSA v: k6.7.0-1-cachyos status: kernel-api with: aoss
    type: oss-emulator tools: N/A
  Server-1: sndiod v: N/A status: off tools: aucat,midicat,sndioctl
  Server-2: PipeWire v: 1.0.0 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Intel Dual Band Wireless-AC 3168NGW [Stone Peak] driver: iwlwifi
    v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1 bus-ID: 08:00.0
    chip-ID: 8086:24fb class-ID: 0280
  IF: wlp8s0 state: down mac: <filter>
  Device-2: Intel I211 Gigabit Network vendor: ASRock driver: igb v: kernel
    pcie: gen: 1 speed: 2.5 GT/s lanes: 1 port: d000 bus-ID: 0a:00.0
    chip-ID: 8086:1539 class-ID: 0200
  IF: enp10s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IF-ID-1: virbr0 state: down mac: <filter>
Bluetooth:
  Device-1: Edimax Bluetooth Adapter driver: btusb v: 0.8 type: USB rev: 1.1
    speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-5:3 chip-ID: 7392:c611
    class-ID: e001 serial: <filter>
  Report: btmgmt ID: hci0 rfk-id: 0 state: up address: <filter> bt-v: 5.1
    lmp-v: 10 status: discoverable: yes pairing: yes class-ID: 7c0104
Drives:
  Local Storage: total: 3.64 TiB used: 4.01 TiB (110.1%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Western Digital
    model: WD BLACK SN770 1TB size: 931.51 GiB block-size: physical: 512 B
    logical: 512 B speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter>
    fw-rev: 731120WD temp: 38.9 C scheme: GPT
  ID-2: /dev/nvme1n1 maj-min: 259:3 vendor: Samsung model: SSD 980 1TB
    size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
    lanes: 4 tech: SSD serial: <filter> fw-rev: 3B4QFXO7 temp: 38.9 C
    scheme: GPT
  ID-3: /dev/sda maj-min: 8:0 vendor: Seagate model: ST2000DX002-2DV164
    size: 1.82 TiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
    tech: HDD rpm: 7200 serial: <filter> fw-rev: CC41 scheme: GPT
Partition:
  ID-1: / raw-size: 931.22 GiB size: 931.22 GiB (100.00%)
    used: 876.74 GiB (94.1%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 37.4 MiB (12.5%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
  ID-3: /home raw-size: 931.22 GiB size: 931.22 GiB (100.00%)
    used: 876.74 GiB (94.1%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-4: /var/log raw-size: 931.22 GiB size: 931.22 GiB (100.00%)
    used: 876.74 GiB (94.1%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-5: /var/tmp raw-size: 931.22 GiB size: 931.22 GiB (100.00%)
    used: 876.74 GiB (94.1%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default) zswap: no
  ID-1: swap-1 type: zram size: 31.26 GiB used: 0 KiB (0.0%) priority: 100
    comp: zstd avail: lzo,lzo-rle,lz4,lz4hc,842 max-streams: 12 dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 54.9 C mobo: 37.0 C gpu: amdgpu temp: 52.0 C
    mem: 54.0 C
  Fan Speeds (rpm): fan-1: 0 fan-2: 2023 fan-3: 0 fan-4: 0 fan-5: 0
    gpu: amdgpu fan: 0
  Power: 12v: N/A 5v: N/A 3.3v: 3.20 vbat: 3.28 gpu: amdgpu watts: 27.00
Info:
  Processes: 569 Uptime: 2m wakeups: 0 Memory: total: 32 GiB
  available: 31.26 GiB used: 13.21 GiB (42.3%) Init: systemd v: 255
  default: graphical tool: systemctl Compilers: gcc: 13.2.1 clang: 16.0.6
  Packages: 2785 pm: pacman pkgs: 2742 libs: 618 tools: octopi,paru
  pm: appimage pkgs: 0 pm: flatpak pkgs: 43 Shell: fish v: 3.7.0
  running-in: alacritty inxi: 3.3.31
Garuda (2.6.22-1):
  System install date:     2024-01-05
  Last full system update: 2024-01-08
  Is partially upgraded:   No
  Relevant software:       snapper NetworkManager dracut
  Windows dual boot:       Probably (Run as root to verify)
  Failed units:

BluishHumility · 10 January 2024 03:14

If you have Btrfs snapshots that have captured major changes to the disk (adding or removing a lot of packages for example), those snapshots can potentially tie up a good amount of disk space. Try deleting some old snapshots, then run the balance operation again.

zany130 · 11 January 2024 00:08

I didn’t realize that (I thought snapshots came at very little disk cost). I deleted all my snapshots, and that cleared up most of my drive. I created a manual snapshot so I can still go back if needed.

billy.spiva · 11 January 2024 02:54

Correct me if I’m wrong, but isn’t it a default setting that anything over ten snapshots gets auto deleted?

BluishHumility · 11 January 2024 03:33

There is a systemd unit called snapper-cleanup.service which is set to run every day by default. When it runs, it will delete any snapshots that exceed the thresholds defined in the Snapper configs. These thresholds can be set to anything you want, and thresholds can be defined for timeline snapshots as well (to retain a certain number of hourly, daily, weekly, and so on).

Garuda ships with one Snapper config already set up (for the root subvolume) which has this threshold set to ten total snapshots: snapper-support/snapper-template-garuda · main · Garuda Linux 🦅 / PKGBUILDs · GitLab

So, once per day when this service runs, if you have more than ten snapshots in this subvolume, it will delete the oldest ones until there are only ten left.

The thing is, the number of snapshots you are retaining does not define how much space they occupy. You can have a thousand snapshots that take up no disk space, or you can have one snapshot that fills up the whole disk. What determines how much space a snapshot takes up is how much the snapshot deviates from the current state of the subvolume.

When you first take a snapshot of a subvolume, that snapshot does not take up any disk space. It is not a copy of the disk or anything, it is more a way of telling Btrfs “Hey, remember this exact state the subvolume is in right now.” Moving forward from that snapshot, Btrfs will keep track of any changes that happen. At the moment you first take a snapshot, nothing has changed on the disk yet so there is no extra stuff to “remember”.

If you then delete 100GB of stuff off of the disk, that snapshot will “remember” that 100GB. Even though you deleted that stuff, the disk space won’t actually be released until the snapshot is deleted.

All that to say: the number of snapshots you are retaining does not determine how much disk space the snapshots are taking up. Even if you only have ten snapshots, if they represent major changes to the disk they can potentially occupy a lot of space.
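
To make the accounting concrete, here is a toy Python model of the idea. It is only a sketch: the extent names and sizes are made up, and real Btrfs tracks references per extent on disk rather than with Python sets. Space counts as used for as long as the live subvolume or any snapshot still references it.

```python
# Toy model only; the names and sizes below are hypothetical.
def used_gb(live, snapshots, extent_sizes):
    """Sum the size of every extent referenced by the live subvolume or a snapshot."""
    referenced = set(live)
    for snap in snapshots.values():
        referenced |= snap
    return sum(extent_sizes[name] for name in referenced)


extent_sizes = {"system": 50, "games": 100}  # sizes in GB
live = {"system", "games"}
snapshots = {}

before = used_gb(live, snapshots, extent_sizes)

# Taking a snapshot only records references, so usage does not change.
snapshots["snap1"] = frozenset(live)
after_snapshot = used_gb(live, snapshots, extent_sizes)

# Deleting 100 GB from the live subvolume frees nothing while snap1 exists.
live.discard("games")
after_delete = used_gb(live, snapshots, extent_sizes)

# Only once the snapshot is gone is the space released.
del snapshots["snap1"]
after_snapshot_removed = used_gb(live, snapshots, extent_sizes)

print(before, after_snapshot, after_delete, after_snapshot_removed)
```

In this model the usage stays at 150 GB through the snapshot and the deletion, and only drops to 50 GB after the snapshot itself is removed.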

billy.spiva · 11 January 2024 11:21

Thank you, that is very helpful to know.

zany130 · 11 January 2024 15:29

Thank you so much @BluishHumility it makes sense now.

One thing I am confused about is why a balance can cause you to LOSE disk space. From my understanding, a balance can help reclaim lost disk space when it is fragmented (which can happen when deleting a lot of data, e.g. snapshots that held a lot of diff data).

But it shouldn’t cause you to lose space, correct?

As soon as I ran the balance operation I noticed I started losing tens of GB in disk space and it keeps going down.

Originally I had 160 GB of free space out of 1TB, and right now, a few minutes into the balance, I am at 45.44 GB.

BluishHumility · 11 January 2024 15:45

You lost space because the operation did not finish.

During a balance operation, it is normal to see an increase in disk usage temporarily. This happens because when Btrfs balances data, it first reads the existing blocks and then writes new copies of them to different locations on the storage device. The original blocks are not immediately deleted, which can cause an increase in disk usage until the balance operation is complete and the old blocks are removed.

Additionally, during a balance operation, Btrfs may also allocate extra metadata blocks for bookkeeping purposes. These blocks will be reclaimed once the balance operation is finished.

If the balance operation successfully completed, most likely it would have freed up space like you were expecting.

zany130 · 11 January 2024 16:31

But it failed because I ran out of space. It happened again. The moment my disk ran out of space, everything crashed. I just left the balance running and kept an eye on it and my disk space.

Do I need a certain amount of free space before being able to do the balance?

Most of my space usage is Steam games in my home directory.

meanruse · 11 January 2024 16:41

I’d try moving some big files out of the way on some other storage (another disk or partition).
Since you are operating in critical conditions, do it from command line, and rather than moving (mv) first copy (cp) the files elsewhere then try deleting (rm) them. If deleting does not work because the disk is full (silly as it sounds I think that can happen) see if you can truncate them instead with : >/path/to/huge/file (works in bash and fish alike).
Of course put them back in place when done with the balance.
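
For anyone who would rather script it, the copy-first-then-free sequence could look roughly like this in Python. This is a sketch under assumptions: `offload` is a made-up helper name, the destination is assumed to live on a different disk or partition, and the demo at the bottom uses temporary directories as stand-ins for both.

```python
import os
import shutil
import tempfile


def offload(src, dest_dir):
    """Copy src into dest_dir, check the copy, then free the space src used."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copy2(src, dest)
    if os.path.getsize(dest) != os.path.getsize(src):
        raise RuntimeError("copy looks incomplete; leaving the original untouched")
    try:
        os.remove(src)
    except OSError:
        # Fallback when deleting fails: truncate to zero length, like `: > file`.
        with open(src, "w"):
            pass
    return dest


# Demo with throwaway directories standing in for the full disk and the spare one.
with tempfile.TemporaryDirectory() as full_disk, tempfile.TemporaryDirectory() as other_disk:
    big = os.path.join(full_disk, "huge.bin")
    with open(big, "wb") as f:
        f.write(b"\0" * 1024)
    moved = offload(big, other_disk)
    copy_size = os.path.getsize(moved)
    source_still_there = os.path.exists(big)
```

Keep in mind what was explained earlier in the thread: if a snapshot still references the file, removing or truncating it will not release the space until that snapshot is deleted too.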

Botfiddler · 11 January 2024 16:50

I had similar problems: the system would not start after my disk was full, and before that it crashed after garuda-update filled it up.

This helped:
sudo paccache -r

How to delete old packages installed by pacman in Arch Linux? - Unix & Linux Stack Exchange

Might also be useful:
sudo systemctl start snapper-cleanup.service

via https://www.reddit.com/r/archlinux/comments/z4r4u4/snapper_not_deleting_old_snapshots/

And some more (which I didn’t try yet):

How to remove orphaned unused packages in Arch Linux - nixCraft

SGS · 11 January 2024 17:35

In Btrfs-assistant, what are your settings in Btrfs maintenance?
And snapper settings?

zany130 · 11 January 2024 17:40

should be default
frequency is monthly for balance and scrub

and mount points for balance are / and /root
https://i.imgur.com/rVgkRXx.png

snapper settings
https://i.imgur.com/DWb7RBw.png

zany130 · 11 January 2024 21:57

So I managed to free up about 400GB by deleting my Steam library, and that allowed me to finish the balance operation. However, now I only have 80 GB free and can’t reinstall my Steam games.

https://i.imgur.com/6FlfANA.png

EDIT:
Apparently, I needed to run another balance afterward. That seems to have gotten me my disk space back…

EDIT2:
I think I still might have some lost disk space?
tried this

sudo btrfs filesystem usage /
Overall:
    Device size:		 931.22GiB
    Device allocated:		 794.06GiB
    Device unallocated:		 137.15GiB
    Device missing:		     0.00B
    Device slack:		   512.00B
    Used:			 559.90GiB
    Free (estimated):		 172.95GiB	(min: 104.37GiB)
    Free (statfs, df):		 172.95GiB
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 0.00B)
    Multiple profiles:		        no

Data,single: Size:588.00GiB, Used:552.21GiB (93.91%)
   /dev/nvme0n1p2	 588.00GiB

Metadata,DUP: Size:103.00GiB, Used:3.84GiB (3.73%)
   /dev/nvme0n1p2	 206.00GiB

System,DUP: Size:32.00MiB, Used:112.00KiB (0.34%)
   /dev/nvme0n1p2	  64.00MiB

Unallocated:
   /dev/nvme0n1p2	 137.15GiB

which I found out about from here Btrfs and balance - #3 by anon44840303

zany130 · 12 January 2024 17:59

Deleted all my snapshots and did another balance and now I lost all my free space
https://i.imgur.com/SFcdmQA.png

I think it’s safe to say this is not normal and something is wrong with my install

The balance did finish; it didn’t crash or anything. So btrfs balance is eating my disk space.

BluishHumility · 12 January 2024 18:53

I agree that something funny is going on here.

Are you sure? Let’s take a look:

sudo btrfs subvolume list /

Are you sure? Let’s take a look:

sudo btrfs balance status /

zany130 · 12 January 2024 19:19

sudo btrfs subvolume list /
ID 257 gen 1478390 top level 5 path @home
ID 258 gen 1478390 top level 5 path @root
ID 259 gen 1478316 top level 5 path @srv
ID 260 gen 1478390 top level 5 path @cache
ID 261 gen 1478390 top level 5 path @log
ID 262 gen 1478347 top level 5 path @tmp
ID 263 gen 1478390 top level 508 path .snapshots
ID 508 gen 1478390 top level 5 path @

 ╭─zany130@Garuda in ~ via  v3.11.6 as 🧙 took 16ms
 ╰─λ sudo btrfs balance status /
No balance found on '/'

Yup, no idea what’s going on.

btrfs filesystem usage
shows this (I deleted a ~50GB cache file from pCloud which I didn’t need; I guess that’s why it went up):

sudo btrfs filesystem usage /
Overall:
    Device size:		 931.22GiB
    Device allocated:		 868.10GiB
    Device unallocated:		  63.12GiB
    Device missing:		     0.00B
    Device slack:		   512.00B
    Used:			 556.48GiB
    Free (estimated):		  65.10GiB	(min: 33.54GiB)
    Free (statfs, df):		  65.10GiB
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 0.00B)
    Multiple profiles:		        no

Data,single: Size:551.00GiB, Used:549.01GiB (99.64%)
   /dev/nvme0n1p2	 551.00GiB

Metadata,DUP: Size:158.52GiB, Used:3.73GiB (2.35%)
   /dev/nvme0n1p2	 317.04GiB

System,DUP: Size:32.00MiB, Used:112.00KiB (0.34%)
   /dev/nvme0n1p2	  64.00MiB

Unallocated:
   /dev/nvme0n1p2	  63.12GiB

BluishHumility · 12 January 2024 20:13

This part of the output stands out to me. For reference, here is the same output on my disk which is also a 1TB nvme drive:

Metadata,DUP: Size:5.00GiB, Used:4.43GiB (88.59%)
   /dev/nvme0n1p2	 10.00GiB

I actually have more of this metadata than you at the moment, but my system is only setting aside 10 GiB for it while yours is holding on to a whopping 317 GiB.
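
The figures in your output can be cross-checked with a bit of arithmetic. Here is a quick Python sketch that uses only numbers copied from your `btrfs filesystem usage /` output above (the variable names are mine):

```python
# All values in GiB, copied from the posted `btrfs filesystem usage /` output.
device_size = 931.22
data_size, data_used = 551.00, 549.01   # Data,single
meta_size, meta_used = 158.52, 3.73     # Metadata,DUP (logical size)
system_size = 32 / 1024                 # System,DUP: 32 MiB
dup = 2                                 # DUP keeps two copies on disk

meta_on_disk = meta_size * dup
allocated = data_size + meta_on_disk + system_size * dup
unallocated = device_size - allocated
meta_idle_on_disk = (meta_size - meta_used) * dup

print(f"metadata on disk:    {meta_on_disk:.2f} GiB")
print(f"device allocated:    {allocated:.2f} GiB")
print(f"device unallocated:  {unallocated:.2f} GiB")
print(f"idle metadata space: {meta_idle_on_disk:.2f} GiB")
```

The first three results line up with the 317.04 GiB, 868.10 GiB and 63.12 GiB shown in the output, and the last line suggests that roughly 310 GiB of raw disk space is allocated to metadata chunks without holding any metadata, which would account for most of the missing free space.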

This might be a long shot, but try shrinking down the filesystem by 1 GB like this:

sudo btrfs filesystem resize -1G /

Then expand it back to full size:

sudo btrfs filesystem resize max /

Do another balance operation when it finishes, then let’s take one more look at the output of sudo btrfs filesystem usage /.

zany130 · 12 January 2024 21:06

slightly better

 sudo btrfs subvolume list /
ID 257 gen 1484345 top level 5 path @home
ID 258 gen 1482498 top level 5 path @root
ID 259 gen 1483729 top level 5 path @srv
ID 260 gen 1484336 top level 5 path @cache
ID 261 gen 1482498 top level 5 path @log
ID 262 gen 1482118 top level 5 path @tmp
ID 263 gen 1482429 top level 508 path .snapshots
ID 508 gen 1484345 top level 5 path @

 ╭─zany130@Garuda in ~ via  v3.11.6 as 🧙 took 15ms
 ╰─λ sudo btrfs balance status /
No balance found on '/'

 ╭─zany130@Garuda in ~ via  v3.11.6 as 🧙 took 16ms
 ╰─λ sudo btrfs filesystem usage /
Overall:
    Device size:		 931.22GiB
    Device allocated:		 807.06GiB
    Device unallocated:		 124.15GiB
    Device missing:		     0.00B
    Device slack:		   512.00B
    Used:			 556.52GiB
    Free (estimated):		 126.11GiB	(min: 64.03GiB)
    Free (statfs, df):		 126.11GiB
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 0.00B)
    Multiple profiles:		        no

Data,single: Size:551.00GiB, Used:549.05GiB (99.65%)
   /dev/nvme0n1p2	 551.00GiB

Metadata,DUP: Size:128.00GiB, Used:3.73GiB (2.92%)
   /dev/nvme0n1p2	 256.00GiB

System,DUP: Size:32.00MiB, Used:112.00KiB (0.34%)
   /dev/nvme0n1p2	  64.00MiB

Unallocated:
   /dev/nvme0n1p2	 124.15GiB

zany130 · 15 January 2024 00:02

Still haven’t been able to solve this, and I just ran out of disk space again… I think I should just reinstall and use a different filesystem that doesn’t eat 300GB of my disk space.