Btrfs failure

I cannot boot anymore. There is a Btrfs failure on my LUKS container at boot.

I'm now booted from a USB disk with another Arch-based distro and would like to repair my disk.

The message I get when I try to mount the LUKS volume is shown below.
The problem disk is /dev/nvme0n1p2, and it is crypt-mounted at /mnt/garuda.

Is there anything I can do to repair my btrfs filesystem?

[madbox ~]# btrfsck --init-extent-tree  /dev/mapper/garuda
WARNING:

	Do not use --repair unless you are advised to do so by a developer
	or an experienced user, and then only after having accepted that no
	fsck can successfully repair all types of filesystem corruption. Eg.
	some software or hardware bugs can fatally damage a volume.
	The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
parent transid verify failed on 1608613888 wanted 131659 found 132173
parent transid verify failed on 1608613888 wanted 131659 found 132173
Ignoring transid failure
ERROR: root [4 0] level 0 does not match 1

Couldn't setup device tree
ERROR: cannot open file system

I can't boot, so I cannot show inxi from Garuda, but here is what I get from the USB disk I booted:


 [madbox ~]# inxi -Faz
System:
  Kernel: 5.17.1-3-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 11.2.0
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-5.17-x86_64
    root=UUID=8c2744e5-7802-4252-b6a2-05dea64b6e7d rw rootflags=subvol=@
    quiet
    cryptdevice=UUID=95bb221b-0c95-4f2d-9bf1-086d0a14acea:luks-95bb221b-0c95-4f2d-9bf1-086d0a14acea
    root=/dev/mapper/luks-95bb221b-0c95-4f2d-9bf1-086d0a14acea
    resume=/dev/mapper/luks-68a4eb59-cf64-411a-9acc-6fbd476a5ead
    udev.log_priority=3 pcie_aspm=off
  Desktop: Openbox v: 3.6.1 info: tint2 dm: LightDM v: 1.30.0
    Distro: Manjaro Linux base: Arch Linux
Machine:
  Type: Laptop System: ASUSTeK product: VivoBook_ASUSLaptop X513UA_M513UA
    v: 1.0 serial: <filter>
  Mobo: ASUSTeK model: X513UA v: 1.0 serial: <filter>
    UEFI: American Megatrends LLC. v: X513UA.305 date: 03/12/2021
Battery:
  ID-1: BAT0 charge: 38.8 Wh (100.0%) condition: 38.8/42.1 Wh (92.2%)
    volts: 11.8 min: 11.8 model: ASUSTeK ASUS Battery type: Li-ion serial: N/A
    status: not charging cycles: 22
  Device-1: hidpp_battery_0 model: Logitech Wireless Mouse M325
    serial: <filter> charge: 100% (should be ignored) rechargeable: yes
    status: discharging
CPU:
  Info: model: AMD Ryzen 7 5700U with Radeon Graphics socket: FP6 bits: 64
    type: MT MCP arch: Zen 2 family: 0x17 (23) model-id: 0x68 (104) stepping: 1
    microcode: 0x8608103
  Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache:
    L1: 512 KiB desc: d-8x32 KiB; i-8x32 KiB L2: 4 MiB desc: 8x512 KiB
    L3: 8 MiB desc: 2x4 MiB
  Speed (MHz): avg: 1396 high: 1397 min/max: 400/4372 boost: enabled
    base/boost: 1800/4350 scaling: driver: amd-pstate governor: schedutil
    volts: 1.2 V ext-clock: 100 MHz cores: 1: 1397 2: 1397 3: 1397 4: 1397
    5: 1397 6: 1397 7: 1397 8: 1397 9: 1397 10: 1397 11: 1397 12: 1397
    13: 1397 14: 1397 15: 1397 16: 1396 bogomips: 57504
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities:
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: spec_store_bypass
    mitigation: Speculative Store Bypass disabled via prctl
  Type: spectre_v1
    mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, IBRS_FW,
    STIBP: conditional, RSB filling
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: AMD Lucienne vendor: ASUSTeK driver: amdgpu v: kernel pcie:
    gen: 3 speed: 8 GT/s lanes: 16 link-max: gen: 4 speed: 16 GT/s ports:
    active: eDP-1 empty: HDMI-A-1 bus-ID: 03:00.0 chip-ID: 1002:164c
    class-ID: 0300
  Device-2: Quanta USB2.0 HD UVC WebCam type: USB driver: uvcvideo
    bus-ID: 3-3:3 chip-ID: 0408:30d4 class-ID: 0e02 serial: <filter>
  Display: x11 server: X.Org v: 1.21.1.3 compositor: Picom v: git-7e568
    driver: X: loaded: amdgpu unloaded: fbdev,modesetting alternate: vesa
    gpu: amdgpu display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.00x11.22")
    s-diag: 582mm (22.93")
  Monitor-1: eDP-1 mapped: eDP model: Najing CEC Panda 0x0046 built: 2018
    res: 1920x1080 hz: 60 dpi: 142 gamma: 1.2 size: 344x194mm (13.54x7.64")
    diag: 395mm (15.5") ratio: 16:9 modes: max: 1920x1080 min: 640x480
  Message: Unable to show GL data. Required tool glxinfo missing.
Audio:
  Device-1: AMD Renoir Radeon High Definition Audio driver: snd_hda_intel
    v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16 link-max: gen: 4
    speed: 16 GT/s bus-ID: 03:00.1 chip-ID: 1002:1637 class-ID: 0403
  Device-2: AMD ACP/ACP3X/ACP6x Audio Coprocessor vendor: ASUSTeK
    driver: N/A alternate: snd_pci_acp3x, snd_rn_pci_acp3x, snd_pci_acp5x,
    snd_sof_amd_renoir
    pcie: gen: 3 speed: 8 GT/s lanes: 16 link-max: gen: 4 speed: 16 GT/s
    bus-ID: 03:00.5 chip-ID: 1022:15e2 class-ID: 0480
  Device-3: AMD Family 17h/19h HD Audio vendor: ASUSTeK
    driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16
    link-max: gen: 4 speed: 16 GT/s bus-ID: 03:00.6 chip-ID: 1022:15e3
    class-ID: 0403
  Sound Server-1: ALSA v: k5.17.1-3-MANJARO running: yes
  Sound Server-2: JACK v: 1.9.20 running: no
  Sound Server-3: PulseAudio v: 15.0 running: yes
Network:
  Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel pcie: gen: 2
    speed: 5 GT/s lanes: 1 bus-ID: 01:00.0 chip-ID: 8086:2723 class-ID: 0280
  IF: wlp1s0 state: up mac: <filter>
Bluetooth:
  Device-1: Intel AX200 Bluetooth type: USB driver: btusb v: 0.8
    bus-ID: 3-2:5 chip-ID: 8087:0029 class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 4 state: up address: see --recommends
Drives:
  Local Storage: total: 1.05 TiB used: 7.06 GiB (0.7%)
  SMART Message: Required tool smartctl not installed. Check --recommends
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Western Digital
    model: PC SN530 SDBPNPZ-1T00-1002 size: 953.87 GiB block-size:
    physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD
    serial: <filter> rev: 21106000 temp: 29.9 C scheme: GPT
  ID-2: /dev/sda maj-min: 8:0 type: USB vendor: Samsung model: Flash Drive
    size: 119.5 GiB block-size: physical: 512 B logical: 512 B type: SSD
    serial: <filter> rev: 1100 scheme: GPT
Partition:
  ID-1: / raw-size: 102.59 GiB size: 102.59 GiB (100.00%)
    used: 7.06 GiB (6.9%) fs: btrfs block-size: 4096 B dev: /dev/dm-0
    maj-min: 254:0 mapped: luks-95bb221b-0c95-4f2d-9bf1-086d0a14acea
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 712 KiB (0.2%) fs: vfat block-size: 512 B dev: /dev/sda1 maj-min: 8:1
  ID-3: /home raw-size: 102.59 GiB size: 102.59 GiB (100.00%)
    used: 7.06 GiB (6.9%) fs: btrfs block-size: 4096 B dev: /dev/dm-0
    maj-min: 254:0 mapped: luks-95bb221b-0c95-4f2d-9bf1-086d0a14acea
  ID-4: /var/log raw-size: 102.59 GiB size: 102.59 GiB (100.00%)
    used: 7.06 GiB (6.9%) fs: btrfs block-size: 4096 B dev: /dev/dm-0
    maj-min: 254:0 mapped: luks-95bb221b-0c95-4f2d-9bf1-086d0a14acea
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default)
  ID-1: swap-1 type: partition size: 16.61 GiB used: 0 KiB (0.0%)
    priority: -2 dev: /dev/dm-1 maj-min: 254:1
    mapped: luks-68a4eb59-cf64-411a-9acc-6fbd476a5ead
Sensors:
  System Temperatures: cpu: 43.0 C mobo: N/A gpu: amdgpu temp: 40.0 C
  Fan Speeds (RPM): cpu: 0
Info:
  Processes: 355 Uptime: 12h 14m wakeups: 13 Memory: 15.1 GiB
  used: 2.14 GiB (14.2%) Init: systemd v: 250 tool: systemctl Compilers:
  gcc: 11.2.0 Packages: pacman: 904 lib: 198 Shell: Bash (sudo) v: 5.1.16
  running-in: terminator inxi: 3.3.15

Anything particular that happened just before it stopped working?
Or from the log files? If you can still access them...

No, I can no longer mount the volume.
I ran garuda-update yesterday, and at the next boot it gave the same error message:

Btrfs error parent transid verify failed
open_ctree failed

and then dropped into BusyBox.

Oh crap!! :frowning: That's major.

Maybe someone is aware of recent package changes (btrfs, crypto, ...) that may cause this?
Or bugs on Arch?


I know this is of little use for you, but I thought I'd add a comment. To the best of my knowledge none of the devs use encryption on their systems. Therefore, you are not likely to receive expert assistance from those involved with the project.

While the use of encryption is more commonplace these days it is a niche that requires much skill and knowledge to fix when a breakage occurs.

I personally stopped using encryption years ago when I suffered my first data loss from its usage. While the privacy of your data is important, it needs to be weighed against the risk of data loss through an unrecoverable breakage. Encrypting your data, while more secure, makes the likelihood of successful data recovery much slimmer.

I see from your past post history that this is not your first boot breakage when using encryption. Perhaps you may want to consider simply encrypting your sensitive data rather than using full disk encryption, as this is likely to be an ongoing headache in the long term.

I wish you luck, and I hope someone with experience in this field can render assistance to help get your system working again.


On the #btrfs channel on Libera Chat, they told me someone (Josef) was working on a tool that could help, but it is not ready.
It seems that the drive dropped some writes, and COW filesystems are very sensitive to that.
Everything on the drive seems to be lost: 10 subvolumes, 50 or more snapshots, and my data.
On the happier side, I had taken a backup of all my data just the day before, so there is no damage to my personal data.

So all the fuss about having snapshots is just hot air... If you lose the disk you lose everything, and you had better have backups!

Does anybody have a disk setup suggestion that would avoid this? I had my whole OS and my data on different subvolumes but on the same btrfs filesystem. I will change that. openSUSE puts the data on a separate XFS filesystem. Any suggestions?

Here are more details about the btrfs transid verify failure:

btrfs transid verify failed

(sorry for wrong link)
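
For reference, the two numbers in that error are generations (transaction IDs): "wanted" is the generation the parent node expects for the block, and "found" is the generation stored in the block itself. A small sketch that pulls them out of the message and shows how far apart they are. The function name `parse_transid` is just something I made up for illustration:

```shell
#!/bin/sh
# parse_transid is a hypothetical helper name, not a btrfs tool.
# It reads log lines on stdin and, for each "parent transid verify failed"
# message, prints the block address, both generations, and their difference.
parse_transid() {
    sed -n 's/.*parent transid verify failed on \([0-9][0-9]*\) wanted \([0-9][0-9]*\) found \([0-9][0-9]*\).*/\1 \2 \3/p' |
    while read -r bytenr wanted found; do
        echo "block=$bytenr wanted=$wanted found=$found gap=$((found - wanted))"
    done
}

# The line from the first post:
echo 'parent transid verify failed on 1608613888 wanted 131659 found 132173' | parse_transid
```

Here the block on disk is 514 generations newer than what the tree pointing at it expects, which would fit the "drive dropped some writes" theory: the metadata that references the block is stale.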

I personally only use btrfs on my system disk (small SSD). I store all my data on large platter drives externally. I simply symlink all my data into my home directory on my SSD.

On all my external drives I use the ext4 file system. I use ext4 on the storage drives because data recovery software is far better supported on ext4.

Yes, I do keep redundant backups of my data, but at least with ext4 in an emergency, file recovery is an option. Snapshots are not valid backups, as you know all too well.

I use borgbackup to save my personal data to an external disk. That external disk has System76 Pop!_OS on it. The disk is LUKS too, but with LVM and ext4 as the filesystem.

So no personal data is lost. But I have done a lot of configuration of systemd, KDE, ssh, etc. I boot with systemd-boot and keep snapshots with snapper-systemd-boot. I have backups of /etc too. But it's still a lot of work to rescue all this and get back to where I was yesterday.

But it's life!


I use small SSDs that contain only the OS in an internal hot-swap rack. I can swap OSes in the time it takes to reboot, without the need for a boot manager. I can image one drive to another in the Icy Dock rack and then swap drives. The backed-up drive is now the boot drive, and the original is now the backup.


The correct link
Parent_Transid_Verify_Failed


I tried to rescue the disk with btrfs rescue and btrfs-find-root, but nothing seems to work. I can't find any block that would point to a better set of trees. It looks like it will be a full reinstall plus a restore of my personal data.
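
For anyone landing here with the same error, this is roughly what that kind of attempt looks like. It is only a sketch: the target directory `/mnt/backup` and the helper names are made up, and I am assuming btrfs-find-root prints its candidates as "Well block N(gen: G level: L) ..." lines. Nothing here writes to the damaged filesystem: `btrfs restore -D` is a dry run, and the commands are only printed unless you set RUN=1.

```shell
#!/bin/sh
# Sketch only: prints each command; set RUN=1 to actually execute (as root).
run() { echo "+ $*"; if [ "${RUN:-0}" = "1" ]; then "$@"; fi; }

DEV="${DEV:-/dev/mapper/garuda}"   # unlocked LUKS device from the first post
TARGET="${TARGET:-/mnt/backup}"    # hypothetical directory on a DIFFERENT disk

# Read btrfs-find-root output on stdin and print candidate tree-root byte
# numbers, highest generation first.
# ASSUMPTION: candidate lines look like
#   "Well block <bytenr>(gen: <generation> level: <n>) seems good, ..."
candidate_roots() {
    sed -n 's/^Well block \([0-9][0-9]*\)(gen: \([0-9][0-9]*\) .*/\2 \1/p' |
        sort -rn | awk '{ print $2 }'
}

# For each byte number on stdin, ask btrfs restore what it could recover
# from that tree root. -D means dry run, so nothing is copied yet.
try_roots() {
    while read -r bytenr; do
        # </dev/null keeps btrfs from consuming the candidate list on stdin.
        run btrfs restore -D -t "$bytenr" "$DEV" "$TARGET" </dev/null
    done
}
```

Usage would be something like `btrfs-find-root /dev/mapper/garuda 2>&1 | candidate_roots | try_roots`. If one of the roots lists your files, rerun that `btrfs restore` without `-D` to copy them out.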

       :face_vomiting:

I had a system freeze during a btrfs balance and hard rebooted (I didn't know about magic SysRq, which probably could have prevented the corruption). That corrupted one of my drives, and nothing would work to fix it. I was able to do a recovery mount to copy data off the drive, though, which avoided real data loss.

mount  -t btrfs -o recovery,ro

Not at all familiar with LUKS, though... so not sure if that's even a possibility in that situation.


Perfect. Really perfect. You may consider taking the route of reinstalling and replacing your data.

Snapshots are not backups. They aren’t there to help you recover in the event of hardware or filesystem failure. Their purpose is to give access to data in a previous state. Snapshots literally point to the same copy of the data, so even if the metadata isn’t corrupted and the filesystem is still intact, you can still have data loss.

However, you can use snapshots as part of your backup strategy by replicating them to another disk/device. Then you will have both history and a second copy of the data elsewhere.
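
A rough sketch of that replication with `btrfs send` and `btrfs receive`. All three paths are placeholders I picked for illustration (the subvolume, the local snapshot directory, and a btrfs filesystem mounted from a second disk), so adjust them to your layout. The commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each command; set RUN=1 to actually execute (as root).
run() { echo "+ $*"; if [ "${RUN:-0}" = "1" ]; then "$@"; fi; }

SRC="${SRC:-/home}"                        # subvolume to protect (placeholder)
SNAP_DIR="${SNAP_DIR:-/.snapshots}"        # local snapshot directory (placeholder)
DEST="${DEST:-/mnt/backupdisk/snapshots}"  # btrfs on another disk (placeholder)

replicate_snapshot() {
    name="$1"
    # btrfs send only accepts read-only snapshots, hence -r.
    run btrfs subvolume snapshot -r "$SRC" "$SNAP_DIR/$name"
    # Stream the snapshot to the second disk.
    if [ "${RUN:-0}" = "1" ]; then
        btrfs send "$SNAP_DIR/$name" | btrfs receive "$DEST"
    else
        echo "+ btrfs send $SNAP_DIR/$name | btrfs receive $DEST"
    fi
}

replicate_snapshot "home-$(date +%F)"
```

Later runs can pass `-p <previous-snapshot>` to `btrfs send` so that only the changes since the previous snapshot are transferred.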

It should work: just unlock the LUKS volume first and use the unlocked LUKS volume as the device.
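
A minimal sketch of that sequence, using the device names from the first post (/dev/nvme0n1p2, mapper name garuda) and a made-up mount point /mnt/rescue. One caveat: the `recovery` mount option is deprecated, and current kernels call it `usebackuproot` (or `rescue=usebackuproot` on newer ones), so the exact spelling depends on your kernel. The commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each command; set RUN=1 to actually execute (as root).
run() { echo "+ $*"; if [ "${RUN:-0}" = "1" ]; then "$@"; fi; }

LUKS_PART="${LUKS_PART:-/dev/nvme0n1p2}"   # encrypted partition (from the first post)
MAPPER_NAME="${MAPPER_NAME:-garuda}"       # name under /dev/mapper (from the first post)
MOUNT_POINT="${MOUNT_POINT:-/mnt/rescue}"  # hypothetical mount point

rescue_mount() {
    # 1. Unlock the LUKS container; this asks for the passphrase.
    run cryptsetup open "$LUKS_PART" "$MAPPER_NAME"
    # 2. Mount the unlocked device read-only, letting btrfs try its backup tree roots.
    run mkdir -p "$MOUNT_POINT"
    run mount -t btrfs -o ro,usebackuproot "/dev/mapper/$MAPPER_NAME" "$MOUNT_POINT"
}

rescue_mount
```

If the mount succeeds, copy everything you need to another disk before trying any repair.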


17 posts were merged into an existing topic: BTRFS pros and cons debate

Did you use btrfs defrag before this crash?


Can you check if this SSD is ok?

  • sudo smartctl -a /dev/nvmeX
  • sudo badblocks -sv /dev/nvmeX
  • sudo nvme smart-log /dev/nvmeX
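
Filled in for the drive in this thread, that would look roughly like the sketch below. I am assuming the controller is /dev/nvme0 and the namespace block device is /dev/nvme0n1 (from the inxi output); badblocks needs the block device, not the controller. The inxi output says smartctl is missing, so smartmontools (and nvme-cli for the `nvme` command) would need to be installed first. The commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each command; set RUN=1 to actually execute (as root).
run() { echo "+ $*"; if [ "${RUN:-0}" = "1" ]; then "$@"; fi; }

CTRL="${CTRL:-/dev/nvme0}"   # NVMe controller device (assumed from the inxi output)
NS="${NS:-/dev/nvme0n1}"     # namespace block device (assumed from the inxi output)

check_drive() {
    run smartctl -a "$CTRL"      # SMART health and error log
    run nvme smart-log "$CTRL"   # NVMe health log (media errors, spare capacity)
    run badblocks -sv "$NS"      # read-only surface scan by default; slow on a 1 TB drive
}

check_drive
```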

Bad RAM can also mimic a bad drive, so be sure to test your RAM to make sure it is not causing the errors on your system.
