Temporary system hangs/freezes when updating

Uhm... am I blind? I don't see any tweaks...

Edit:

Garuda Assistant might be what I am looking for

They are disabled...

Sorry

garuda-assistant

Yes, got it, but they are already disabled.
So I'll try enabling them, I guess :smiley:

IDK, my system works fine:

I masked all the services individually with the systemctl mask command.
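
For anyone following along, masking several units one by one could be sketched roughly like this. The post does not say which services were masked, so the unit names in the usage line are placeholders, and the DRY_RUN switch is just a convention of this sketch (not a systemctl feature) so the commands can be reviewed before anything is changed.

```shell
#!/bin/sh
# mask_units UNIT...: mask each systemd unit so nothing can start it.
# DRY_RUN is a convention of this sketch, not a systemctl option:
# with DRY_RUN=1 (the default) the commands are only printed.
mask_units() {
    for unit in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "systemctl mask --now $unit"
        else
            # --now also stops the unit if it is currently running
            sudo systemctl mask --now "$unit"
        fi
    done
}
```

Something like `mask_units example-a.timer example-b.service` only prints the commands; `DRY_RUN=0 mask_units ...` would actually run them, and `systemctl unmask` reverses it.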

I think there might be some bigger issue with my system.
While chatting here I had iotop -aoP -d 0.5 open, and check this out:

PID  PRIO  USER     DISK READ  DISK WRITE>  SWAPIN     IO    COMMAND
339 be/4 root        116.83 M   1053.38 M  0.00 %  2.39 % [btrfs-cleaner]
340 be/4 root          4.95 M    906.86 M  0.00 %  0.03 % [btrfs-transacti]

Maybe someone could watch iotop -aoP -d 0.5 for some time as well and see whether they get the same btrfs-cleaner and btrfs-transacti behavior.

That is after 22m34s of iotop monitoring, to give a better reference.
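
To make numbers like these easier to compare between machines, a small filter along these lines might help. It is only a sketch: it assumes the line layout shown above (value and unit as separate columns), and the function name heavy_writers and the threshold argument are made up for illustration.

```shell
#!/bin/sh
# heavy_writers MIN_MIB: read iotop-style lines on stdin and print
# "PID MiB_written COMMAND" for every process at or above MIN_MIB.
# Assumes the column layout from the post above:
# PID PRIO USER READ unit WRITE unit SWAPIN % IO % COMMAND
heavy_writers() {
    awk -v min="$1" '
        # convert a value plus its unit column to MiB
        function to_mib(v, u) {
            if (u == "B") return v / 1048576
            if (u == "K") return v / 1024
            if (u == "G") return v * 1024
            if (u == "T") return v * 1048576
            return v    # "M" or anything unrecognised is taken as MiB
        }
        # data rows start with a numeric PID and have at least 12 fields
        $1 ~ /^[0-9]+$/ && NF >= 12 {
            w = to_mib($6, $7)
            if (w >= min) printf "%s %.2f %s\n", $1, w, $12
        }
    '
}
```

It could be fed from iotop's batch mode, for example `sudo iotop -b -a -o -P -d 0.5 -n 120 | heavy_writers 500`. Batch mode prints every iteration, so the same PID shows up repeatedly with a growing total; the last line per PID is the one to compare.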

sudo dmesg | grep -i btrfs

?

╭─eha@eha in ~ took 3s
╰─λ sudo dmesg | grep -i  btrfs
[    1.848426] Btrfs loaded, crc32c=crc32c-intel, zoned=yes
[    1.848795] BTRFS: device fsid d3c75445-c596-456b-b7f9-a7bc48e9126b devid 1 transid 100874 /dev/nvme0n1p1 scanned by systemd-udevd (282)
[    2.022876] BTRFS info (device nvme0n1p1): disk space caching is enabled
[    2.022878] BTRFS info (device nvme0n1p1): has skinny extents
[    2.045272] BTRFS info (device nvme0n1p1): enabling ssd optimizations
[    2.392116] BTRFS info (device nvme0n1p1): enabling auto defrag
[    2.392121] BTRFS info (device nvme0n1p1): use zstd compression, level 3
[    2.392123] BTRFS info (device nvme0n1p1): disk space caching is enabled

Have you done a btrfs scrub or btrfs check?

Running out of ideas other than "it's a kernel bug".

Or, just thinking, maybe it's delayed initialisation that some filesystems do (e.g. ext4 will initialise in the background to finish formatting faster). It shouldn't take that long to initialise 1TB though, should it?

I only see this for a short time after a system update, and it finishes very quickly. I use an SSD.

Nice try :wink:

But I did that as well:

╭─eha@eha in ~ took 2ms
[🔴] × sudo btrfs scrub status /dev/nvme0n1p1
UUID:             d3c75445-c596-456b-b7f9-a7bc48e9126b
Scrub started:    Fri Mar  5 23:40:39 2021
Status:           finished
Duration:         0:01:00
Total to scrub:   148.33GiB
Rate:             2.47GiB/s
Error summary:    no errors found

And I ran a btrfs check /dev/nvme0n1p1 an hour ago in a live session, which did not help.
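
If someone wants to script that check, a sketch like the following could turn the status text into an exit code. It assumes the "Status:" and "Error summary:" lines look like the output above (other btrfs-progs versions may print them differently), and scrub_clean is just a name picked for this example.

```shell
#!/bin/sh
# scrub_clean: read `btrfs scrub status` output on stdin and succeed
# only if the scrub has finished and the summary reports no errors.
# Assumes the line format shown in the post above.
scrub_clean() {
    awk '
        /^Status:/        { finished = ($2 == "finished") }
        /^Error summary:/ { clean = ($0 ~ /no errors found/) }
        END               { exit !(finished && clean) }
    '
}
```

Usage would be something like `sudo btrfs scrub status / | scrub_clean && echo "scrub OK"`. A missing or still-running scrub counts as "not clean" here, which seemed the safer default.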

I guess one way to rule out the Garuda tweaks/configuration would be to install a more vanilla Arch-based distro (whether Arch or EndeavourOS) and see if the same thing happens there (your NVMe is large enough to set aside 20GB for testing :wink:).

They have the same kernels and drivers so the base OS is identical; Timeshift and grub-btrfs can be set up in the same way so those can be tested. If the same thing happens then it's the kernel (or your drive firmware, or...).

Just for documentation, here is the output of smartctl:

╭─eha@eha in ~ 
╰─λ sudo smartctl --all /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.2-zen1-1-zen] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 960 EVO 1TB
Serial Number:                      S3ETNB0J500487Z
Firmware Version:                   2B7QCXE7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1.000.204.886.016 [1,00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      2
NVMe Version:                       1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1.000.204.886.016 [1,00 TB]
Namespace 1 Utilization:            163.318.820.864 [163 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5571b0256e
Local Time is:                      Fri Mar  5 23:49:10 2021 CET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     77 Celsius
Critical Comp. Temp. Threshold:     79 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
0 +     6.04W       -        -    0  0  0  0        0       0
1 +     5.09W       -        -    1  1  1  1        0       0
2 +     4.08W       -        -    2  2  2  2        0       0
3 -   0.0400W       -        -    3  3  3  3      210    1500
4 -   0.0050W       -        -    4  4  4  4     2200    6000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        43 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    4%
Data Units Read:                    74.273.147 [38,0 TB]
Data Units Written:                 101.478.737 [51,9 TB]
Host Read Commands:                 764.750.016
Host Write Commands:                853.338.314
Controller Busy Time:               2.841
Power Cycles:                       1.578
Power On Hours:                     7.701
Unsafe Shutdowns:                   178
Media and Data Integrity Errors:    0
Error Information Log Entries:      2.748
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               43 Celsius
Temperature Sensor 2:               49 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
0       2748     0  0x004e  0x4212  0x028            0     -     -
1       2747     0  0x004c  0x4004  0x02c            0     0     -
2       2746     0  0x004b  0x4004  0x02c            0     0     -
3       2745     0  0x0045  0x4004  0x02c            0     0     -
4       2744     0  0x0044  0x4004  0x02c            0     0     -
5       2743     0  0x0042  0x4004  0x02c            0     0     -
6       2742     0  0x0041  0x4004  0x02c            0     0     -
7       2741     0  0x001f  0x4004  0x02c            0     0     -
8       2740     0  0x001e  0x4004  0x02c            0     0     -
9       2739     0  0x001c  0x4212  0x028            0     -     -
10       2738     0  0x0002  0x4004  0x028            0     0     -
11       2737     0  0x004c  0x4212  0x028            0     -     -
12       2736     0  0x0042  0x4004  0x02c            0     0     -
13       2735     0  0x0041  0x4004  0x02c            0     0     -
14       2734     0  0x001f  0x4004  0x02c            0     0     -
15       2733     0  0x001e  0x4004  0x02c            0     0     -
... (48 entries not read)
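
Since the full report is long, the handful of counters that matter for a "is the drive dying?" question could be pulled out with a filter like this. It is a rough sketch that assumes the "Name: value" layout shown above; the function name nvme_health and the choice of fields are this example's own.

```shell
#!/bin/sh
# nvme_health: read `smartctl --all` output on stdin and print a few
# health counters as name=value. The field selection is this sketch's
# choice, based on the labels in the report above.
nvme_health() {
    awk -F ':' '
        /^(Critical Warning|Percentage Used|Unsafe Shutdowns|Media and Data Integrity Errors|Error Information Log Entries):/ {
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim padding around the value
            print $1 "=" value
        }
    '
}
```

For example: `sudo smartctl --all /dev/nvme0n1 | nvme_health`.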

Oh yeah, I also have automatic Timeshift updates disabled, which I figured was best with qgroups disabled. If you do that, you need to run sudo update-grub after any manual backup snapshot, or it won't get added to your GRUB advanced menu boot choices. I simply wrote a service to run update-grub automatically at shutdown. This will delay your shutdown, but I'm not one to care about my shutdown or boot times.
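
The service itself isn't shown here, so the following is only a guess at what such a unit could look like, using the common "oneshot with ExecStop" pattern. The unit name in the usage line and the /usr/bin/update-grub path are assumptions; check `command -v update-grub` on your system.

```shell
#!/bin/sh
# print_unit: emit a oneshot unit whose ExecStop runs at shutdown.
# The unit counts as "active" after boot (RemainAfterExit=yes), so
# systemd runs ExecStop when it stops the unit during shutdown.
# The /usr/bin/update-grub path is an assumption of this sketch.
print_unit() {
    cat <<'EOF'
[Unit]
Description=Regenerate the GRUB menu at shutdown

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/true
ExecStop=/usr/bin/update-grub

[Install]
WantedBy=multi-user.target
EOF
}
```

It could be installed with `print_unit | sudo tee /etc/systemd/system/grub-refresh-at-shutdown.service` followed by `sudo systemctl enable --now grub-refresh-at-shutdown.service` (the file name is made up). The unit has to be started once for ExecStop to fire at the next shutdown, hence `--now`.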

I have read that automatic snapshots can also contribute to this problem, so I just cut the strings on everything mentioned as a possibility with this issue.

Since it got worse I pulled the plug.
I am now sitting in front of a freshly installed Garuda KDE Dragonized Gaming Edition.

Btw @tbg the download for said distro leads here, aka 404.
I got my image via the torrent.

https://sourceforge.net/projects/garuda-linux/files/dr460nized-gaming/210225/garuda-dr460nized-gaming-linux-zen-210225.iso/download


Now, after a data restore, I can give a little report.
I can download a game on Steam while typing here or watching a YouTube video.
And yes, this was not possible before I trashed my system.
Hurray!

I read yesterday(?) that subvolid can be a real buzzkill. Subvol is all that’s needed. Worth researching.
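
One thing to keep in mind when researching: the mount command (and /proc/mounts) lists both subvolid= and subvol= for a mounted btrfs filesystem no matter what was requested, so the place to look is /etc/fstab. The usual concern is that a restored snapshot gets a new subvolume ID, so an fstab entry pinned to the old subvolid no longer points at the intended subvolume. A quick check could be sketched like this (fstab_subvolid is a name made up for the example):

```shell
#!/bin/sh
# fstab_subvolid: read fstab-formatted lines on stdin and print the
# mount point and options of every btrfs entry that pins a subvolid=.
fstab_subvolid() {
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        $3 == "btrfs" && $4 ~ /(^|,)subvolid=/ { print $2, $4 }
    '
}
```

Run it as `fstab_subvolid < /etc/fstab`; no output means no entry uses subvolid=.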

Could have been. :thinking:
If you wanna know how I got to such high subvolids check out this topic.

Yeah, further and further down the btrfs rabbit hole we go. :sigh:

Hm. I just checked my subvolumes:

╭─eha@eha in ~ took 9ms
╰─λ sudo btrfs subvolume list /
ID 256 gen 858 top level 5 path @
ID 257 gen 858 top level 5 path @home
ID 258 gen 294 top level 5 path @root
ID 259 gen 57 top level 5 path @srv
ID 260 gen 316 top level 5 path @cache
ID 261 gen 858 top level 5 path @log
ID 262 gen 247 top level 5 path @tmp
ID 282 gen 70 top level 5 path timeshift-btrfs/snapshots/2021-03-06_00-37-17/@

They still have a high ID after a fresh system setup.
I don't wanna go deeper ... :rabbit: :hole:

Weirdly,

$ mount | grep btrfs
/dev/sdb4 on / type btrfs (rw,noatime,compress=zstd:3,ssd,space_cache,autodefrag,subvolid=257,subvol=/@)
...

and

$ sudo btrfs subvolume list -p /
ID 257 gen 1718809 parent 5 top level 5 path @
ID 258 gen 1718752 parent 5 top level 5 path @home
ID 264 gen 1613362 parent 257 top level 257 path var/lib/machines
ID 271 gen 1613362 parent 257 top level 257 path var/lib/portables

so… uh… :thinking: