Some caching setting amiss - Freeze/lock during copy files from disk

zensin · 14 May 2021 18:23

This is a common Linux problem that I didn't expect to see in Garuda since it's half-geared towards performance. While copying a large amount of data/files drive-to-drive, Dolphin copies a batch of files, KDE freezes, and then another batch is processed. This repeats until all data is copied. To make matters worse, Baloo indexes after each batch, compounding the problem.

I love Garuda so far but this is not the way. I'm not very familiar with the intricacies of ramdisk but this file copying problem has common (Not accounting for ramdisk) solutions.

Upon further inspection, Garuda's defaults are...

/proc/sys/vm/dirty_background_bytes == 0

/proc/sys/vm/dirty_bytes == 0

/proc/sys/vm/dirty_background_ratio == 20

/proc/sys/vm/dirty_ratio == 50

/proc/sys/vm/dirty_expire_centisecs == 3000

Are these good for 64GB of RAM and a combination of SSD + HDD? Probably not. My drives are:

/dev/nvme1n1 - (System drive NVMe SSD - luks/btrfs by Garuda default)
/dev/nvme0n1 - (Backup drive NVMe SSD - veracrypt AES/ext4)
/dev/sda1 - (Backup drive HDD 7200RPM - veracrypt AES/ext4)
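
For a sense of scale, the ratio defaults listed above can be turned into absolute sizes. The sketch below is only a rough estimate: it assumes MemAvailable (taken from the /proc/meminfo output later in this post) is a reasonable stand-in for the free plus reclaimable memory that the kernel applies the percentages to, and ratio_to_kb is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: convert a vm.dirty_*_ratio percentage into kB,
# given an amount of available memory in kB (integer arithmetic).
ratio_to_kb() {
    pct=$1
    avail_kb=$2
    echo $(( avail_kb * pct / 100 ))
}

# Assumption: MemAvailable approximates the memory the ratios apply to.
# Value copied from the first /proc/meminfo sample in this post.
avail_kb=51434532
echo "dirty_background_ratio=20 -> $(ratio_to_kb 20 "$avail_kb") kB"
echo "dirty_ratio=50            -> $(ratio_to_kb 50 "$avail_kb") kB"
```

Under that assumption, background writeback would only start at roughly 10 GB of dirty data, and writers would only be throttled at roughly 25 GB.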

Unfortunately, I don't know how to replicate the success of my previous install. Even if I set sysctl.conf to what it was in my last install, Garuda still suffers the same problem. Trying to restore my backups has been hell. I have tried a few different dirty cache settings:

###################################################################
# Magic system request Key
# 0=disable, 1=enable all
# Debian kernels have this set to 0 (disable the key)
# See https://www.kernel.org/doc/Documentation/sysrq.txt
# for what other values do
kernel.sysrq=1

#vm.dirty_background_bytes = 107374182
#vm.dirty_bytes = 214748365
#vm.dirty_background_ratio = 0
#vm.dirty_ratio = 0

# Settings from: https://gpdb.docs.pivotal.io/6-0/install_guide/prep_os.html#topic3__sysctl_file
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100

# Settings from: https://gpdb.docs.pivotal.io/6-0/install_guide/prep_os.html#topic3__system_memory

# For systems with 64GB of memory or less:
#vm.dirty_background_ratio = 3
#vm.dirty_ratio = 10

# For systems with more than 64GB of memory:
vm.dirty_background_ratio = 0
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736 # 1.5GB
vm.dirty_bytes = 4294967296 # 4GB

# From previous install
#vm.dirty_bytes = 50331648
#vm.dirty_background_bytes = 16777216
# Arangodb
vm.max_map_count = 768000

The uncommented lines are from my last attempt. It seems like setting the sync limits low or high doesn't solve the problem. The system will lock during transfers. The system also lags periodically during normal operation, presumably when the write cache fills and is being emptied.
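
Since neither low nor high limits seemed to change the behaviour, it may be worth ruling out that the values were never applied. The sketch below prints the writeback settings as the kernel currently sees them. It assumes the standard /proc/sys/vm layout, and show_dirty_settings is a hypothetical helper name. Two things to keep in mind when reading the output: each *_bytes setting and its *_ratio counterpart are mutually exclusive, so writing one makes the other read as 0, and on Arch-based systems systemd-sysctl reads drop-in files under /etc/sysctl.d/ rather than /etc/sysctl.conf, so a mismatch could mean the file was not loaded at all.

```shell
#!/bin/sh
# Hypothetical helper: print the writeback-related sysctls as the kernel
# currently sees them. The optional directory argument exists only so the
# function can be pointed at a copy of the files; it defaults to /proc/sys/vm.
show_dirty_settings() {
    dir=${1:-/proc/sys/vm}
    for name in dirty_background_bytes dirty_bytes \
                dirty_background_ratio dirty_ratio \
                dirty_expire_centisecs dirty_writeback_centisecs; do
        if [ -r "$dir/$name" ]; then
            echo "vm.$name = $(cat "$dir/$name")"
        else
            echo "vm.$name = (unreadable)"
        fi
    done
}

show_dirty_settings
```

If the printed values differ from the uncommented lines in the config, the problem is in how the settings are loaded rather than in the values themselves.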

2 outputs from watch -n 1 cat /proc/meminfo during a transfer:

MemTotal:       65763072 kB
MemFree:          562692 kB
MemAvailable:   51434532 kB
Buffers:           20248 kB
Cached:         48494444 kB
SwapCached:         5272 kB
Active:         12441324 kB
Inactive:       46240220 kB
Active(anon):    8281572 kB
Inactive(anon):  2439412 kB
Active(file):    4159752 kB
Inactive(file): 43800808 kB
Unevictable:      276332 kB
Mlocked:          276332 kB
SwapTotal:      65763024 kB
SwapFree:       64266448 kB
Dirty:             23024 kB
Writeback:        259688 kB
AnonPages:      10082212 kB
Mapped:          2617760 kB
#################################
#################################
MemTotal:       65763072 kB
MemFree:          985444 kB
MemAvailable:   51357120 kB
Buffers:           20248 kB
Cached:         47983364 kB
SwapCached:         5732 kB
Active:         12514040 kB
Inactive:       45547992 kB
Active(anon):    8253760 kB
Inactive(anon):  2360672 kB
Active(file):    4260280 kB
Inactive(file): 43187320 kB
Unevictable:      276400 kB
Mlocked:          276400 kB
SwapTotal:      65763024 kB
SwapFree:       64158160 kB
Dirty:             18784 kB
Writeback:        352320 kB
AnonPages:       9974920 kB
Mapped:          2601732 kB
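
The two counters that matter most for this problem are Dirty and Writeback. A minimal sketch for pulling just those out of /proc/meminfo-style text follows; dirty_summary is a hypothetical helper name, and the values are truncated to whole MiB.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/meminfo-style text on stdin and report
# the Dirty and Writeback counters in MiB (truncated to whole numbers).
dirty_summary() {
    awk '/^Dirty:/     { dirty = $2 }
         /^Writeback:/ { wb = $2 }
         END { printf "Dirty=%d MiB Writeback=%d MiB\n", dirty / 1024, wb / 1024 }'
}

# Values taken from the first sample above.
printf 'Dirty:             23024 kB\nWriteback:        259688 kB\n' | dirty_summary

# On a live system: dirty_summary < /proc/meminfo
```

For the first sample this prints Dirty=22 MiB Writeback=253 MiB. Both samples show well under 400 MiB in Dirty plus Writeback combined, far below the 4 GB vm.dirty_bytes value from the last attempt.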

What can I do to make this system stable?

Some links I referenced before posting:

SGS · 14 May 2021 18:28

Open an issue with KDE/Dolphin, please

and

zensin · 14 May 2021 18:29

This same issue happened on Gnome. I installed Gnome first (I prefer it) and then installed KDE, hoping it was just some Gnome issue.

SGS · 14 May 2021 18:31

I use i3wm and KDE and have no problems.

As always post your inxi like in

dr460nf1r3 · 14 May 2021 18:33

Quick question, did you try transferring using cp or rsync instead of a file manager?

zensin · 14 May 2021 19:19

Here’s the output:

(Request again if needed and I’ll post it on a secure bin)

I have run another test. I formatted a new flash drive to btrfs and mounted with: group,defaults,nofail,compress=zstd,noatime. I then copied a directory of many files (Large and small) to it. Then, I tried copying them back to the system drive and part way through, the system locked again until the cache was synced to disk.

I just ran another test using cp. Immediately after running it, cp returned (Did not wait for files to be copied). A few seconds later, KDE locked and I could not switch to a new TTY. I had to wait for the files to copy and KDE resumed (dmesg crashed along the way).
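
cp returning immediately is expected when the data fits in the page cache: from the program's point of view the copy is done while writeback to the disk is still pending. To time a transfer including the flush, one option is to force an fsync before the command returns. A minimal sketch, assuming GNU dd with its conv=fsync option; copy_and_flush is a hypothetical helper and the paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: copy one file and return only after the data has
# been flushed to the destination device (dd with conv=fsync).
# dd prints its transfer statistics to stderr when it finishes.
copy_and_flush() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=1M conv=fsync
}

# Example with placeholder paths:
#   time copy_and_flush /path/to/source.img /path/to/dest.img
```

Timing this against a plain cp of the same file gives an idea of how much of the work is being deferred to background writeback.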

petsam · 14 May 2021 20:19

Obviously you are trolling us. You should add “trolling” in the topic title, so we don’t misunderstand.

Thanks for your time posting.

The best place to post your suggestions is the Garuda GitLab (the relevant source code for modifications).

If you think you are not trolling, do this test, please:
Install Arch Linux on real hardware and do the same copy activities/testing as you have posted above.
Report your comparison to what was in Garuda.
I trust you are not going to alter the results. You are too professional for that.

zensin · 14 May 2021 20:25

wtf are you talking about? Write sync lockups have been happening probably as long as I've been using Linux (20 years now, especially after higher capacities became cheap) and as opposed to my last install, I have not been able to solve it in Garuda (I want to).

I said "This is not the way" in regards to this very specific issue of disk sync settings. Maybe distro devs are extra-sensitive to such things but idc. I just want to fix it and I copy/pasted the output directly from the tool running on my PC.

Going to Gnome now, again...

petsam · 14 May 2021 20:27

What stops you from fixing it for your system?
After you fix it, we would appreciate your feedback, so we can improve Garuda to a higher level.

jonathon · 14 May 2021 20:28

Just to verify this is not the same BTRFS issue others are seeing, could you try a

# Run a filtered balance on every mounted btrfs filesystem (needs root).
for _mount in $(mount -t btrfs | cut -d' ' -f3); do
    btrfs balance start -dusage=85 -musage=85 "$_mount"
done

to run a filtered balance (feel free to adjust etc.) then try again?

The only difference on my more vanilla Arch system is vm.dirty_expire_centisecs = 1500. Also worth trying a

sudo sysctl vm.dirty_expire_centisecs=1500

?

zensin · 14 May 2021 20:34

If I knew how to, I would. Nothing I have tried so far has worked, unfortunately. I’m not sure what “For your system” is meant to imply. That I’m using someone else’s or posting stats from someone else’s? Why would I do that? There’s no time man.

btrfs balance: unknown token '-dusage=85'

sry, this is my first time using btrfs.

jonathon · 14 May 2021 20:35

Ugh, wrote it out incorrectly from memory. Check again, I've edited it.

zensin · 14 May 2021 20:56

Thanks for that. It was running but since it (the btrfs script) was probably going to take a long time and since the locking started again, I killed it. I'm going to go ahead and reinstall Gnome in case it's something I've been doing without realizing. I'll test again with a 100% fresh install and report back in a bit.

Lag/lock during file transfers wasn't the only reason I switched to KDE. I was also having an NVIDIA issue which, after switching to KDE, I realized was just an IRQ I needed to disable (it had happened with other distros but I had forgotten about it). It might be that I'm misremembering and the IRQ problem was the only problem I was having in my Gnome install. I sort of doubt it unless Garuda uses different cache settings for KDE and Gnome, but we'll see.

IMO it's something about the write cache settings and copying from an ext4 HDD to a btrfs NVMe drive. That's not entirely unexpected, depending on the settings used, given the different nature and performance of the drives. I previously read that this is a shortcoming of the Linux kernel which Linus has spoken about in the past.

I just wish I knew why my previous install (I won't mention the distro but it's a very common one) didn't suffer from it and I am now despite having backed up my sysctl.conf. At any rate, I'm sick of said distro and won't go back to it but I digress.

bbiab

jonathon · 14 May 2021 21:01

If it’s taking a long time to run then it’s doing what it’s supposed to do. However, if it’s taking a very long time (like 5+ minutes) on a fresh installation then something is wrong (or you have a massive disk).

Depending on the kernel versions involved it could well be a kernel regression in 5.12, so it’s also worth trying some of the different variants (linux, linux-lts, linux-zen, …). There are a lot of variables here so it’s likely going to take a little while to find out the specific cause.

Bro · 14 May 2021 23:22

I call bullshit on this statement, as well. If this has been happening as long as you’ve been using Linux, you are either:

Using Linux all wrong for the past 20 years, or
A troll, or
You got into yer daddy’s likker cabinet.

system · 28 May 2021 23:22

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.