Nvidia 460.39.4 hangs on upgrade

The grubup command probably led to that umount error. From now on I will use sudo update-grub.

Using sudo update-grub also causes the mounting of / to /tmp/tmp.*.

Just tried it.

╰─λ sudo update-grub
Generating grub configuration file ...
Found theme: /usr/share/grub/themes/garuda/theme.txt
Found linux image: /boot/vmlinuz-linux-tkg-bmq
Found initrd image: /boot/amd-ucode.img /boot/initramfs-linux-tkg-bmq.img
Found fallback initrd image(s) in /boot: initramfs-linux-tkg-bmq-fallback.img
Detecting snapshots ...
Info: Separate boot partition not detected 
Found snapshot: 2021-02-02 08:29:38 | @snapshots/autosnap/root2021-02-02_08H29
Found snapshot: 2021-02-02 08:25:57 | @snapshots/@.autocpufreq_2021-02-02_08H25
Found snapshot: 2021-02-01 20:51:52 | @snapshots/autosnap/root2021-02-01_20H51
Found snapshot: 2021-02-01 20:41:43 | @snapshots/autosnap/root2021-02-01_20H41
Found snapshot: 2021-02-01 19:05:01 | @snapshots/daily/root2021-02-01_19H05
Found snapshot: 2021-02-01 15:49:21 | @snapshots/autosnap/root2021-02-01_15H49
Found snapshot: 2021-02-01 15:48:13 | @snapshots/autosnap/root2021-02-01_15H48
Found snapshot: 2021-01-31 19:05:01 | @snapshots/daily/root2021-01-31_19H05
Found snapshot: 2021-01-31 18:12:50 | @snapshots/ad-hoc/@.b4_custom_kernel_2021-01-31_18H12
Found snapshot: 2021-01-31 18:11:13 | @snapshots/ad-hoc/root_upd_2021-01-31_18H11
Found snapshot: 2021-01-30 19:05:01 | @snapshots/daily/root2021-01-30_19H05
Found snapshot: 2021-01-29 19:14:49 | @snapshots/daily/root2021-01-29_19H14
Found snapshot: 2021-01-29 19:13:20 | @snapshots/daily/root2021-01-29_19H13
Found snapshot: 2021-01-29 19:12:33 | @snapshots/daily/root2021-01-29_19H12
Found snapshot: 2021-01-29 19:11:49 | @snapshots/daily/root2021-01-29_19H11
Found snapshot: 2021-01-27 20:29:55 | @snapshots/sync/SSD1/@.old
umount: /tmp/tmp.roCP4liHK6: target is busy.
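For anyone hitting the same "target is busy" message, here is a minimal sketch for spotting mount points left behind under /tmp/tmp.*. It only reads the kernel's mounts table. The function name `leftover_tmp_mounts` is made up for this example, and the `fuser` call is left as a comment because it needs root.

```shell
#!/usr/bin/env bash
# List mount points left under /tmp/tmp.* by reading a mounts table.
# Arg 1: mounts file (default: /proc/self/mounts); field 2 is the mount point.
leftover_tmp_mounts() {
    awk '$2 ~ /^\/tmp\/tmp\./ { print $2 }' "${1:-/proc/self/mounts}"
}

for mp in $(leftover_tmp_mounts); do
    echo "still mounted: $mp"
    # To see which processes keep it busy (needs root; not run here):
    #   sudo fuser -vm "$mp"
done
```

If nothing is printed, no temporary mount of that kind is currently active.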

You need to downgrade first. Look at dr460nf1r3's post and the link there.


Oh you mean that grubup still causes the issue after a downgrade but sudo update-grub does not?

So, what happens if you try sudo update-grub? :slight_smile:

Do it :smiley:

Really weird issue, cuz grubup is just an alias in fish.

alias grubup="sudo update-grub"

I have a lot to test tonight, and I need some alcohol to keep me company while I test all that. :smiley: :beers:

LOL, must be a @dr460nf1r3 alias :smiley:

It was there when I did a fresh install so probably someone from Garuda included it in fish. I like it, I used it countless times, but we'll see upon a grub-btrfs downgrade. I'll create my own alias after that downgrade and see if that starts causing the /tmp issue again as well.


TEST #1: Downgrading grub-btrfs

I downgraded from 4.8.1.1 to 4.7.2.1 and BOTH alias grubup="sudo update-grub" and sudo update-grub now work.

╭─frank at Garuda in ⌁
╰─λ sudo update-grub
Generating grub configuration file ...
Found theme: /usr/share/grub/themes/garuda/theme.txt
Found linux image: /boot/vmlinuz-linux-tkg-bmq
Found initrd image: /boot/amd-ucode.img /boot/initramfs-linux-tkg-bmq.img
Found fallback initrd image(s) in /boot: initramfs-linux-tkg-bmq-fallback.img
Detecting snapshots ...
Info: Separate boot partition not detected 
Found snapshot: 2021-02-02 09:52:34 | @snapshots/ad-hoc/@.nvidia
Found snapshot: 2021-01-27 20:28:05 | @snapshots/ad-hoc/@.b4_bkp_2021-01-27-20H28
Found snapshot: 2021-01-24 20:42:25 | @snapshots/ad-hoc/@.locale_2021-01-24_20H41
Found snapshot: 2021-01-16 15:52:05 | @.bkp
Found 4 snapshot(s)
done
╭─frank at Garuda in ⌁
╰─λ grubup
Generating grub configuration file ...
Found theme: /usr/share/grub/themes/garuda/theme.txt
Found linux image: /boot/vmlinuz-linux-tkg-bmq
Found initrd image: /boot/amd-ucode.img /boot/initramfs-linux-tkg-bmq.img
Found fallback initrd image(s) in /boot: initramfs-linux-tkg-bmq-fallback.img
Detecting snapshots ...
Info: Separate boot partition not detected 
Found snapshot: 2021-02-02 09:52:34 | @snapshots/ad-hoc/@.nvidia
Found snapshot: 2021-01-27 20:28:05 | @snapshots/ad-hoc/@.b4_bkp_2021-01-27-20H28
Found snapshot: 2021-01-24 20:42:25 | @snapshots/ad-hoc/@.locale_2021-01-24_20H41
Found snapshot: 2021-01-16 15:52:05 | @.bkp
Found 4 snapshot(s)
done
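A rough sketch of how a downgrade like the one in TEST #1 could start from the local pacman cache, assuming the older grub-btrfs package file is still there. The helper `cached_versions` is a made-up name, and the exact file name to install is not filled in because it depends on what the cache holds.

```shell
#!/usr/bin/env bash
# List cached package files for a given package name (signature files skipped).
# Arg 1: package name; Arg 2: cache directory (default: pacman's cache).
cached_versions() {
    local pkg=$1 dir=${2:-/var/cache/pacman/pkg} f
    for f in "$dir/$pkg"-[0-9]*.pkg.tar.*; do
        if [ -e "$f" ]; then
            case "$f" in
                *.sig) ;;            # detached signature, not a package
                *) echo "$f" ;;
            esac
        fi
    done
    return 0
}

cached_versions grub-btrfs
# Then, with a file name taken from the output above (not run here):
#   sudo pacman -U /var/cache/pacman/pkg/<file>
```

Adding the package to IgnorePkg in /etc/pacman.conf afterwards keeps the next full upgrade from pulling the newer version back in.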

I just saved dr460nf1r3 and his alias. :smiley:


To be clear, anyone who uses an alias should post the underlying command, not the alias. :slight_smile:
For example, I use tsl:

alias tsl='sudo timeshift --list'

TEST #2.1: Building a kernel using force_all_threads=true with grub-btrfs downgraded and auto-cpufreq masked.

Still so massively slow that each *.o takes a few seconds to go by, which is totally not normal. So the massive slowdown is not related to grub-btrfs. What? It could have been, you never know! :slight_smile:


TEST #2.2: Building a kernel using force_all_threads=false with grub-btrfs downgraded and auto-cpufreq masked.

That seems to progress as it normally does. It is a lot slower than a build with all threads running at full speed (you get that? loll), but it works so far. It will take 2-2.5h when I don't force all threads, so I will kill the process.

I tried using the "-J" parameter but that didn't change anything. How does it work? I have 8 cores and 16 threads. If I want, say, 80% of my CPU to run for a kernel compile, why would "-J 6" or "-J 12" not work?
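On the "-J" question: GNU make's parallel-jobs flag is a lowercase -j, and an uppercase -J is not the same option. Whether the kernel build script used here forwards that flag to make cannot be confirmed from this thread, so treat the following as a sketch only. It computes a job count for a given share of the available threads; `jobs_for_percent` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Compute a job count equal to a given percentage of the available threads.
# jobs_for_percent THREADS PERCENT -> prints the job count (never below 1).
jobs_for_percent() {
    local threads=$1 percent=$2
    local jobs=$(( threads * percent / 100 ))   # integer division rounds down
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

threads=$(nproc)                        # logical CPUs visible to this shell
jobs=$(jobs_for_percent "$threads" 80)
echo "would run: make -j$jobs"
```

With 16 threads and 80% this gives 12 jobs. For builds that go through makepkg, the job count is normally taken from MAKEFLAGS in /etc/makepkg.conf rather than from the command line.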

Anyway, I can probably build the kernel if I wait 2+h. But that still doesn't fix whatever is causing that massive slowdown on high CPU usage.

Time to test a full reinstall.

TEST #3: Full reinstall of Dr460nized and compiling kernel with all threads and NO config at all in Garuda

AWESOME!!! I see something like 20-40 *.o being compiled per second. And I am writing to you right now while the compile has been running for 4 minutes. Going by my other snapshots, it should be so slow that I would not be able to write this.

Confirmed: something, either a package update or a config I did, is causing that massive slowdown.

Curiously I was able to start the kernel compile and just download the required packages for the build, without the need for a full upgrade.

Next test is to perform a full upgrade and re-do that kernel compile.

6mins now and I can still VERY much use my machine, even though it’s running 100% CPU. I guess :beers: do help.


→ exit cleanup done

→ compilation time :

real 15m15.191s
user 196m11.233s
sys 21m53.166s

Fastest kernel I have compiled so far. I did only 2 working kernels. lolll
Will it boot? I don’t care that’s not the point.

The point is the “funnel technique” worked once again. I am now down to 2 possibilities: package update OR manual personal Frank configuration in Garuda.

Let’s see if I can upgrade and re-issue a kernel compile. This WILL confirm whether it is a package or my own stupid configs.


NAILED IT!

It's a package update causing the massive slowdown.

It's really sad, cuz my custom compiled kernel is VERY fast. I guess I found great options to use. I mean, the Garuda Welcome pops up while the screen behind it is still black, and everything is really fast!

Then when I start compiling a new kernel, as soon as it passes "Generating the RSA key", it bogs down MASSIVELY.

Some package update (or several) is causing this problem.

BTW nvidia 460.39 updated in less than 30sec.

Even with auto-cpufreq disabled and masked I get that slow down...

How can I troubleshoot that? Garuda is SO DARN quick with my custom kernel (well even without), I need to find out what's causing the issue. :frowning: It deserves to stay super quick.


There were 323 packages to update, excluding the 2 kernel packages. I was running my custom kernel, which has the same version as the updated one (5.10.12-116), so it conflicted on update and I had to remove the Garuda bmq kernel.

BTW, removing a kernel from the Garuda Kernel section in KDE System Settings wants to remove ALL kernels with a similar name, including my custom one, even though it has a different package name. On my 1st try with that feature it DID remove all kernels. DOH!!! lolll, no kernel at all in /boot. Removing the OEM Garuda BMQ kernel from Pamac worked without removing my custom kernel. Do you want me to provide more info on this?

Is there a way I can quickly ignore all of them and then install them one by one?
I will start a kernel compile after every single update until I find which one is causing the issue. Could be after 1 package, could be at the very last of the 323 packages... lolllll
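One possible way to do the one-by-one approach, as a sketch only. Partial upgrades are not supported on Arch-based systems, so this only makes sense on a throwaway install or with a snapshot to roll back to. `upgrade_cmd` and `plan_upgrades` are made-up helper names, and the two package names fed in at the end are just sample input, not a claim about which of the 323 packages is at fault.

```shell
#!/usr/bin/env bash
# Sketch: turn a list of pending packages into one upgrade command per package,
# so a test (for example a kernel compile) can be run between each step.

# Print the command that would upgrade a single package.
upgrade_cmd() {
    echo "sudo pacman -S --needed $1"
}

# Read package names (one per line) from stdin and print the plan.
plan_upgrades() {
    local pkg
    while read -r pkg; do
        if [ -n "$pkg" ]; then
            upgrade_cmd "$pkg"
        fi
    done
    return 0
}

# Real use (not run here): pacman -Qqu | plan_upgrades
printf '%s\n' nvidia-dkms grub-btrfs | plan_upgrades
```

Installing half of the list at a time instead of one package at a time would narrow 323 packages down in about nine rounds.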

Is it possible that the total time spent optimizing the kernel is longer than the time saved by the result?
Doesn’t matter, as long as you enjoy it. :smiley:

But I’m guessing no one on the team is working through the 55 posts here to help you solve it.

Summarize your results so far and open a new thread.

Rather unlikely though that you are the first to do a kernel optimization, so you should expand your search to gitlab/github.

Good luck :slight_smile:


I don’t think it’s related to kernel optimization. It could be anything that maxes out the CPU for at least 1-2mins. The problem lies in one (or many) of those 323 package updates. I have no doubt about that. I reviewed my funnel technique and I don’t think I forgot a step that would fork me into a different use case (I have used that technique many times at work and it always led to something). Maybe I did; I’m just saying I didn’t find anything wrong in what I tested to isolate the issue as much as I could.

It could be one of those packages in combination with my hardware. Yes, true. Good luck to me finding that, absolutely. However I have to, I cannot accept Linux winning over me.

I’ll see what I can do and if I find anything I’ll open a new thread on slow down issues. :slight_smile:

In the meantime yes this long thread is closed and I really appreciate all the ideas you guys threw at me, it got me thinking what to test and that’s when I am able to move on and progress. :slight_smile:

Now if only any place in Canada had a Radeon 5500 XT in stock. I have needed to change my video card for the past 2 weeks, but everywhere it’s backordered… Who knows, maybe it is related to the latest nvidia driver. lolll


And there you go... darn I am better than I thought...

In order to fix that MASSIVE CPU hogging I had to simply select PCIe3.0 in my BIOS instead of the current 4.0 I had selected.

That being said, it means nvidia DID change something in their recent drivers to use PCIe4.0 differently, as with 460.27 there is absolutely no issue even if I force 4.0 in BIOS. v460.32 seemed to work well too, but 460.39, no go! So by not using the proper BIOS setting, starting at 460.39 you're screwed.
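A minimal sketch for checking which PCIe link speed each device actually negotiated, read from sysfs. `pcie_link_report` is a made-up helper name. 8.0 GT/s corresponds to PCIe 3.0 and 16.0 GT/s to PCIe 4.0. Some GPUs lower their link speed at idle, so the current value may read lower than expected when nothing is loading the card.

```shell
#!/usr/bin/env bash
# Print current vs. maximum PCIe link speed for each PCI device under a sysfs root.
# Arg 1: directory holding the device entries (default: /sys/bus/pci/devices).
pcie_link_report() {
    local root=${1:-/sys/bus/pci/devices} dev
    for dev in "$root"/*; do
        # Not every device exposes link attributes; skip those that do not.
        if [ -r "$dev/current_link_speed" ] && [ -r "$dev/max_link_speed" ]; then
            printf '%s current=%s max=%s\n' \
                "$(basename "$dev")" \
                "$(cat "$dev/current_link_speed" 2>/dev/null)" \
                "$(cat "$dev/max_link_speed" 2>/dev/null)"
        fi
    done
    return 0
}

pcie_link_report
```

Comparing the output before and after changing the BIOS setting shows whether the board-wide generation switch really took effect for the video card and the nvme drive.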

I have a MB that sets the PCIe generation for the entire board and not PER SLOT. Since I bought a 4.0 nvme drive, I wanted max power and set 4.0 in the BIOS. My video card is 3.0. I said nah, why not, worst case it won't boot.

Since everything was perfectly fine afterwards for over a month, I never thought about that.

But since the issue started with the upgrade to 460.39, I thought why not isolate that package and see. So this morning I did a full reinstall + full upgrade, BUT using the Nouveau drivers (free) and not the nvidia ones. And my CPU was totally perfect even at 100% usage. So I knew for sure it was "related" to either the driver or something a bit beyond it, but at least around the video card, and not necessarily a bug.

Then I remembered SGS, who said "BIOS update". And then I thought, before doing a BIOS update, why not select 3.0? I know my nvme cannot use 4.0 cuz my video card is 3.0, so why not make my BIOS setting correct before I try an update... and it immediately fixed the issue.

I can yet again re-use Garuda, have a lot of fun compiling kernels and fixing zstd decompressing errors on Pacman upgrades....

CLOSED.


The saying fits once again: never touch a running system.

You change something, forget about it, take days to come back to it and undo it, and everything is running again.

And then you mark that as your solution. Cool.

I'm moving it so no one wastes time with this thread, with 58 posts.


This indicates filesystem corruption, quite possibly self-inflicted because

?
