Finally got to update my BIOS to the latest non-beta version; after updating, I once again rebooted right away and the system hung just before the login screen again. This is probably a different problem, so apologies for invading this post (I got hooked on some minor coincidences). I will make a post of my own to tackle this once I can properly document the problem.
That didn’t help. I already had 6.17.6, and today’s 6.17.7 didn’t solve it either. It improved a little: there are moments when CPU usage drops to 5%, but then it’s back to 30% and the cooling fan goes crazy (GPU above 100%). I have waves of good and bad behavior. That’s not good. garuda-diag:
A small investigation result: it behaves like this only on X11, and only if NVIDIA is set as the primary GPU (as it was before, with no issues). I have a hybrid system: an integrated AMD Radeon with a discrete NVIDIA RTX 2050.
On Wayland there’s no high CPU and GPU usage; it seems like before, meaning the external monitor is a little laggy (even when I play chess online, moving the pieces lags, and 3D games are not an option ;) ), but effects like window wobbling work. On X11, if NVIDIA is primary, none of the effects like window wobbling work, and just moving a window takes time. When NVIDIA is not set as primary on X11, wobbling works and everything looks smoother, but NVIDIA is barely used, so it’s a little laggy (tasks go to the AMD GPU).
The main issue I have is that, because I tried to update/install some additional tools, I lost the snapshot that had no lags and no high CPU/GPU usage.
On Wayland, after some time I notice very small lags: CPU sometimes goes to 20% and GPU to 50%. It’s not as bad as on X11, but I think it’s still there.
The ONLY supported way to use two GPUs on Garuda Linux is NVIDIA PRIME or DRI_PRIME.
So the answer for you here might be:
Patient: It hurts when I do this
Doctor: Then… Don’t do that.
From the conversation so far, it’s hard to tell how you might have switched to the NVIDIA card as your “primary” GPU. It’s impossible to know what other damage that tool or method might have caused. I can only advise you to try and revert any changes that might have been made to the best of your ability.
20-nvidia-as-primary.conf
Sound familiar?
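If you want to hunt down leftovers like that file before reverting, here’s a minimal Python sketch that lists Xorg config snippets forcing a primary GPU. It assumes the file uses Xorg’s `Option "PrimaryGPU" "yes"` (common for files like this, but I haven’t seen yours) and only checks the two standard snippet directories; the function name is my own.

```python
import re
from pathlib import Path

# Standard Xorg snippet directories; files in /etc override /usr/share.
XORG_CONF_DIRS = [Path("/etc/X11/xorg.conf.d"), Path("/usr/share/X11/xorg.conf.d")]

# Matches lines like:  Option "PrimaryGPU" "yes"  (case-insensitive, any spacing).
PRIMARY_GPU_RE = re.compile(r'^\s*Option\s+"PrimaryGPU"\s+"(yes|true|on|1)"', re.IGNORECASE)


def find_primary_gpu_overrides(dirs=XORG_CONF_DIRS):
    """Return (path, line_number, line) for every .conf line forcing a primary GPU."""
    hits = []
    for d in dirs:
        if not d.is_dir():
            continue  # directory missing on this system
        for conf in sorted(d.glob("*.conf")):
            text = conf.read_text(errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if PRIMARY_GPU_RE.match(line):
                    hits.append((conf, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_primary_gpu_overrides():
        print(f"{path}:{lineno}: {line}")
```

Moving any hit out of /etc/X11/xorg.conf.d (rather than deleting it) makes it easy to put back if needed.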
My best guess here is that there are quite a number of OpenGL errors in your diag output related to basic things like creating framebuffers.
I would advise sticking to the default Garuda Linux NVIDIA support. This is Linux, and you can do whatever you want, but there’s usually a reason things are done the way they are. Some things are battle-hardened.
If you continue to experience problems, it might be necessary to perform a reset of Garuda Linux. The script for this is brand new and experimental and I would recommend having at least 20 gigabytes of free storage space and an install USB ready just in case it goes sideways, but it usually works fine:
This is how I use my NVIDIA card, through prime-run prefixes on the command.
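For reference, prime-run is just a small wrapper that sets a few environment variables before launching the program. A rough Python equivalent, assuming the variable set shipped in Arch’s nvidia-prime script (compare with /usr/bin/prime-run on your machine), with DRI_PRIME=1 as the Mesa-side alternative:

```python
import os
import subprocess
import sys

# Variables for NVIDIA PRIME render offload (as set by Arch's prime-run script).
NVIDIA_OFFLOAD_ENV = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
    "__VK_LAYER_NV_optimus": "NVIDIA_only",
}

# Mesa's offload switch, for when the secondary GPU runs a Mesa driver.
MESA_OFFLOAD_ENV = {"DRI_PRIME": "1"}


def offload_env(vendor="nvidia", base=None):
    """Return a copy of `base` (default: os.environ) with offload variables added."""
    env = dict(os.environ if base is None else base)
    env.update(NVIDIA_OFFLOAD_ENV if vendor == "nvidia" else MESA_OFFLOAD_ENV)
    return env


if __name__ == "__main__":
    # Usage: python offload.py glxinfo -B   (same idea as: prime-run glxinfo -B)
    if len(sys.argv) > 1:
        sys.exit(subprocess.call(sys.argv[1:], env=offload_env()))
```

If offload works, `glxinfo -B` launched this way should report the NVIDIA card as the OpenGL renderer (the script name offload.py is hypothetical).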
Interestingly, even though I’m following “best practice” for dual-GPU setups, I very consistently get the following errors in my journal:
20:49:31.547 UTC user@1000.service Invalid framebuffer status: "GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT"
20:55:23.314 UTC user@1000.service Invalid framebuffer status: "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT"
Weirdly, these only appear when running Chromium-based browsers (currently the Helium AppImage), and I don’t have more than 12-20 tabs open at a time. What happens is that something, I don’t know what, triggers a full browser and Plasma freeze for at least 20-30 seconds at a time, and then it just resumes where it left off. If I was mid-typing when it froze, it completes whatever was in the buffer once it comes back to life. I have no idea how else to trace or monitor what might be triggering it, but I have narrowed the variables down to: Wayland + NVIDIA + Chromium-based browsers + Plasma.
I’ll note that this also happens with vanilla Firefox, but much less frequently.
I’d be curious to hear your thoughts, and perhaps some guidance on how I might trace the root cause when it triggers again.
There is no reason to suspect the issues here are related; I am only working with the info I have from @BroTru.
I would suggest figuring out what process is occupying the CPU at that time, as well as extracting any logs from the exact time period (minus and plus a few seconds) of the event.
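To pull the logs for that window, journalctl’s --since/--until options take plain timestamps. A small sketch that builds the command for a given event time; the five-second margins and the helper name are arbitrary choices, and journalctl reads these timestamps as local time unless you append UTC.

```python
import shlex
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # a format journalctl accepts for --since/--until


def journal_window_cmd(event, before_s=5, after_s=5):
    """Build a journalctl command covering `event` minus/plus a few seconds."""
    since = (event - timedelta(seconds=before_s)).strftime(TS_FORMAT)
    until = (event + timedelta(seconds=after_s)).strftime(TS_FORMAT)
    # -b: current boot only; -o short-precise: microsecond timestamps.
    return ["journalctl", "-b", "--no-pager", "-o", "short-precise",
            "--since", since, "--until", until]


if __name__ == "__main__":
    # Hypothetical event time: a freeze noticed today at 20:49:31 local time.
    when = datetime.now().replace(hour=20, minute=49, second=31, microsecond=0)
    print(shlex.join(journal_window_cmd(when)))
```

Run the printed command as root (or as a member of the systemd-journal group) to see system messages as well as your own, and keep htop open during the next event to catch the process side.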
I meant this as a preamble to establish that I’m not doing anything outside best practices.
The only definitive answer I can provide is that it has something to do with hardware acceleration in web browsers. Chromium-based browsers are the biggest offenders, though it does occasionally appear with FF. The only consistent error I’m seeing in the journal is what I posted earlier…
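If it helps to correlate those errors with the freezes, here’s a minimal sketch that tallies the messages by GL status from a saved journal dump. It assumes the message text is exactly as posted earlier; the script and file names in the usage comment are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the journal message, e.g.
#   Invalid framebuffer status: "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT"
FB_RE = re.compile(r'Invalid framebuffer status: "(GL_[A-Z_]+)"')


def count_fb_errors(lines):
    """Count framebuffer status errors by GL status name."""
    counts = Counter()
    for line in lines:
        m = FB_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: journalctl -b --no-pager > journal.txt && python fb_errors.py journal.txt
    with open(sys.argv[1], errors="replace") as f:
        for status, n in count_fb_errors(f).most_common():
            print(f"{n:6d}  {status}")
```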
You want the 20-nvidia-as-primary file removed, and to try solutions without it; do I understand you correctly?
After that there were no visible lags, but NVIDIA was barely used, and simple games had problems on the external monitor. Just bringing back the 20-nvidia-as-primary file brought back the large lags (40% CPU and 100% GPU usage constantly).
I’ve played with switching between nvidia and nvidia-open together with the Garuda hardware profile, with no results.
Just a reminder: everything worked correctly before the update. I had a working X11 setup with no lags; I had to spend a day playing/configuring to get there, but it was fun :). And I could play 3D games on the external monitor with no issues. Then the update came and it all stopped.
The default Garuda Linux NVIDIA support made it impossible to play 3D games on the external monitor. Plus, which was annoying, I had visible latency when playing chess (lichess.org) on the external monitor, and even when just typing in KDevelop the letters appeared too late ;). So the default Garuda Linux NVIDIA support did not work, but I spent some time and had a nice system that broke after the update.
I’ve used different Linux distros for more than 20 years; I’m a programmer, and about 15 years ago I even had to make some small code changes in a video driver to make Linux work correctly on my machine.
Anyway, for at least the last 5 years I had no issues with Linux desktop configuration (on other distros), so I’m a little rusty now, and I’ve never worked with Arch-based desktops, so there’s a lot I still don’t know about this distro and about current Linux tools and configs. What I do know is that the default NVIDIA support in Garuda definitely doesn’t work well on my hardware (an Acer Nitro laptop), and the X11 workarounds that worked before have stopped working, so I have to come up with new ones, if that’s even possible with the current state of the code.
That’s not ideal. Does this happen in Wayland as well? May we have a new garuda-diag from Wayland using the standard Garuda Linux NVIDIA Setup?
The NVIDIA GPU has to copy from the AMD GPU’s framebuffer to display on the external monitor in PRIME, which might be the cause of this. It’s hard to say.
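To put a rough number on that extra copy: if every full frame crosses from one GPU to the other, the traffic is width × height × bytes per pixel × refresh rate. A quick worked example, assuming a hypothetical 1920x1080 external monitor at 60 Hz with 4-byte pixels (the real resolution wasn’t stated, and drivers may copy less than a full frame):

```python
def copy_bandwidth(width, height, refresh_hz, bytes_per_pixel=4):
    """Bytes per second if every full frame is copied from one GPU to the other."""
    return width * height * bytes_per_pixel * refresh_hz


if __name__ == "__main__":
    # Hypothetical external monitor: 1920x1080 @ 60 Hz, 4 bytes per pixel.
    bw = copy_bandwidth(1920, 1080, 60)
    print(f"{bw / 1e9:.2f} GB/s")  # prints 0.50 GB/s
```

That is about 8.3 MB per frame. The same function gives roughly 2.12 GB/s for a hypothetical 2560x1440 monitor at 144 Hz.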
It’s near impossible to know why this broke for you. I can guarantee you no change in regards to this happened on the Garuda Linux end. This is probably related to an upstream package change on Arch Linux, but even that package change is probably not their fault either.
Yes, it happened on Wayland; that was the reason I switched to X11 after reading about problems with Wayland and NVIDIA. I had also tried the suggested prime-run on Wayland, even for Firefox just to play chess, with no success on the external monitor, and adding prime-run to every text editor/IDE I run just to avoid a slight but annoying latency was something I didn’t even want to try, as it would be ridiculous. Then I was able to solve the issue on X11. I’ll post diag output from Wayland with no changes after I roll back everything I’ve done. I think I’ll try the reset script you suggested, but first I have to make some backups.
Thank you. Yes, I saw that post and went through my own system to see if things were similar. Unfortunately, I’m not sure I was able to draw a connection between our experiences beyond the fact that it’s having a similar symptom.
So, I noticed that when the Chromium browser becomes unresponsive, I can still move the mouse around and interact with other windows. HOWEVER, as soon as I resize any other (still responsive) window, the entire Plasma desktop becomes completely unresponsive: no mouse or keyboard activity. I tried to SSH into the desktop over the LAN, and that just sits waiting to show me a shell prompt until the system becomes responsive again. The journal does not appear to show anything that would point to this when it happens. I have no idea how to track this down…
They may not be exactly the same, but it would be good to know whether my methods improve the behavior on your system when it begins acting up. From what you’ve written, my symptoms sound exactly the same as yours. I also sometimes experience mini freezes of 5-10 seconds where the mouse moves but typing is buffered until the freeze is over. Once the mini freeze is over, the keystrokes that were typed appear.
I have created desktop shortcuts and bash aliases to restart plasma, kwin, and flush my ram.
At the terminal I simply type nuke && flush to execute all those actions. That seems to fix any strange KDE behavior that occurs. TNE has warned me before that flushing my RAM is dangerous. I only clear my RAM when it is about half used (never full), and I’ve likely done this hundreds of times with no adverse effects that I’ve noticed. I’ve even written and used services that flushed my RAM and cleared all caches before suspend or shutdown, and have never seen negative effects. I believe I also wrote a service to clear my RAM at a certain threshold, but I didn’t use it for long. Not because it caused issues, but simply because it felt like overkill.
All I’m saying is that my methods may be worth a try when KDE gets buggy.
I’m on my cell, but when I get back to my computer I can post the commands I use if you’d like to test this method out. Let me know.
Apologies if I’m cluttering up this thread with unwanted info.
I did the garuda-upda reset. I didn’t lose any data, I just have to reinstall everything, so it’s not bad.
I didn’t tweak any nvidia related settings, did not install X11.
here is the garuda-diag output:
First thing I’ve noticed: latency in Firefox on the external screen when just moving a chess piece, and small (but also annoying) latency when typing letters on the external screen. Not acceptable on a PC.
Notebook
├ CPU AMD Ryzen 5 6600H (12) @ 4.57 GHz
├ GPU NVIDIA GeForce RTX 2050 [Discrete]
├ GPU AMD Radeon 660M [Integrated]
with 32 GB of RAM
Also, System Monitor/nvtop shows waves of increased GPU usage (up to 120%): nvtop shows 100% usage on the integrated (AMD) GPU and up to 20% on the NVIDIA one. The only apps open are a terminal, Kate, and Firefox to write this. I had smaller GPU usage waves on Wayland before the system reset (but larger CPU spikes).
My next step will be to install X11 so that I can have a usable experience on the external monitor. I knew NVIDIA sucked for not releasing technical details of their cards to the Linux community, making Wayland development difficult, but having just a browser or a text editor lag on Wayland while it works fine on X11 is really bad.
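To see whether those waves line up with anything, a rough sampler that logs both GPUs once per second. It assumes amdgpu exposes gpu_busy_percent under /sys/bus/pci/drivers/amdgpu/ and that nvidia-smi is installed; the file name gpu_log.py is hypothetical. Note that polling nvidia-smi may wake the dGPU, which could itself affect what you’re measuring.

```python
import glob
import shutil
import subprocess
import sys
import time

# amdgpu exposes a 0-100 busy figure per bound PCI device (assumed path).
AMDGPU_GLOB = "/sys/bus/pci/drivers/amdgpu/*/gpu_busy_percent"
NVSMI_CMD = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]


def amdgpu_busy(pattern=AMDGPU_GLOB):
    """Return the busy percentage of every amdgpu device found (empty if none)."""
    values = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            values.append(int(f.read().strip()))
    return values


def parse_nvidia_util(output):
    """Parse one utilization value per GPU, skipping '[N/A]' or error lines."""
    return [int(s) for s in (line.strip() for line in output.splitlines()) if s.isdigit()]


def nvidia_util():
    """Query NVIDIA utilization; empty list if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(NVSMI_CMD, capture_output=True, text=True, check=False)
    return parse_nvidia_util(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gpu_log.py 120   -> 120 one-second samples
    for _ in range(int(sys.argv[1])):
        print(time.strftime("%H:%M:%S"), "amd", amdgpu_busy(), "nvidia", nvidia_util(), flush=True)
        time.sleep(1)
```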
The only keyboard shortcut I have is for plasmashell --replace, but I’m not sure it does anything once my system is responsive again. If it worked while the system was unresponsive, that would be one thing, but the system doesn’t respond to the command, so once it comes back to life, I’m not sure what use resetting Plasma has…?
I’d be curious to know what’s in your nuke and flush scripts, if you’re open to sharing.
OK, an interesting new tidbit… I happened to be logged in to the desktop via SSH, with htop running, when the lag occurred. Interestingly, the lag/freeze was isolated to the Plasma desktop session, and htop showed btrfs-cleaner running at 99.1% CPU usage. As soon as the lag/freeze cleared, that process’s CPU usage dropped below 1%.
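Since SSH got through this time, here’s a minimal sketch of roughly what htop does under the hood: sample per-process CPU time from /proc/<pid>/stat twice and rank the difference. Kernel threads like btrfs-cleaner show up there too. Percentages are per core, as in htop; the function names are my own.

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second (usually 100)


def parse_stat(text):
    """Return (comm, utime + stime ticks) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may contain spaces, so split on the last ')'.
    head, _, rest = text.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = rest.split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return comm, int(fields[11]) + int(fields[12])


def snapshot():
    """Map pid -> (comm, total CPU ticks) for every process currently in /proc."""
    ticks = {}
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/stat") as f:
                ticks[int(pid)] = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited (or is unreadable) between listdir and open
    return ticks


def top_cpu(interval=1.0, n=5):
    """Sample twice and return the n busiest processes as (percent, pid, comm)."""
    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    usage = []
    for pid, (comm, t1) in after.items():
        if pid in before:
            t0 = before[pid][1]
            usage.append((100.0 * (t1 - t0) / CLK_TCK / interval, pid, comm))
    return sorted(usage, reverse=True)[:n]


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pct, pid, comm in top_cpu():
        print(f"{pct:6.1f}%  {pid:>7}  {comm}")
```

Running it from the SSH session during the next freeze would record the culprit even without an interactive htop.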
Anyone have any ideas why this would be?
Edit: some initial research seems to indicate that having quotas enabled is a factor in high CPU usage by btrfs-cleaner. Since I don’t enforce any quotas, I disabled them. Let’s see if that makes a difference.