Finally got to update my BIOS to the latest non-beta version; after updating once again I rebooted right away and it hung just before the login screen again. This is probably a different problem, so apologies for invading this post (got hooked on some minor coincidences). I will make a post of my own to tackle this once I get to properly document the problem.
Didn’t help. I already had 6.17.6, and today’s 6.17.7 didn’t solve it either. It improved a little - there are moments when CPU goes to 5%, but then it’s back to 30% and the cooling fan goes like crazy (GPU above 100%). I have waves of good and bad behavior. That’s not good. garuda-diag:
Result of a little investigation: it behaves like this only on X11, and only if nvidia is set as the primary GPU (as it was before, with no issues). I have a hybrid system: integrated AMD Radeon plus a discrete nvidia 2050.
On Wayland there’s no high CPU and GPU usage; it seems like before (meaning the external monitor is a little laggy, so that even when I play chess online there are lags when moving the pieces, and 3D games are not an option ;) but effects like window wobbling work). On X11, with nvidia as primary, effects like window wobbling don’t work and just moving a window takes time. When nvidia is not set as primary on X11, wobbling works and everything looks smoother, but nvidia is almost not used, so it’s a little laggy (tasks go to AMD).
The main issue I have is that, because I tried to update / install some additional tools, I lost my snapshot that had no lags or high CPU/GPU usage.
On Wayland, after some time I notice very small lags - CPU sometimes goes to 20% and GPU to 50%. It’s not as bad as on X11, but I think it’s still there.
The ONLY supported way to use two GPUs on Garuda Linux is NVIDIA prime or DRI_PRIME.
So the answer for you here might be:
Patient: It hurts when I do this
Doctor: Then… Don’t do that.
From the conversation so far, it’s hard to tell how you might have switched to the NVIDIA card as your “primary” GPU. It’s impossible to know what other damage that tool or method might have caused. I can only advise you to try and revert any changes that might have been made to the best of your ability.
20-nvidia-as-primary.conf
Sound familiar?
My best guess here is that there are quite a number of OpenGL errors in your diag output related to basic things like creating framebuffers.
I would advise sticking to the default Garuda Linux NVIDIA support. This is Linux, you can do whatever you want, but there’s usually a reason things are done the way they are. Some things are battle-hardened.
If you continue to experience problems, it might be necessary to perform a reset of Garuda Linux. The script for this is brand new and experimental, so I would recommend having at least 20 gigabytes of free storage space and an install USB ready just in case it goes sideways, but it usually works fine:
This is how I use my NVIDIA card: by prefixing commands with prime-run.
Interestingly, even though I’m following “best practice” for dual GPU architecture, I very consistently get the following errors in my journal:
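For anyone wondering what the prime-run prefix actually does: the sketch below assumes it is the small wrapper from the Arch nvidia-prime package, which (to my knowledge) only sets three environment variables before running the command. The helper names `offload_env` and `run_on_nvidia` are made up for illustration and are not part of any Garuda tool.

```python
import os
import subprocess

# Assumption: these are the variables the nvidia-prime `prime-run`
# wrapper sets to request NVIDIA render offload for one program.
OFFLOAD_ENV = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__VK_LAYER_NV_optimus": "NVIDIA_only",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
}


def offload_env(base=None):
    """Return a copy of the environment with the offload variables added."""
    env = dict(os.environ if base is None else base)
    env.update(OFFLOAD_ENV)
    return env


def run_on_nvidia(argv):
    """Run one command on the discrete GPU and return its exit code."""
    return subprocess.run(argv, env=offload_env()).returncode
```

Calling `run_on_nvidia(["glxinfo", "-B"])` should then be roughly equivalent to `prime-run glxinfo -B`, and only that one program is offloaded; the desktop itself stays on the integrated GPU.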
20:49:31.547 UTC user@1000.service Invalid framebuffer status: "GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT"
20:55:23.314 UTC user@1000.service Invalid framebuffer status: "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT"
Weirdly, these only appear when running Chromium-based browsers (currently Helium AppImage), and I don’t have more than 12-20 tabs open at a time. What happens is that something - I don’t know what - triggers a full browser and Plasma freeze for at least 20-30 seconds at a time, and then it just resumes from where it left off. If I was typing when it froze, it completes whatever was in the buffer when it comes back to life. I have no idea how else to trace or monitor what might be triggering it, but I have narrowed down the variables to: Wayland + NVIDIA + Chromium-based browsers + Plasma.
I will note, this does also happen with vanilla Firefox, but much less frequently.
I’d be curious to hear your thoughts, and perhaps some guidance on how I might trace the root cause when it triggers again.
There is no reason to suspect the issues here are related; I am only working with the info I have from @BroTru.
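One way to start tracing is to pull out only the journal lines close to the moment of a freeze. This is a minimal sketch that assumes the lines have been saved as plain text in the same shape as the two quoted above (time of day first, then the rest). The function names are hypothetical, and it does not handle a window that crosses midnight.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%H:%M:%S.%f"  # matches stamps such as 20:49:31.547


def parse_time(line):
    """Read the leading time-of-day stamp from one journal line."""
    stamp = line.split(maxsplit=1)[0]
    return datetime.strptime(stamp, TIME_FORMAT)


def around(lines, event, margin_s=5.0):
    """Keep the lines stamped within margin_s seconds of the event time."""
    center = datetime.strptime(event, TIME_FORMAT)
    margin = timedelta(seconds=margin_s)
    kept = []
    for line in lines:
        try:
            when = parse_time(line)
        except (ValueError, IndexError):
            continue  # blank line or no recognisable stamp
        if abs(when - center) <= margin:
            kept.append(line)
    return kept
```

For example, `around(lines, "20:49:30.000")` keeps the first of the two errors above and drops the second. On the live journal, journalctl's own `--since` and `--until` options do the same kind of filtering.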
I would suggest figuring out what process is occupying the CPU at that time, as well as extracting any logs from the exact time period (minus and plus a few seconds) of the event.
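To find out which process is eating the CPU during an event, tools like top or htop are the usual choice. If something scriptable is needed, the sketch below takes two snapshots of `/proc/<pid>/stat` and ranks processes by the CPU ticks they used in between. The function names are made up for illustration, and it only works on Linux.

```python
import os


def parse_stat(stat_text):
    """Return (comm, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first '(' and the last ')'.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    comm = stat_text[open_idx + 1:close_idx]
    fields = stat_text[close_idx + 2:].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return comm, int(fields[11]) + int(fields[12])


def snapshot(proc_root="/proc"):
    """Map pid -> (comm, cpu_ticks) for every readable process."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                result[int(entry)] = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or is unreadable
    return result


def busiest(before, after, top=5):
    """Rank processes by CPU ticks consumed between two snapshots."""
    deltas = []
    for pid, (comm, ticks) in after.items():
        if pid in before:
            deltas.append((ticks - before[pid][1], pid, comm))
    deltas.sort(reverse=True)
    return deltas[:top]
```

A typical use would be `first = snapshot()`, a `time.sleep(2)`, then `busiest(first, snapshot())`, run in a loop with the results appended to a file so that the readings taken around a freeze can be checked afterwards.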