Hi guys,
Just thought I’d step in to try and highlight something that I see going on.
There is clearly a problem with NVIDIA-based Garuda systems right now, regardless of kernel, that is not being acknowledged.
I think the problem might be upstream somewhere. I’ve seen a large number of kernel updates over the last couple of weeks, more so than usual, and on an install that has been fine for 2-3 years I’m suddenly seeing a lot of problems.
I am also seeing quite a few people posting fragments of what is essentially the same problem, albeit detected in different ways.
The issue appears to be that, following a reboot after a system update, people are seeing very slow boot times, no boot at all, or are being dumped to a black screen…or all three.
For me, the issues began with kernel 6.6.3, when I started getting errors relating to the NVIDIA DRM module.
I ultimately performed a clean install and the problem seemed to go away. However, only 3 to 4 days later, I’m seeing the issues creep back. I haven’t seen the DRM errors yet (that doesn’t mean they aren’t happening), but I have seen slow boot times and I have been dumped to a black screen after login.
The problem doesn’t seem to occur with other distros. This isn’t to say that the Garuda team is responsible here, far from it, but I think there is something happening upstream that isn’t being taken into account. What it is? I dunno…
All I can say for sure is that this issue isn’t affecting AMD GPUs, as far as I can tell: I’ve seen no posts from people reporting issues with AMD GPUs, and my secondary machine, which has an AMD GPU, does not experience the same issues.
Currently, a clean install seems to reduce the problem, but it does eventually return for some reason or another, usually not as severely.
Based on what I have seen the conditions to re-create this problem appear to be:
- Kernel newer than 6.6.2 (regardless of kernel type)
- NVIDIA GPU
- More than one monitor (mostly).
- Doesn’t happen immediately after a clean reinstall, but does happen eventually.
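If you want to check whether your machine matches the first condition, here is a rough Python sketch. It only compares the running kernel's version number against 6.6.2, the last version I saw working. The function names and the `BASELINE` constant are mine, made up for illustration, and the parsing assumes a release string that starts with `major.minor[.patch]`, such as `6.6.3-zen1-1-zen`.

```python
import platform
import re

# Assumption: 6.6.2 is the last "good" version, taken from the list above.
BASELINE = (6, 6, 2)


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor, patch) from a string like '6.6.3-zen1-1-zen'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '6.7-rc1') is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def kernel_newer_than_baseline(release: str, baseline: tuple = BASELINE) -> bool:
    """True if the release is strictly newer than the baseline version."""
    return parse_kernel_release(release) > baseline


if __name__ == "__main__":
    release = platform.release()
    try:
        print(release, "newer than 6.6.2:", kernel_newer_than_baseline(release))
    except ValueError as err:
        print(err)
```

This says nothing about the GPU or monitor conditions; it is only a quick way to report the kernel version consistently when posting.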
The common symptoms are:
- Slow boot up time with DRM errors.
- Booting to a black screen and then going no further.
- Logging in and seeing a black screen with an X mouse cursor.
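Since the DRM errors are easy to miss, here is a small Python sketch for pulling GPU-related warnings out of the previous boot's journal so people can post comparable output. The keyword list is my own guess at what is relevant, not an official filter, and the helper names are hypothetical. It assumes a systemd system with `journalctl` available.

```python
import re
import subprocess

# Assumption: these keywords are my guess at what marks a relevant line.
GPU_PATTERN = re.compile(r"nvidia|drm|modeset", re.IGNORECASE)


def filter_gpu_lines(log_text: str) -> list:
    """Keep only the log lines that mention one of the GPU keywords."""
    return [line for line in log_text.splitlines() if GPU_PATTERN.search(line)]


def read_boot_log(boot: int = 0) -> str:
    """Return warnings and worse for a boot (0 = current, -1 = previous)."""
    try:
        result = subprocess.run(
            ["journalctl", f"--boot={boot}", "-p", "warning", "--no-pager"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # journalctl is not installed; return nothing rather than crash.
        return ""
    return result.stdout


if __name__ == "__main__":
    for line in filter_gpu_lines(read_boot_log(-1)):
        print(line)
```

Reading the previous boot (`-1`) is useful when the current boot only succeeded on a second attempt. Depending on your setup, you may need to run it with enough privileges to read the system journal.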
Fixes that seem to temporarily fix the problem:
- Clean install. Seems to get you about a week.
- Reverting to a previous snapshot.
- Using a different distro (in my case, EndeavourOS and Manjaro do not seem to have the same issue).
Fixes that don’t work (permanently):
- Freezing the kernel version to 6.6.2 (not sure why, but it doesn’t help).
- Reinstalling the NVIDIA driver.
- Switching to a different kernel.
- Updating your BIOS.
- Setting the DRM flag to 0 or 1 in your GRUB config.
- Clean install (works for a while, but will ultimately start glitching again).
- Unplugging extra monitors.
- Holding packages unrelated to the GPU / Kernel.
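On the DRM flag item: if you want to confirm which value actually took effect after editing your GRUB config, here is a rough Python sketch. I am assuming "the DRM flag" means the `nvidia_drm.modeset` kernel parameter; if you set something else, adjust the pattern. The function names are made up for illustration. When the parameter appears more than once, the sketch takes the last occurrence, on the assumption that later values override earlier ones.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: "the DRM flag" is nvidia_drm.modeset. Both the '-' and '_'
# spellings of the module name are accepted.
MODESET_RE = re.compile(r"(?:^|\s)nvidia[-_]drm\.modeset=(\S+)")


def modeset_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the modeset value from a kernel command line, or None if absent."""
    matches = MODESET_RE.findall(cmdline)
    return matches[-1] if matches else None


def read_text_or_none(path: str) -> Optional[str]:
    """Read a small text file, returning None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text_or_none("/proc/cmdline") or ""
    print("kernel command line:", modeset_from_cmdline(cmdline))
    # Only present while the nvidia_drm module is loaded.
    print("loaded module:", read_text_or_none("/sys/module/nvidia_drm/parameters/modeset"))
```

The sysfs check matters because the parameter can also be set outside GRUB (for example via a modprobe config), so an empty command-line result does not by itself mean the flag is unset.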
Factors I have yet to discern:
- Whether it is AMD or Intel CPU specific.
- Whether it is only affecting new GPUs.
I am currently writing this on an NVIDIA-based PC with Garuda fully up to date, but rebooting is a coin toss. Sometimes it works first time, sometimes it doesn’t.
To be crystal clear, I don’t think the Garuda team is responsible for this problem and I don’t want this post to come off that way. I think the issue is likely coming from upstream, but we need to acknowledge that there is a general problem. There are lots of people going in circles over this issue and some proper investigation needs to be done. Something is being missed and it’s causing people to waste a lot of time.
I don’t have enough machines here with enough variety of components to be able to build a clearer picture.
I hope we can bring more clarity to this issue and get everyone fixed swiftly!
Cheers and Merry Xmas!