Unscheduled reboots/crashes and unsure how to diagnose issues without trends/patterns

Maybe you'll need an intermediate jump. See eg

filo you masochist. You know I'm about to actually start from ground zero with this, right?
If I'm reading this correctly, an intermediate jump means I basically have to start with the first update and work my way up each new version. This will be 15+ updates where I read I need to give the motherboard a full minute of cooldown in between updates...

No better way to spend a Saturday night I suppose lols :herewego:

No I think two steps. First to 5.50 and then to the needed version.
I didn't read the details anyway...

I emerge victorious, but sweet garuda is it so many more steps than just 2.
I had to update incrementally through several versions before I could get 5.50.

Yeah that was pretty painful, but I think it was worth it? My opening boot screen before GRUB looks super sharp. I think boot time is much shorter, but I just rebooted like 20 times straight, so I might also be hallucinating. That's all I can note right now. Praying for a week of uptime, then I will prolly be forced to close the thread with the BIOS update as the solution. Otherwise I'll prolly be back with boot logs and begging for more help lool

My attention span and eyeballs are aching :yawning_face:
Many thanks for the legwork from all friends!

Sad news everyone. I didn't make it 15 hours with the updated BIOS.
Chalking up another 'L' for the users. :sob:

I have 200+ lines of journalctl immediately before the crash (see pastebin):

(I hope this is how I'm supposed to do it, apologies if wrong/rude)
I will try to continue this tracking method with each crash.

Firedragon and discord were the only things running at the time of freeze/crash/hang.
Am still in search of advice with my newfound post-BIOS dilemma... Do I give the 6.20 update a go? Seems like wishful thinking to put any faith in it after 0 results from 6.10. Is my next path forward to try separate Linux kernels? This route also seems futile given that I did tons of distrohopping before Garuda and hit similar problems with EndeavourOS, Archcraft, axylOS, etc. I think they all had different kernels when I tried them out, and I encountered the same issue.

Truthfully I don't see enough evidence to motivate either path previously mentioned. I will most likely forfeit the battle for quite some time and just practice being thorough with my crash logging.
The machines always win :rage:

edit: Perma-pastebin instead of week expiration
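
For anyone following the same crash-tracking routine, here is a minimal sketch of capturing the tail of the previous boot's journal right after a freeze. The helper names (`crash_log_name`, `save_crash_log`), the file-name pattern, and the 300-line cutoff are assumptions for illustration, not something from this thread. `journalctl -b -1` selects the boot before the current one, which only works if the journal is stored persistently.

```shell
# Build a log file name from a timestamp string (hypothetical naming scheme).
crash_log_name() {
    printf 'crash-%s.log\n' "$1"
}

# Save the last lines of the previous boot's journal to a timestamped file.
# Assumes a persistent journal; otherwise `-b -1` has nothing to show.
save_crash_log() {
    out="$(crash_log_name "$(date +%Y%m%d-%H%M%S)")"
    # -b -1: previous boot, -n 300: last 300 lines, --no-pager: plain output
    journalctl -b -1 -n 300 --no-pager > "$out" && printf 'wrote %s\n' "$out"
}
```

Running `save_crash_log` after rebooting from a freeze leaves a text file that can be pasted in full. `journalctl --list-boots` shows which boots the journal still holds, which helps when more than one reboot happened in between.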

That's perfect! I hope I didn't come across harshly about the screenshots, it really is important though. The reasoning came up in another discussion here if you are interested.

This particular journal did not seem to offer up any clues, although perhaps someone else will notice something significant that I overlooked.

My instinct would be to bring it to the most current version available, unless you have a compelling reason not to. I would carefully read through the notes on their website about what systems should not take the update, and if you are not on the list I would take it.

If you really want to crack the case, I think it might be time to start chipping away at TBG's homework here. It seems like a staggering document when you first scroll through it, but if you take it in sessions it is manageable.

Take notes as you go and continue to update the thread with things you try and the results; you never know when a detail might jump out at someone.

This section here: Troubleshooting System Stutter, Lags, Freezes, and Hangs - #11 by tbg is an especially good section to blast through and get some more notes into the thread. A lot of the questions go quick, so it's not as bad as it looks. Copy all those questions into Kate or something, type out your answers, and drop the whole thing back into the thread and we'll search for clues. :male_detective:

No harshness received on my end. It just seems silly to me. I get that you can't search for text in an image, but you also can't search a pastebin that expires in a week :clown_face:
But now I'm just splitting hairs, and I promise to abide by all guidelines going forward

I did pull the trigger on 6.20
I figured there's not much to lose if I can't even keep my machine running
more than 24 hours at a time lol

I'll be taking you up on TBG's homework,
and will try to move slower in an attempt to embarrass myself here less :laughing:

Will be filling out the rest soon, just wanted to post what I had as review for myself mostly.

Have you posted the output of the garuda-inxi command?
Have you provided a full history of fixes attempted?
Both yes, See OP

Have you checked for errors/segfaults/crash dumps, and posted your logs?
Yes, and will continue to do so

Have you given at least 3 alternate kernels a test out? which ones?
Yes: zen, lts, and hardened. LTS could not boot successfully; the others crashed without leaving sufficient evidence.

Have you fully updated your system?
If garuda-update counts, yes.

Is your BIOS up to date?
Unfortunately yes.

Have you checked your resource usage with htop, iotop, etc?
yes, nominal

Are you getting close to maxing out your ram at any time?
Not at all. No idea what to make of this either

Have you checked your system temperatures?
Yes, nominal

Did this issue start recently after an update?
Not at all.

If this started recently, have you tried performing a rollback via a snapshot?
Not recent, but still yes.

Can you recall making any config changes about when this issue began?
Reinstalled several times to ensure this wasn't to blame lol so no

These random crashes are always hard to track down.
Have you stress tested your machine to see if you can replicate the crash more easily, i.e. loading the CPU/GPU to check the components and power supply?

I ran very demanding games with this machine before and after installing garuda without issue. The only pattern of collapse seems to be idling atm.

I see a lot of lines at the end of the journal related to ananicy-cpp.
Maybe this is normal; anyway, you could try changing your performance tweaks (this is part of the tutorial).

Have made edits accordingly. I was initially worried a permanent pastebin would be seen as wasteful in some regard.

That's one (very good) test, but have you run any tests that your BIOS has?

It doesn't seem to have any tests when I poke around the BIOS menu. Are there any other specific keywords to look for? I found 0 options with the word test in them.

If it isn't normal, I have those lines all the time since a looong time ago and my system is rock solid (since I fixed a hardware CPU issue).
So my guess is they are not harmful.

First-gen Ryzen CPUs have issues with C-states (under low activity, the system can freeze). In my experience with a Ryzen 1700 machine, running the LTS kernel usually fixes this, but since you can't load that up, maybe take a look here: Ryzen - ArchWiki

That being said, it might be better to find out why you can't load up the LTS kernel, as that makes it seem like there is more than one problem going on here. (Anyway, running the LTS kernel might be the simpler solution once the other bugs are fixed.)

I believe there are kernel parameters that can be used for Ryzen issues with C-states. Sorry, but I'm not that familiar with which ones are best to use. Some searching should turn up the most likely candidates to help with those types of issues.
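
As a rough sketch of what trying such a parameter could look like: `processor.max_cstate` and `idle=nomwait` are real kernel parameters that limit or change how the CPU idles, but whether either one helps on this machine, and the value `5` used below, are assumptions to experiment with rather than a confirmed fix. The helper `has_param` is hypothetical and only checks that the parameter actually took effect after a reboot.

```shell
# 1. Edit /etc/default/grub and append the parameter inside the quotes, e.g.
#      GRUB_CMDLINE_LINUX_DEFAULT="<existing options> processor.max_cstate=5"
#    (the value 5 is an assumption to experiment with, not a known fix)
# 2. Regenerate the GRUB config and reboot:
#      sudo grub-mkconfig -o /boot/grub/grub.cfg

# 3. After rebooting, confirm the parameter is actually active.
# has_param CMDLINE PARAM: succeeds if PARAM appears as a whole word in CMDLINE.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system:
#   has_param "$(cat /proc/cmdline)" "processor.max_cstate=5" && echo "active"
```

Changing one parameter at a time and noting the uptime after each change keeps the results readable for anyone else scanning the thread.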

Hello all, I have nothing but more sorrow to report. I was really holding my breath that the hardened kernel wouldn't crash on me, but I didn't quite make it 48 hours until a perma-freeze was found yet again. I have a slightly bigger log because firedragon seems to really hog the journalctl logs, see here:

I am unable to decipher any signs or symptoms of other things to investigate. One oddity with this crash is that I still had mouse movement/control, but clicking had no effect. 18:54 was frozen in my topbar, but it doesn't look like journalctl got that far in time before crashing, which is fun to meditate on. I also found it worth mentioning that I only had firedragon and discord running (idling?) at the time of the freeze.

My next order of business is geared toward getting the LTS kernel to boot. It does seem odd that it's the only one I can't get running when I'm pretty sure the idea is that the LTS be the most reliable/dependable lol So maybe I'm in much worse waters than I previously thought.

I also have a lot of research to do on Ryzen's C-states (they didn't mention C-states at all in my CompSci degree at uni lol, not surprised tho). This will be option #2 when I'm done pulling my hair out over the LTS kernel.

As always, eternally grateful for the ideas/things to investigate. The machines are still winning, but as long as I have new things to try, I don't feel defeated or alone :pray:

EDIT: Had a good laugh as it crashed shortly after this initial post. I couldn't even get coffee without a freeze. I'm dropping one last journalctl crash log (I promise no more pointless pastebins; I'm fairly certain nothing is out of the ordinary here, I just had to show that there really seem to be no patterns or irregularities on my end):

The LTS kernel hangs almost immediately; I don't even get past:

Loading Linux linux-lts ...
Loading initial ramdisk ...
_

I can alt + F2 into tty2 terminal login, but when I logged in as root and ran startx it basically said I had no xinit. Unsure what else I can do here
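
Since a TTY login works, the LTS kernel itself seems to boot, so the hang may sit in the graphical session rather than in the kernel. That is an assumption, not something the logs here confirm. A minimal sketch of checks that could be run from tty2 follows; `is_lts_kernel` and `lts_boot_report` are hypothetical helper names, while the commands they call (`uname`, `journalctl`, `systemctl`, `pacman`) are standard on an Arch-based install.

```shell
# Succeeds if a kernel release string (as printed by `uname -r`) ends in "-lts".
is_lts_kernel() {
    case "$1" in
        *-lts) return 0 ;;
        *) return 1 ;;
    esac
}

# Collect basic diagnostics from a TTY after booting the LTS entry.
lts_boot_report() {
    if is_lts_kernel "$(uname -r)"; then
        echo "running LTS kernel: $(uname -r)"
    else
        echo "not on the LTS kernel: $(uname -r)"
    fi
    # Errors logged during the current boot
    journalctl -b -p err --no-pager
    # State of whichever display manager is enabled
    systemctl status display-manager --no-pager
    # Headers matter if any out-of-tree (DKMS) modules are in use
    pacman -Q linux-lts linux-lts-headers
}
```

The output of `lts_boot_report` is the kind of text worth pasting into the thread. `startx` should not be needed on a setup where a display manager normally starts the desktop, so its complaint about `xinit` is probably a side issue.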