Unscheduled reboots/crashes and unsure how to diagnose issues without trends/patterns

I ran very demanding games on this machine before and after installing Garuda without issue. The only pattern to the crashes seems to be that they happen while idling.

I see a lot of lines at the end of the journal related to ananicy-cpp.
Maybe this is normal. Anyway, you could try changing your performance tweaks (this is part of the tutorial).

Have made edits accordingly. I was initially worried a permanent pastebin would be seen as wasteful in some regard.

That's one (very good) test, but have you run any tests that your BIOS has?

It doesn't seem to have any tests when I poke around the BIOS menu. Are there any other specific keywords to look for? I found 0 options with the word test in them.

Even if it isn't normal, I have had those lines all the time for a looong time and my system is rock solid (since I fixed a hardware CPU issue).
So my guess is they are not harmful.

First-gen Ryzen CPUs have issues with C-states (under low activity, the system can freeze). I find that running the LTS kernel usually fixes this on my Ryzen 1700 machine, but since you can't load that up, maybe take a look here: Ryzen - ArchWiki

That being said, it might be better to find out why you can't load up the LTS kernel, as that makes it seem like there is more than one problem going on here. (Anyway, running the LTS kernel might be the simpler solution once the other bugs are fixed.)

I believe there are kernel parameters that can be used for Ryzen issues with c states. Sorry, but I'm not that familiar with which ones are best to use. Some searching should turn up the most likely candidates to help with those types of issues.
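For anyone following along, here is a minimal sketch for checking whether a given parameter is already on the running kernel's command line. The helper name `has_kernel_param` is made up for illustration. The example values `processor.max_cstate=5` and `idle=nomwait` are parameters I believe come up in Ryzen C-state discussions, but treat them as assumptions and verify them against the ArchWiki page before using them.

```shell
# has_kernel_param PARAM [CMDLINE]
# Succeeds if PARAM (for example "idle=nomwait") appears as a whole word
# on the kernel command line. CMDLINE defaults to /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    # Pad with spaces so only whole, space-separated words can match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   has_kernel_param idle=nomwait && echo "already set"
```

On a GRUB setup, parameters are typically added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by regenerating the GRUB config and rebooting.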

Hello all, I have nothing but more sorrow to report. I was really holding my breath that the hardened kernel wouldn't crash on me, but I didn't quite make it 48 hours until a perma-freeze was found yet again. I have a slightly bigger log because firedragon seems to really hog the journalctl logs, see here:

I am unable to decipher any signs or symptoms of other things to investigate. One oddity with this crash is that I still had mouse movement/control, but clicking had no effect. 18:54 was frozen in my topbar, but it doesn't look like journalctl got that far in time before crashing which is fun to meditate on. I also found it worth mentioning I only had firedragon and discord running(idling?) at time of freeze.
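One way to pin down how far the journal got before a freeze is to pull the timestamp of the last entry from the previous boot and compare it with the frozen clock. A rough sketch follows; `last_journal_stamp` is just a helper name made up here, and it assumes the journal is persistent so the previous boot is still on disk.

```shell
# last_journal_stamp: read journal lines (short-iso output format) on stdin
# and print the timestamp, i.e. the first field, of the final line.
last_journal_stamp() {
    awk '{ stamp = $1 } END { if (NR > 0) print stamp }'
}

# Typical use after a crash and reboot (-b -1 selects the previous boot):
#   journalctl -b -1 -o short-iso --no-pager | last_journal_stamp
```

If that timestamp is well before the time frozen on screen, the journal most likely stopped being written before the display stopped updating.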

My next order of business is getting the LTS kernel to boot. It does seem odd that it's the only one I can't get running when I'm pretty sure the idea is that the LTS is the most reliable/dependable lol. So maybe I'm in much worse waters than I previously thought.

I also have a lot of research to do on Ryzen's C-states (they didn't mention C-states at all in my CompSci degree at uni lol, not surprised tho). This will be option #2 when I'm done pulling my hair out over the LTS kernel.

As always, eternally grateful for the ideas/things to investigate. The machines are still winning, but as long as I have new things to try, I don't feel defeated or alone :pray:

EDIT: Had a good laugh as it crashed shortly after this initial post. I couldn't even get coffee without a freeze. I'm dropping one last crash log of journalctl. I promise no more pointless pastebins; I'm fairly certain that nothing is out of the ordinary here, I just had to prove there really seem to be no patterns or irregularities on my end:

The LTS kernel hangs almost immediately; I don't even get past:

Loading Linux linux-lts ...
Loading initial ramdisk ...
_

I can Alt + F2 into the tty2 terminal login, but when I logged in as root and ran startx it basically said I had no xinit. Unsure what else I can do here.

Check all the "F" keys, from Ctrl + Alt + F1 to Ctrl + Alt + F7.

I'm confused about what to do after I log in and whether it's better to log in as root or not. It's also unclear to me what I'm doing when I log in to these other terminals. Does this mean I'm successfully booting the LTS kernel, or does this Ctrl + Alt + F2 keybind completely sidestep the kernel being loaded? I intend to do more research soon; I just felt like everyone knows something I don't here, because startx under root was my only guess and I got a "no xinit" error when I tried lol

EDIT: root is the only login accepted as correct if I Ctrl + Alt + F2 or higher
startx fails on all (tty2-7)
uname -r displays the LTS kernel after I log in as root, so this seems to indicate that the kernel is getting loaded in some way, but the graphics stack is potentially failing?
If I understand correctly, my next order of business is to study/research dmesg and look for the point of failure. I'm super unfamiliar with all this, so I'll be hitting the manuals for a minute. Please lemme know where my logic might be going awry. I struggle to ever feel like I'm looking in the right places >.>
I'm also very confused why my one and only non-root user can't be used on tty2-7
Much to learn...

It is better to not run as root. It is very easy to break things and create additional problems running as root.

What do you mean by this? What is the error message you are getting? Is it not accepting your user password? Is your user not found? (Run getent passwd to see all users.) This is not good.

Perhaps as a general statement this isn't wrong, but everyone's system is a little different; the LTS kernel gets bugs and regressions like every other kernel. There might be something specific with your hardware, or something else in the stack that a kernel bug is responding to. Also don't forget the LTS kernel is several versions behind the other kernels; if you need a kernel newer than 5.15 for something in your stack, you might run into problems.

Have you tried the mainline kernel yet?

sudo pacman -Syu linux-mainline linux-mainline-headers

Don't forget to investigate this clue:

"login incorrect" is the only output. I can log in and log out just fine with the default session manager (I think it's like lxddm or something), but only root works at these additional tty2-7 terminals.

getent passwd displays quite a bit, but I'm assuming the only line that needed to be found was this:
faf:x:1000:1001:Grant:/home/faf:/bin/bash
I did choose autologin/no pass required @ login when I installed. I'm not sure if this is maybe where I goofed.
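For reference, a small sketch of how that passwd line breaks down. The colon-separated fields are login name, password placeholder, UID, GID, comment (often the full name), home directory, and shell; the first field is what a TTY login prompt expects. `passwd_fields` is a hypothetical helper name, not an existing tool.

```shell
# passwd_fields LINE: split one passwd-format line into labelled fields.
passwd_fields() {
    printf '%s\n' "$1" | {
        IFS=: read -r login _pw uid gid gecos home shell
        printf 'login=%s uid=%s gid=%s name=%s home=%s shell=%s\n' \
            "$login" "$uid" "$gid" "$gecos" "$home" "$shell"
    }
}

passwd_fields 'faf:x:1000:1001:Grant:/home/faf:/bin/bash'
# prints: login=faf uid=1000 gid=1001 name=Grant home=/home/faf shell=/bin/bash
```

To print only one account's line rather than the whole database, `getent passwd` also accepts a login name as an argument.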

Moving on to the mainline kernel now!
EDIT: Mainline seems to be on par with zen and hardened with a single exception after login,
I got a DisplayCal Profile loader error:

DELL S2721DGF @ 0, 0, 2560x1440: Display profile couldn't be loaded.

Not sure what to make of this since everything else looks fine. Will see what max uptime I can get here on the mainline ig

I'm not sure whether autologin means your user doesn't have a password; I've never messed around with the autologin setting much. Did you try logging in to the TTY with your normal username and no password (blank value for the password field)?

If that doesn't work, you can always reset your user password from the TTY if you have root access. From root, type passwd [username] and it will prompt for a new password.

No, I guess I just confused hostname and username somehow. Unfortunately, after logging in with my non-root user, I get all the same errors as I do when I log in as root, that is:

 error (1). Closing log file.
xinit: giving up
xinit: unable to connect to X server: Connection refused
xinit: server error

And the current TL;DR: next up is Ryzen C-state research while testing mainline kernel uptime, but I remain open to changing priorities.
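The xinit lines above only say that the X server gave up; the underlying reason is usually in the Xorg log, where error lines are tagged `(EE)`. A sketch of pulling those out is below. The log locations are assumptions (commonly `~/.local/share/xorg/Xorg.0.log` when startx is run as a regular user and `/var/log/Xorg.0.log` when run as root), and `xorg_errors` is a made-up helper name.

```shell
# xorg_errors LOGFILE: print only the lines Xorg tagged as errors, "(EE)".
# Note: the legend line near the top of the log also contains "(EE)".
xorg_errors() {
    grep -F '(EE)' "$1"
}

# Example (the path is an assumption, adjust to wherever your log lives):
#   xorg_errors ~/.local/share/xorg/Xorg.0.log
```

The first `(EE)` line after the legend is usually the one worth posting, since later ones tend to be consequences of it.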

Well now that you have run startx as root, root owns .Xauthority. Try to delete it:

sudo rm ~/.Xauthority

You might have to reboot afterward. Then test if you have better luck getting the X server up.
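Before deleting, it can be worth confirming who actually owns the file. A minimal sketch, assuming GNU `stat` (the `-c '%u'` format option is GNU-specific); `owned_by_me` is a hypothetical helper name.

```shell
# owned_by_me FILE: succeed if FILE is owned by the current user.
# Compares numeric user IDs, so it works even without a passwd entry.
owned_by_me() {
    [ "$(stat -c '%u' "$1")" = "$(id -u)" ]
}

# Example:
#   owned_by_me ~/.Xauthority || echo ".Xauthority belongs to someone else"
```

If the check passes for your regular user, root ownership of `.Xauthority` is probably not what is stopping the X server.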

Tried this, same errors as my last post.
I also wanted to report that the mainline kernel auto-rebooted sometime while I was sleeping, with no reason listed in journalctl. Can we finally conclude safely that kernel switching and BIOS updates are out of the question? 4 kernels tested already feels like overkill, but so did 15 long BIOS updates. This is some painful learning lol

I'm now trying some stuff from the ArchWiki: Ryzen - ArchWiki
Noticing this bug was picked up in 2018 makes me think I have found my key issue. Thank Garuda, the end is in sight.

After additional ryzen research, I'm not as confident that I've identified any honest issue. I'm currently just going to test different BIOS settings and see what gets me the longest uptime.

I did want to post 3 images just to reach out to anyone that can enlighten me on what options would be best for testing purposes(I will make a text menu in imitation later if any prove useful):



For my current test, I'm only Disabling Global C-state Control. It seems too easy to conveniently need to toggle only this option that was previously defaulted to 'Auto' but here's to hoping :crossed_fingers:
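One way to see from Linux which idle states the kernel exposes after a BIOS change like this is the cpuidle directory in sysfs. The sketch below assumes a cpuidle driver is loaded so that `/sys/devices/system/cpu/cpu0/cpuidle` exists; `list_idle_states` is a name made up for this example.

```shell
# list_idle_states [DIR]: print "stateN: NAME" for each idle state under DIR.
# DIR defaults to CPU 0's cpuidle directory in sysfs.
list_idle_states() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    for state in "$dir"/state*; do
        # Skip the unexpanded glob or unreadable entries.
        [ -r "$state/name" ] || continue
        printf '%s: %s\n' "${state##*/}" "$(cat "$state/name")"
    done
}

# Example:
#   list_idle_states
```

Running it before and after toggling the BIOS option gives a quick comparison. I would expect fewer or shallower states with C-states disabled, but that is an assumption and not something verified on this board.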

Edit: I did try LTS kernel again after this BIOS setting change, and I got a slightly newer error. Before it was freezing on initial ramdisk, now my monitor almost immediately goes into power-saving mode and says there's no signal from DP