Need help troubleshooting kernel panic after install

Hello fine gentlemen.

I really like Garuda, but I am having some tricky installation issues. I managed to get the live USB environment working (I am using the latest 210329 version of Dragonized Gaming) and had no trouble trying it out on VirtualBox, but after installing it for real on my laptop, I cannot boot into it.


Side Note

For anyone curious about my experience getting the live environment to run on my 2021 ASUS Zephyrus G15 laptop, I hid the details in the following section, as that's not really what this post is about. But suffice it to say I stumbled around in the dark for a while and still have questions myself about why what I did worked:

For the Zephyrus G15 specifically, I had to do some prep work, such as writing down my BitLocker recovery keys, disabling Secure Boot in the BIOS, disabling Fast Boot, and so on. All things that were easy to figure out from reading the Wiki and browsing the internet in general. After that, I downloaded the latest Dragonized Gaming edition of Garuda and burned it to a flash drive with balenaEtcher. I figured the rest would be smooth sailing, but sadly, I was wrong:

My first problem was that launching the Garuda live environment always landed me on a black screen with everything locked up. I couldn't switch to a terminal at all. I tried this on the latest Dragonized, Dragonized Gaming, and the barebones editions and had the same experience. Just to see what would happen, I also tried Ubuntu 20.04 LTS, and it loaded up no problem... so it wasn't just my hardware. I suspected it was the drivers bundled with the Garuda live environment. I had no luck using either the free or the nvidia driver options from GRUB. Scouring the forums, as well as the Wiki and the Arch forums, I managed to get a bit further by adding nomodeset.

So, initially, I had trouble even getting to a shell. After adding nomodeset to the list of kernel params in GRUB, I saw the Garuda initial loading screen, which froze after about 30 seconds. Hitting Esc, I managed to see the output, and realized that everything ran fine all the way up to where the Plymouth bootup script finished. I assume that's the moment it tries to startx to load the windowing environment and fails.

What finally got things working was adding driver=fbdev and nomodeset to the kernel params. Of course, it didn't look like fbdev was installed, but that didn't matter because I was about to install it:

With nomodeset, I managed to get to a shell (Ctrl+Alt+F2). From there, I connected an ethernet cable (couldn't figure out wifi, sadly), ran sudo pacman -Syu, and installed fbdev (sudo pacman -S xf86-video-fbdev). I also used pacman to install the latest nvidia drivers, but I suspect fbdev is what really did it. sudo Xorg --configure seemed to fail and hang, for reasons my human mind will never be able to grasp.
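
For reference, the two package steps above can be collected into a small script. This is only a sketch: the run helper, the install_fbdev_driver function name, and the DRY_RUN variable are made up for illustration, so the commands can be previewed without touching the system. The pacman invocations themselves are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# The two package steps run from the TTY in the live environment.
install_fbdev_driver() {
    # Sync the package databases and upgrade installed packages.
    run sudo pacman -Syu || return 1
    # Install the X.Org framebuffer video driver.
    run sudo pacman -S xf86-video-fbdev
}
```

Running DRY_RUN=1 install_fbdev_driver only prints the two commands; calling install_fbdev_driver without DRY_RUN executes them.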

Here's where things get confusing. After sudo pacman -S xf86-video-fbdev, sudo startx started working... somewhat. startx would run, and immediately return. But it ran! I checked the startup script, and couldn't easily figure out why it exited without launching anything.

Out of nowhere, after scouring the internet for clues, I saw some obscure forum post that mentioned the linux command. I was at the point of giving up. I didn't have a linux command, but I did have a linux32 command. I ran linux32. It put me into another shell. From there, I ran startx and magically, the desktop environment loaded!

From there, I just used the installer as usual. It got stuck at 94% for like, 20 minutes, but I hear that's common. Eventually everything installed properly and I rebooted.

If anyone can explain to me what linux32 is and what the heck it even does, and why startx instantly returns from the normal shell but does something else under linux32, I would love to know 🙂


My Main Issue

When I boot into GRUB, and select "Garuda", I immediately run into a kernel panic. I changed the loglevel to 5 and here are the details as I see them:

I ran some commands while booted up in Windows and determined that 0000:00:00 is the PCI standard host CPU bridge device. I have no idea what the ACPI errors are all about, but at this point I have tried all of the ACPI settings I can think of, including acpi=off.

I tried irqpoll, which generally gives me fewer complaints about PCI stuff.

I have a feeling that what's happening here is this:

  • GRUB boots up, and cryptomounts my partition (I have an encrypted partition, so it asks me to decrypt it).
  • After decrypting, there's a GRUB entry for Garuda, that tells the zen kernel to boot up and provides the encryption keys necessary for the kernel to decrypt the partition, since the kernel will be taking over and will essentially be resetting all communication to all devices and loading up from scratch.
  • GRUB somehow magically enumerated all of the PCI devices just fine to get to this point, but now, with the zen kernel starting up, it loads everything it needs into a RAM-based temporary filesystem and starts over.
  • Now zen is starting up, and it needs to enumerate the PCI devices again. At this point GRUB has handed everything off to zen, and zen needs to find my NVMe drive, so it either talks to BIOS, or does its own magic, whatever. Either way, it fails to find the NVMe controller on my Sabrent Rocket SSD. It complains in the dmesg: nvme nvme0: missing or invalid SUBNQN field. I am not certain, but I think this is the crux of the issue.
  • Without the NVMe drive being found, zen now looks for /init, but /init is on the NVMe drive, which was never mounted.
  • The kernel panics.
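
One way to sanity-check the second bullet is to look at what GRUB actually passes to the kernel. The sketch below pulls a single key=value parameter out of a kernel command line. It assumes the install uses the Arch-style encrypt hook, which reads a cryptdevice= parameter; the function name and the sample values in the usage note are hypothetical and not taken from this system.

```shell
#!/bin/sh
# Print the value of one key=value kernel parameter.
# Usage: kernel_param_value KEY [CMDLINE]
# CMDLINE defaults to the running kernel's /proc/cmdline.
# The body runs in a subshell so that `set -f` (no globbing while
# word-splitting the command line) does not leak into the caller.
kernel_param_value() (
    key=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    set -f
    for word in $cmdline; do
        case $word in
            "$key"=*)
                # Strip the leading "KEY=" and print the rest.
                value=${word#"$key"=}
                printf '%s\n' "$value"
                exit 0
                ;;
        esac
    done
    # Parameter not present.
    exit 1
)
```

For a made-up command line such as 'quiet cryptdevice=UUID=1234-abcd:cryptroot root=/dev/mapper/cryptroot rw', asking for cryptdevice prints UUID=1234-abcd:cryptroot. With no second argument the function inspects the running kernel; for the installed system, the string would have to be copied from the linux line of its GRUB entry. A missing or mismatched cryptdevice= there would be one candidate explanation, though nothing in this thread confirms it.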

I am not an expert on PCI and honestly feel a bit out of my league here, so I am not sure if my assumptions are correct. My first hunch was that the zen kernel might not be compatible with my hardware, but, as the live environment also uses the same exact version of the zen kernel (5.11.10), and the partition utility sees and loads my NVMe drive just fine there in the live environment, it seems unlikely that it's the kernel itself that is not compatible.

I confirmed that my BIOS is on the latest version (GA503QR.404, released 02/08/2021) and that Fast Boot and Secure Boot are both disabled.

In any case, if anyone has any suggestions or can point me in the right direction, I would greatly appreciate the help.

Given this is a new install, I would start by reinstalling without encryption to check whether things work "normally". If they do, then it narrows it down to however you set up encryption.

That surprisingly, shockingly, fixed all of my issues 😮

I mean, I thought for sure that the PCI and ACPI errors were involved, but... they still show up in the dmesg, but everything just loads and works perfectly now.
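
To keep an eye on those leftover messages, a small filter over the kernel log is enough. This is a sketch, not a diagnosis: the function name is made up, and the pattern simply matches the three subsystems mentioned in this thread.

```shell
#!/bin/sh
# Read kernel log text on stdin and keep only the lines that contain
# "acpi", "pci", or "nvme" (case-insensitive).
filter_boot_complaints() {
    grep -iE 'acpi|pci|nvme'
}
```

On a system that boots, something like sudo dmesg --level=err,warn | filter_boot_complaints narrows the log down to the warnings and errors from those subsystems.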

Thanks a bunch dude! I really should have tried installing without encryption, but after everything else, I just figured "nah, that can't possibly be the issue".

Well there it is then. If you get a kernel panic about not being able to find /init after a fresh install, try reinstalling w/o encryption.

It won't help if you want encryption, but at least we know where the issue lies.

I do want encryption, but ultimately, there are other ways of achieving all of that. My purpose for using encryption is that if the laptop is ever lost or stolen, I don't have to worry about someone else accessing my personal data. For that, container-based encryption combined with a strict policy about where I keep my personal documents works fine for me.
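
For anyone wondering what container-based encryption could look like in practice, here is a rough sketch using a LUKS container file with cryptsetup. This is an assumption for illustration, not necessarily the tool I will settle on: the file name, size, mapper name, mount point, and function names are all hypothetical, and DRY_RUN=1 previews the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical values, chosen only for illustration.
CONTAINER=${CONTAINER:-$HOME/vault.img}
MAPPER_NAME=${MAPPER_NAME:-vault}
MOUNT_POINT=${MOUNT_POINT:-/mnt/vault}

# With DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create, format, unlock, and mount a 1 GiB encrypted container file.
create_vault() {
    run truncate -s 1G "$CONTAINER" || return 1
    run sudo cryptsetup luksFormat "$CONTAINER" || return 1
    run sudo cryptsetup open "$CONTAINER" "$MAPPER_NAME" || return 1
    run sudo mkfs.ext4 "/dev/mapper/$MAPPER_NAME" || return 1
    run sudo mkdir -p "$MOUNT_POINT" || return 1
    run sudo mount "/dev/mapper/$MAPPER_NAME" "$MOUNT_POINT"
}

# Unmount the container and lock it again.
close_vault() {
    run sudo umount "$MOUNT_POINT" || return 1
    run sudo cryptsetup close "$MAPPER_NAME"
}
```

The root filesystem stays unencrypted with this approach, so the kernel never has to unlock anything at boot; only whatever is stored under the mount point is protected.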

So as long as I can boot into Garuda at all, the rest I can figure out. Still though, I'm curious what went wrong with the install when encryption is chosen during the install process...
