SDDM sometimes fails to start and requires a hard reset

Firstly, sorry, it's another one of these threads... :sweat_smile:
This post has been long in the making, and I’m about ready to throw in the towel by disabling SDDM. I’ve been fighting this ever since I first installed Garuda, and so far, nothing I’ve tried has worked – I also ran into some other issues along the way, but those are for another day (and thread).

Essentially, this is yet another ‘SDDM does not start/is stuck on black screen after GRUB/freezes on a starting process’ issue. I have the Plymouth boot screen theme set to ‘details’ with all quiet params removed since I am basically always pressing esc to check the textual output, and what I’ve discovered is that sometimes the freeze will happen while trying to load the Samba service, in which case I can see all the textual output – other times it’ll tell me that SDDM has started, and then freeze on a black screen.
What both instances have in common is that the laptop will completely freeze – I can’t change to another tty, can’t do any sysreq combos such as alt+prtsc+b to reboot, and can’t log in by typing my password as though the login screen is just invisible. A hard reset is always required.

All the stuff I’ve tried to resolve this issue that has failed to fix it:

  • Everything suggested in this thread: Computer doesn’t boot, boots to a black screen, or stops at a message

  • Using different kernels (tried zen, lts and hardened, all to no avail – currently using lts due to some other issues related to suspend)

  • systemctl set-default graphical.target so that the system isn’t using multi-user.target instead

  • Adding a 30 second delay (which has somewhat lessened the occurrence of this issue) to the start of SDDM by executing the following:


sudo systemctl edit sddm-plymouth

# inside the override config file
[Service]
ExecStartPre=/bin/sleep 30

  • Installing xf86-video-intel, as I noticed in my Xorg.0.log before that it could not find any Intel modules. I’m actually not sure whether this could’ve made the issue worse, as the display manager now has two (possibly competing) graphical modules?

  • Adding the proprietary NVIDIA modules to mkinitcpio.conf so it contains MODULES=(crc32c-intel intel_agp i915 nouveau nvidia nvidia_modeset nvidia_uvm nvidia_drm), and then refreshing the initramfs with sudo mkinitcpio -P

  • Modifying /usr/share/sddm/scripts/Xsetup to use NVIDIA with

xrandr --setprovideroutputsource modesetting NVIDIA-G0
xrandr --auto

  • Changing SDDM to use Wayland instead of X11 – didn’t solve the problem and only brought on its own set of problems, such as a black screen with cursor after suspend (at least I could switch to a virtual console…)
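
For anyone who wants to reproduce the delay override from the list above without the interactive editor, here is a minimal sketch. It assumes that systemctl edit stores its drop-in as override.conf under /etc/systemd/system/sddm-plymouth.service.d/ (the usual location, but check your own system). write_delay_dropin is a hypothetical helper name, and the directory is a parameter so the function can be tried on a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that delays a unit's start.
# $1 = drop-in directory, $2 = delay in seconds.
write_delay_dropin() {
    dropin_dir=$1
    delay=$2
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nExecStartPre=/bin/sleep %s\n' "$delay" \
        > "$dropin_dir/override.conf"
}

# Example (needs root; run `systemctl daemon-reload` afterwards):
# write_delay_dropin /etc/systemd/system/sddm-plymouth.service.d 30
```

The resulting file has the same two lines as the override shown in the list, so it should behave the same way once systemd has reloaded its unit files.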

There may have been some changes which I forgot about since I wasn’t thoroughly documenting my steps >_>;; but these were the main ones.

Here are some links to logs which might be relevant (some logs are a few days old as that’s when I happened to be adamantly troubleshooting the issue – if you would like newer logs, please let me know):

I’m at a loss for what to try next – any help would be appreciated! ^^;
If there's any further elaboration and/or logs required, lemme know and I'll get them to you when I can.

In the end if nothing works, I’m happy to follow the advice of a Redditor who essentially said, “From my experience with NVIDIA, you are better off forgoing graphical login managers. Log into a console and use startx instead”. :’)

Lastly, garuda-inxi:

System:
  Kernel: 5.15.79-1-lts arch: x86_64 bits: 64 compiler: gcc v: 12.2.0
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-lts root=UUID=c9f99daa-f49a-4d9f-a961-e0361c675eed
    rw rootflags=subvol=@ splash rd.udev.log_priority=3 vt.global_cursor_default=0
    resume=UUID=8d72c892-6d9c-4514-b2d6-398dc499ca0c loglevel=3 ibt=off
  Desktop: KDE Plasma v: 5.26.3 tk: Qt v: 5.15.7 info: latte-dock wm: kwin_x11 vt: 2 dm: SDDM
    Distro: Garuda Linux base: Arch Linux
Machine:
  Type: Laptop System: Dell product: XPS 15 9560 v: N/A serial: <superuser required> Chassis:
    type: 10 serial: <superuser required>
  Mobo: Dell model: 05FFDN v: A00 serial: <superuser required> UEFI: Dell v: 1.24.0
    date: 08/10/2021
Battery:
  ID-1: BAT0 charge: 26.7 Wh (32.3%) condition: 82.7/97.0 Wh (85.3%) volts: 11.2 min: 11.4
    model: LGC-LGC8.33 DELL 5XJ28 type: Li-ion serial: <filter> status: discharging
CPU:
  Info: model: Intel Core i7-7700HQ bits: 64 type: MT MCP arch: Kaby Lake gen: core 7 level: v3
    note: check built: 2018 process: Intel 14nm family: 6 model-id: 0x9E (158) stepping: 9
    microcode: 0xF0
  Topology: cpus: 1x cores: 4 tpc: 2 threads: 8 smt: enabled cache: L1: 256 KiB
    desc: d-4x32 KiB; i-4x32 KiB L2: 1024 KiB desc: 4x256 KiB L3: 6 MiB desc: 1x6 MiB
  Speed (MHz): avg: 900 min/max: 800/3800 scaling: driver: intel_pstate governor: powersave
    cores: 1: 900 2: 900 3: 900 4: 900 5: 900 6: 900 7: 900 8: 900 bogomips: 44798
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities:
  Type: itlb_multihit status: KVM: VMX disabled
  Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
  Type: mds mitigation: Clear CPU buffers; SMT vulnerable
  Type: meltdown mitigation: PTI
  Type: mmio_stale_data mitigation: Clear CPU buffers; SMT vulnerable
  Type: retbleed mitigation: IBRS
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl and seccomp
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 mitigation: IBRS, IBPB: conditional, RSB filling, PBRSB-eIBRS: Not affected
  Type: srbds mitigation: Microcode
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: Intel HD Graphics 630 vendor: Dell driver: i915 v: kernel arch: Gen-9.5
    process: Intel 14nm built: 2016-20 ports: active: eDP-1 empty: DP-1, DP-2, HDMI-A-1, HDMI-A-2
    bus-ID: 00:02.0 chip-ID: 8086:591b class-ID: 0300
  Device-2: NVIDIA GP107M [GeForce GTX 1050 Mobile] vendor: Dell driver: nvidia v: 520.56.06
    alternate: nouveau,nvidia_drm non-free: 520.xx+ status: current (as of 2022-10) arch: Pascal
    code: GP10x process: TSMC 16nm built: 2016-21 pcie: gen: 3 speed: 8 GT/s lanes: 16
    bus-ID: 01:00.0 chip-ID: 10de:1c8d class-ID: 0302
  Device-3: Sunplus Innovation Integrated_Webcam_HD type: USB driver: uvcvideo bus-ID: 1-12:7
    chip-ID: 1bcf:2b95 class-ID: 0e02
  Display: x11 server: X.Org v: 21.1.4 with: Xwayland v: 22.1.5 compositor: kwin_x11 driver: X:
    loaded: intel dri: i965 gpu: i915 display-ID: :0 screens: 1
  Screen-1: 0 s-res: 3840x2160 s-dpi: 168 s-size: 580x326mm (22.83x12.83") s-diag: 665mm (26.19")
  Monitor-1: eDP-1 mapped: eDP1 model: Sharp 0x1476 built: 2016 res: 3840x2160 hz: 60 dpi: 279
    gamma: 1.2 size: 350x190mm (13.78x7.48") diag: 397mm (15.6") ratio: 16:9 modes: 3840x2160
  API: OpenGL v: 4.6 Mesa 22.2.3 renderer: Mesa Intel HD Graphics 630 (KBL GT2)
    direct render: Yes
Audio:
  Device-1: Intel CM238 HD Audio vendor: Dell driver: snd_hda_intel v: kernel bus-ID: 00:1f.3
    chip-ID: 8086:a171 class-ID: 0403
  Sound API: ALSA v: k5.15.79-1-lts running: yes
  Sound Server-1: PulseAudio v: 16.1 running: no
  Sound Server-2: PipeWire v: 0.3.60 running: yes
Network:
  Device-1: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter vendor: Rivet Networks
    Killer Wireless-n/a/ac 1535 driver: ath10k_pci v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1
    bus-ID: 02:00.0 chip-ID: 168c:003e class-ID: 0280 temp: 41.0 C
  IF: wlp2s0 state: up mac: <filter>
Bluetooth:
  Device-1: Qualcomm Atheros QCA61x4 Bluetooth 4.0 type: USB driver: btusb v: 0.8 bus-ID: 1-4:4
    chip-ID: 0cf3:e300 class-ID: e001
  Report: bt-adapter note: tool can't run ID: hci0 rfk-id: 1 state: down bt-service: disabled
    rfk-block: hardware: no software: no address: N/A
Drives:
  Local Storage: total: 476.94 GiB used: 587.04 GiB (123.1%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Toshiba model: KXG50ZNV512G NVMe 512GB
    size: 476.94 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD
    serial: <filter> rev: AADA4106 temp: 31.9 C scheme: GPT
Partition:
  ID-1: / raw-size: 459.61 GiB size: 459.61 GiB (100.00%) used: 148.98 GiB (32.4%) fs: btrfs
    dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%) used: 10.2 MiB (3.4%) fs: vfat
    dev: /dev/nvme0n1p1 maj-min: 259:1
  ID-3: /home raw-size: 459.61 GiB size: 459.61 GiB (100.00%) used: 148.98 GiB (32.4%) fs: btrfs
    dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-4: /var/log raw-size: 459.61 GiB size: 459.61 GiB (100.00%) used: 148.98 GiB (32.4%)
    fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-5: /var/tmp raw-size: 459.61 GiB size: 459.61 GiB (100.00%) used: 148.98 GiB (32.4%)
    fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
  ID-1: swap-1 type: zram size: 15.48 GiB used: 0 KiB (0.0%) priority: 100 dev: /dev/zram0
  ID-2: swap-2 type: partition size: 17.03 GiB used: 0 KiB (0.0%) priority: -2
    dev: /dev/nvme0n1p3 maj-min: 259:3
Sensors:
  System Temperatures: cpu: 46.0 C pch: 43.5 C mobo: N/A
  Fan Speeds (RPM): cpu: 0 fan-2: 0
Info:
  Processes: 281 Uptime: 1h 30m wakeups: 3 Memory: 15.48 GiB used: 5.18 GiB (33.5%) Init: systemd
  v: 252 default: graphical tool: systemctl Compilers: gcc: 12.2.0 Packages: pm: pacman pkgs: 1780
  libs: 500 tools: octopi,pamac,paru pm: appimage pkgs: 0 Client: shell wrapper v: 5.1.16-release
  inxi: 3.3.23
Garuda (2.6.9-1):
  System install date:     2022-10-19
  Last full system update: 2022-11-26
  Is partially upgraded:   No
  Relevant software:       NetworkManager
  Windows dual boot:       Probably (Run as root to verify)
  Snapshots:               Snapper
  Failed units:            systemd-vconsole-setup.service 

Dang, can't edit the OP using my previously found workaround so I'll have to comment this :frowning:

The occurrence of this issue is random afaik. Sometimes it happens upon reboot, sometimes it'll happen from a fresh boot (the one that's not a reboot, technical terms...). The frequency is also random - sometimes I'll need to hard reset the laptop 3 times before it behaves, other times once is enough. There doesn't seem to be any pattern to this, which makes it all the more annoying to troubleshoot. :confused:

Yeah, when that recently started happening on my machine, a switch to the linux-lts kernel solved the problem.

regards


Since nothing you have tried solved the issue, undo all changes so that you start from as clean a state as possible (as close as you can get to how things were when the problem started).

Do you use autologin?

grep . /etc/sddm.conf.d/*.conf

Get an sddm log, from a recent failed start. (check timestamps)

$HOME/.local/share/sddm/xorg-session.log

You probably don’t need all those modules. You should definitely remove nouveau.
Also, nvidia modules are not mandatory, unless your HW needs them. The same goes for the intel related modules. Maybe leave only i915 and test for better or worse behavior.
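
A quick way to see what is currently listed, as a rough sketch: the two helper names below (list_modules, has_driver_conflict) are made up for illustration, and the script assumes the MODULES=(...) entry in mkinitcpio.conf sits on a single line with space-separated names.

```shell
#!/bin/sh
# Print the entries of the active MODULES=(...) line, one per line.
# Commented-out lines are skipped because the match is anchored at line start.
list_modules() {
    sed -n 's/^MODULES=(\(.*\))[[:space:]]*$/\1/p' "$1" | tr -s ' ' '\n' | grep .
}

# Succeed if nouveau is listed together with any nvidia* module.
has_driver_conflict() {
    mods=$(list_modules "$1")
    printf '%s\n' "$mods" | grep -qx 'nouveau' &&
        printf '%s\n' "$mods" | grep -q '^nvidia'
}

# Example:
# has_driver_conflict /etc/mkinitcpio.conf && echo "nouveau and nvidia are both listed"
```

If the check reports a conflict, that is a hint to drop nouveau from the list and regenerate the initramfs, not proof that it causes the freeze.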

If you have no issues, some messages in the Xorg log are not enough to justify installing the xf86-video-intel driver package. Some experts suggest that the modesetting driver is better than having that package installed.

Which nvidia configuration do you have installed? (Garuda settings => Hardware)
If you use nvidia prime, you should add a kernel parameter

nvidia_drm.modeset=1
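
To confirm that the parameter actually reached the running kernel after a reboot, something like the following sketch can help. has_kernel_param is a hypothetical helper name. It reads /proc/cmdline by default and accepts another file as a second argument for testing.

```shell
#!/bin/sh
# Succeed if the exact parameter appears on the kernel command line.
# $1 = parameter (e.g. nvidia_drm.modeset=1)
# $2 = file to read (optional, defaults to /proc/cmdline)
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    # Split the command line into one word per line, then match a whole line.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Example:
# has_kernel_param nvidia_drm.modeset=1 && echo "modeset parameter is active"
```

Matching whole words avoids a false positive from a similar parameter such as i915.modeset=1.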

With more feedback, we may have more to suggest :person_shrugging: .


Hi petsam, thank you for the quick response and the info! :smiley:
Wow, I didn't know that modesetting could be better than the video-intel package. Interesting :thinking:
I've uninstalled xf86-video-intel and now, mkinitcpio.conf only has i915 in the modules section. Will see how that goes :slight_smile: the default target is now back to multi-user.target as well. I'm not sure whether I should remove the delay on SDDM's startup, as that has been the only remotely helpful thing so far by reducing the frequency of the issue. However, if making it happen more frequently will help troubleshoot, then I'm all for it.

/usr/share/sddm/scripts/Xsetup initially had everything commented out until an update that made a .pacnew file which changed it to use NVIDIA-G0. Should I comment it out again anyways? (commented and uncommented made no apparent difference, so lol)

About autologin, nah I don't use it. In the file below, I haven't manually edited anything - any custom things were done by KDE.

✦  ╰─λ grep . /etc/sddm.conf.d/*.conf
[Autologin]
Relogin=false
Session=plasma.desktop
User=
[General]
HaltCommand=/usr/bin/systemctl poweroff
Numlock=on
RebootCommand=/usr/bin/systemctl reboot
[Theme]
Current=Swish-0.2
CursorTheme=ArcMidnight-cursors
Font=Fira Sans,13,-1,5,50,0,0,0,0,0
[Users]
MaximumUid=60000
MinimumUid=1000
[X11]
ServerArguments=-dpi 168

For NVIDIA config, I have video-nvidia-prime-render-offload and video-linux installed. Same for the Intel HD Graphics 630, just an additional video-modesetting as well. I'll add that kernel param and see what happens.

Weirdly, $HOME/.local/share/sddm/xorg-session.log is empty for me. However, its neighbour sddm.log isn't, so here it is: Garuda's PrivateBin
It was modified on the 23rd of November '22. If I run into another hang, I'll be sure to post updated logs.

Btw, if I'm doing too much at once, please feel free to tell me to slow down :sweat_smile:

Here's some additional logs that might be helpful:
/etc/default/grub: Garuda's PrivateBin
current mkinitcpio.conf: Garuda's PrivateBin

Update
Uninstalling xf86-video-intel causes Xorg not to load at all LOLL, because it complains about not being able to find a display. Here's the log of that: Garuda's PrivateBin

Setting the default target to multi-user.target instead of graphical.target causes SDDM to not start at all, and sends me straight to the console login.

All of the other reversions seem to be okay - just these two presenting a challenge now. I will try appending i915.preliminary_hw_support=1 i915.modeset=1 after uninstalling xf86-video-intel when I have the time to continue troubleshooting :slight_smile:

Awesome that switching kernels fixed it for you! Unfortunately though, on my end this bug is persistent across all the kernels I've tried, although sometimes the hardened kernel will make it happen more frequently. D:
Other times though, the hardened kernel will be the only one that wants to start SDDM properly. So weird.

Just keepin’ it simple. Thanks for the response. :slight_smile:


Ok, I am finally free so it's time for some updates.
I managed to uninstall the xf86-video-intel driver - what was causing the issue with being unable to find a display was the 20-intel.conf file I'd put into /etc/X11/xorg.conf.d/ in a different attempt to see what was up with Librewolf not having proper hardware acceleration (that was actually solved by following this article on Arch Wiki ). Unfortunately, it seems my specialty lies in causing problems to fix other problems that don't actually fix the other problem LOL, but it's all part of learning. ;D
After removing that xorg conf file, I can boot again.

As for adding i915.preliminary_hw_support=1 i915.modeset=1 to the GRUB entry after uninstalling xf86-video-intel, it didn't work so that's been reverted.
(btw, is it just me or is editing the GRUB entries in the GRUB menu by pressing E really laggy? Could just be another HiDPI problem, unfortunately I'm no stranger to those and there's never a shortage of them either.)

I've set the delay on sddm-plymouth to 5 seconds to see if any freezing will happen in the future, but it seems it's now really good at delaying itself beyond that to wait for other required modules to start. Apparently, so far, SDDM hasn't failed since the 23rd of November, when that last sddm.log was modified. If SDDM doesn't fail within the next 2 weeks, I'll call this a success. :slight_smile:

The recent NVIDIA updates could have an impact on this, so let's see how it goes ^^


Update after 21 days

I almost forgot about this thread, whoops haha :sweat_smile:

The issue has stopped happening since my last post, and it's unclear what exactly caused it to stop happening (may have been removing the unneeded modules from the initramfs, which were there by default, and only using i915; adding nvidia_drm.modeset=1 to the kernel parameters; or the recent Nvidia update), but I'm glad this issue has been eradicated. I've marked petsam's post as the solution since following the instructions resolved the issue.

From the text output on Plymouth boot screen, any hangs now during boot are from loading kernel modules after 5 seconds... but I solved that by switching kernels. Eventually, that also stopped happening ¯\_(ツ)_/¯ so idk.
Just wait, maybe the boot problems will start again at the least convenient time. My laptop is the opposite of my car, which knows to break down when it's the best time and place to break down... :rofl:


tl;dr

If you've encountered this thread since you're looking for answers to the same thing, here's the general rundown of things you should try, including those mentioned in petsam's post:

  • Remove unneeded modules from your mkinitcpio.conf. Currently, my modules list includes intel_agp in addition to i915 to take advantage of Intel hardware, following this guide. My Garuda came with modules that I didn't need by default, such as amdgpu and radeon.
  • Use a different kernel - hopefully this one solves your issue, as it's the most simple of all of them. From here on out, it's really a game of whack-a-mole.
  • If you use nvidia-prime, add nvidia_drm.modeset=1 as a kernel parameter. If you want this to be a permanent thing and not need to edit with every boot, add it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then run sudo update-grub.
  • If SDDM is loading before your graphics modules have a chance to load, add a delay to the start of SDDM by editing sddm-plymouth.service with sudo systemctl edit sddm-plymouth, and add (with 5 being 5 seconds, but you might need 30 seconds if it takes that long for your module to load):
[Service]
ExecStartPre=/bin/sleep 5

Keep in mind that if you have Plymouth, you need to edit sddm-plymouth.service, not sddm.service. If you edit sddm only, it won't have any effect because it won't be used. (Trust me, I tried... lmao)

  • systemctl set-default graphical.target, although tbf without this, SDDM will refuse to run anyways.
  • Run a pacdiff and check whether you have any crucial pending changes to make. I got a lot of .pacnew files since my last post, and some of them may have also been instrumental to addressing the issue.
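
For the kernel parameter step in the list above, here is a rough sketch of making the change permanent without hand-editing. add_grub_param is a hypothetical helper name. It assumes the GRUB_CMDLINE_LINUX_DEFAULT value is wrapped in double quotes on a single line, which is the usual layout but may not match your file, so back the file up first.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# unless it is already there.
# $1 = grub defaults file (e.g. /etc/default/grub), $2 = parameter.
add_grub_param() {
    grub_file=$1
    param=$2
    # Extract the current value between the double quotes.
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"[[:space:]]*$/\1/p' "$grub_file")
    # Already present as a whole word: nothing to do.
    if printf '%s\n' "$current" | tr ' ' '\n' | grep -qxF -- "$param"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Insert the parameter just before the closing quote, then write back
    # in place so the file keeps its owner and permissions.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$grub_file" > "$tmp" &&
        cat "$tmp" > "$grub_file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (needs root, then run `sudo update-grub`):
# add_grub_param /etc/default/grub nvidia_drm.modeset=1
```

Running it twice leaves the file unchanged the second time, so it is safe to re-run after an update replaces the file.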

Hope this helps anyone out there - it's an infuriating issue to deal with ^^;
