Random black screen crashes while running games

System:
  Kernel: 6.3.3-zen1-1-zen arch: x86_64 bits: 64 compiler: gcc v: 13.1.1
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux-zen
    root=UUID=43e301fb-388c-4d64-8023-55c99f67fbc4 rw rootflags=subvol=@
    quiet quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0
    resume=UUID=ebff294a-fe45-486b-b8ba-60f1a4e7ca22 loglevel=3 ibt=off
  Desktop: i3 v: 4.22 info: i3bar vt: 7 dm: LightDM v: 1.32.0
    Distro: Garuda Linux base: Arch Linux
Machine:
  Type: Desktop System: Gigabyte product: B450M DS3H v: N/A
    serial: <superuser required>
  Mobo: Gigabyte model: B450M DS3H-CF v: x.x serial: <superuser required>
    UEFI-[Legacy]: American Megatrends v: F50 date: 11/27/2019
CPU:
  Info: model: AMD Ryzen 3 3100 bits: 64 type: MT MCP arch: Zen 2 gen: 3
    level: v3 note: check built: 2020-22 process: TSMC n7 (7nm)
    family: 0x17 (23) model-id: 0x71 (113) stepping: 0 microcode: 0x8701013
  Topology: cpus: 1x cores: 4 tpc: 2 threads: 8 smt: enabled cache:
    L1: 256 KiB desc: d-4x32 KiB; i-4x32 KiB L2: 2 MiB desc: 4x512 KiB
    L3: 16 MiB desc: 2x8 MiB
  Speed (MHz): avg: 2563 high: 3600 min/max: 2200/3906 boost: enabled
    scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 2200 2: 2200
    3: 3600 4: 2200 5: 3600 6: 2200 7: 2200 8: 2311 bogomips: 57486
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities: <filter>
Graphics:
  Device-1: AMD Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
    vendor: XFX Pine driver: amdgpu v: kernel arch: GCN-4 code: Arctic Islands
    process: GF 14nm built: 2016-20 pcie: gen: 3 speed: 8 GT/s lanes: 16
    ports: active: DVI-D-1 empty: DP-1, DP-2, DP-3, HDMI-A-1 bus-ID: 06:00.0
    chip-ID: 1002:67df class-ID: 0300 temp: 54.0 C
  Display: x11 server: X.Org v: 21.1.8 compositor: Picom v: git-b700a
    driver: X: loaded: amdgpu unloaded: modesetting alternate: fbdev,vesa
    dri: radeonsi gpu: amdgpu display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1600x900 s-dpi: 96 s-size: 423x238mm (16.65x9.37")
    s-diag: 485mm (19.11")
  Monitor-1: DVI-D-1 mapped: DVI-D-0 model: HP P201m serial: <filter>
    built: 2014 res: 1600x900 hz: 60 dpi: 92 gamma: 1.2
    size: 443x249mm (17.44x9.8") diag: 508mm (20") ratio: 16:9 modes:
    max: 1600x900 min: 720x400
  API: OpenGL v: 4.6 Mesa 23.0.3 renderer: AMD Radeon RX 570 Series
    (polaris10 LLVM 15.0.7 DRM 3.52 6.3.3-zen1-1-zen) direct-render: Yes
Audio:
  Device-1: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]
    vendor: XFX Pine driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s
    lanes: 16 bus-ID: 06:00.1 chip-ID: 1002:aaf0 class-ID: 0403
  Device-2: AMD Starship/Matisse HD Audio vendor: Gigabyte
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 08:00.4 chip-ID: 1022:1487 class-ID: 0403
  Device-3: Medeli Bietrun one driver: hid-generic,snd-usb-audio,usbhid
    type: USB rev: 1.1 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 3-2:2
    chip-ID: 0a67:11c0 class-ID: 0300 serial: <filter>
  API: ALSA v: k6.3.3-zen1-1-zen status: kernel-api with: aoss
    type: oss-emulator tools: N/A
  Server-1: PipeWire v: 0.3.71 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    vendor: Gigabyte driver: r8169 v: kernel pcie: gen: 1 speed: 2.5 GT/s
    lanes: 1 port: f000 bus-ID: 04:00.0 chip-ID: 10ec:8168 class-ID: 0200
  IF: enp4s0 state: up speed: 100 Mbps duplex: full mac: <filter>
Drives:
  Local Storage: total: 689.33 GiB used: 367.21 GiB (53.3%)
  SMART Message: Required tool smartctl not installed. Check --recommends
  ID-1: /dev/sda maj-min: 8:0 vendor: Patriot model: Burst size: 223.57 GiB
    block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s tech: SSD
    serial: <filter> fw-rev: BB.3 scheme: MBR
  ID-2: /dev/sdb maj-min: 8:16 vendor: Western Digital
    model: WD5000AVDS-63U7B1 size: 465.76 GiB block-size: physical: 512 B
    logical: 512 B speed: 3.0 Gb/s tech: N/A serial: <filter> fw-rev: 0A01
    scheme: GPT
Partition:
  ID-1: / raw-size: 197.81 GiB size: 197.81 GiB (100.00%)
    used: 78.88 GiB (39.9%) fs: btrfs dev: /dev/sda1 maj-min: 8:1
  ID-2: /home raw-size: 197.81 GiB size: 197.81 GiB (100.00%)
    used: 78.88 GiB (39.9%) fs: btrfs dev: /dev/sda1 maj-min: 8:1
  ID-3: /var/log raw-size: 197.81 GiB size: 197.81 GiB (100.00%)
    used: 78.88 GiB (39.9%) fs: btrfs dev: /dev/sda1 maj-min: 8:1
  ID-4: /var/tmp raw-size: 197.81 GiB size: 197.81 GiB (100.00%)
    used: 78.88 GiB (39.9%) fs: btrfs dev: /dev/sda1 maj-min: 8:1
Swap:
  Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
  ID-1: swap-1 type: zram size: 23.42 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
  ID-2: swap-2 type: partition size: 25.76 GiB used: 0 KiB (0.0%)
    priority: -2 dev: /dev/sda2 maj-min: 8:2
Sensors:
  System Temperatures: cpu: 47.2 C mobo: 29.0 C gpu: amdgpu temp: 54.0 C
  Fan Speeds (RPM): N/A gpu: amdgpu fan: 1880
Info:
  Processes: 306 Uptime: 4m wakeups: 0 Memory: available: 23.42 GiB
  used: 3.02 GiB (12.9%) Init: systemd v: 253 default: graphical
  tool: systemctl Compilers: gcc: 13.1.1 clang: 15.0.7 Packages: pm: pacman
  pkgs: 1585 libs: 498 tools: pamac,paru Shell: fish v: 3.6.1
  running-in: xfce4-terminal inxi: 3.3.27
Garuda (2.6.16-1):
  System install date:     2022-10-26
  Last full system update: 2023-05-23
  Is partially upgraded:   Yes
  Relevant software:       snapper NetworkManager mkinitcpio
  Windows dual boot:       <superuser required>
  Failed units:            

I'm having a problem where my display goes to sleep and all keyboard input stops working. Audio keeps playing, but only for another 30 seconds to a minute. It happens seemingly at random while I'm running a game. I've noticed it most with 'Timberborn' (not a very graphically intense game at first glance), but that is probably only because it is what I play most often. I'm fully willing to admit this is a hardware problem, likely my power supply or simply dust buildup, but some second opinions would be nice before I start tearing my machine apart.

I would update your system and BIOS. There was a recent mesa update that fixed its prior version's issues for some AMD GPUs, so I would do a garuda-update.
As for the BIOS, it looks like there are some critical versions released. I would update to at least version F62 as that has fixes for "major vulnerabilities" and the motherboard site states "customers are strongly encouraged to update to this release...". I've seen outdated BIOS cause all kinds of odd issues when running games, so there is a good chance this could be your issue.

Didn't even think of that. I'll try it and update the thread after.

UPDATE: sadly, this didn't fix the issue.

Random crashes could point to a heat problem.
Start the games from a terminal to catch error messages, and open a second terminal with btop to watch for anything else that could be the problem.
Also check

journalctl -m | grep -E 'Jun 03' | grep -iE 'error|warn|failed'

I am not a log specialist, but it might be a start. You should adjust the date if necessary.
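
Because the freeze ends in a reboot, the messages of interest are usually in the previous boot's kernel log. Here is a rough sketch of narrowing that down, assuming the journal is persistent and that the GPU driver is amdgpu (as the inxi output suggests). The function name gpu_errors is only illustrative.

```shell
# Keep only GPU-related problem lines from journal text read on stdin.
# The two patterns are a guess at what matters for an amdgpu card.
gpu_errors() {
    grep -E 'amdgpu|drm' | grep -iE 'error|timeout|failed|reset'
}

# Typical use (kernel messages of the previous boot):
#   journalctl -k -b -1 --no-pager | gpu_errors
```

Compared with grepping for a date, -b -1 selects the boot that crashed no matter when it happened, and -k drops everything that is not a kernel message.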

I assumed it was a heat problem but it's steadily getting worse even after turning my fan speeds up. It happens while just browsing the web sometimes now. My power supply is kind of ancient so that could be the issue.
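
One way to rule heat in or out is to log the GPU temperature to a file, so the last reading before a freeze survives the reboot. This is a minimal sketch, assuming the card shows up as card0 and exposes the usual amdgpu hwmon file. The function names and the log location are made up for illustration.

```shell
# Print a hwmon temperature file (millidegrees Celsius) as whole degrees.
gpu_temp_c() {
    temp_file=$1
    milli=$(cat "$temp_file") || return 1
    echo $((milli / 1000))
}

# Append a timestamped reading every few seconds until interrupted.
log_gpu_temp() {
    temp_file=$1
    log_file=$2
    interval=${3:-5}
    while :; do
        echo "$(date '+%F %T') $(gpu_temp_c "$temp_file") C" >> "$log_file"
        sleep "$interval"
    done
}

# Typical use (path is an assumption, check it exists first):
#   log_gpu_temp /sys/class/drm/card0/device/hwmon/hwmon*/temp1_input ~/gpu-temp.log
```

If the last logged values before a crash are nowhere near the card's limit, heat becomes a less likely explanation.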

Correction: journalctl has given some info

Jun 03 00:46:32 Kevin kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=28231, emitted seq=28232
Jun 03 00:46:32 Kevin kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:32 Kevin kernel: amdgpu 0000:06:00.0: amdgpu: 
Jun 03 00:46:49 Kevin kernel: GpuWatchdog[42283]: segfault at 0 ip 00007f65e4f92336 sp 00007f65d9ffd4f0 error 6 in libcef.so[7f65e0aef000+776f000] likely on CPU 5 (core 1, socket 0)
Jun 03 00:46:52 Kevin kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
Jun 03 00:46:52 Kevin kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing DA4C (len 824, WS 0, PS 0) @ 0xDBCC
Jun 03 00:46:52 Kevin kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing D906 (len 326, WS 0, PS 0) @ 0xD9F6
Jun 03 00:46:52 Kevin kernel: [drm:dce110_link_encoder_disable_output [amdgpu]] *ERROR* dce110_link_encoder_disable_output: Failed to execute VBIOS command table!
Jun 03 00:47:12 Kevin kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
Jun 03 00:47:12 Kevin kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C4AC (len 62, WS 0, PS 0) @ 0xC4C8
Jun 03 00:50:43 Kevin kernel: RAS: Correctable Errors collector initialized.

Maybe you could try changing the kernel.
I'd start with linux-lts.
I also noticed that your system seems to be in a partially upgraded state. This is not necessarily the cause of your issue, but it should be addressed.
If you are not holding any package back on purpose, a

garuda-update

should fix it. Then

sudo pacman -S linux-lts linux-lts-headers

I have tried searching for your errors online and found

This seems like a hung process, with the GPU reset beginning after it, though the unusual thing is that it reports pid 0 as being at fault.

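This segfault may point to a kernel or driver problem rather than to the application itself: error 6 means a user-mode write to an address with no page mapped (here address 0, inside libcef.so), and it appears 17 seconds after the GPU reset began,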
according to this tool: Raphael's blog: Segmentation fault error decoder
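
For reference, the error number in a segfault line is the x86 page-fault error code, a small bit field. Here is a minimal sketch of decoding it by hand. The function name decode_pf_error is made up for illustration, and only the most common bits are handled.

```shell
# Decode the low bits of an x86 page-fault error code.
#   bit 0 (1):  0 = page not present, 1 = protection violation
#   bit 1 (2):  0 = read,             1 = write
#   bit 2 (4):  0 = kernel mode,      1 = user mode
#   bit 4 (16): 1 = fault was an instruction fetch
decode_pf_error() {
    code=$1
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 16)) -ne 0 ]; then access="instruction fetch"; fi
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    echo "$mode-mode $access, $cause"
}
```

Called as decode_pf_error 6, this reports a user-mode write to a page that is not present, which matches what the decoder says.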

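This last part seems to be caused by runtime power management support, which can be turned off by adding radeon.runpm=0 to the kernel parameters in Garuda Boot Options. (Your card uses the amdgpu driver rather than radeon, so the matching parameter would be amdgpu.runpm=0.)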

Overall, try switching kernels as filo says, which should resolve most if not all of the issues you see in the journal log. If the atombios error is still in your logs after switching kernels, try adding the kernel parameter mentioned above.
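
After editing the boot options and rebooting, it is worth confirming that the parameter actually reached the kernel. A small sketch, assuming the running command line is read from /proc/cmdline. The helper name has_kernel_param is hypothetical.

```shell
# Succeed if the given parameter appears as a whole word on the kernel
# command line. The second argument (file to read) defaults to /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    for word in $(cat "$cmdline_file"); do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Typical use:
#   has_kernel_param radeon.runpm=0 && echo "parameter is active"
```

If the check fails, the bootloader configuration was probably not regenerated after the edit.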

Using the LTS kernel and adding the kernel parameters didn't work. I got either the same or nearly the same error:

Jun 07 22:50:02 Kevin kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=2630, emitted seq=2632
Jun 07 22:50:02 Kevin kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jun 07 22:50:22 Kevin kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
Jun 07 22:50:22 Kevin kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing DA4C (len 824, WS 0, PS 0) @ 0xDBCC
Jun 07 22:50:22 Kevin kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing D906 (len 326, WS 0, PS 0) @ 0xD9F6
Jun 07 22:50:22 Kevin kernel: [drm:dce110_link_encoder_disable_output [amdgpu]] *ERROR* dce110_link_encoder_disable_output: Failed to execute VBIOS command table!
Jun 07 22:50:42 Kevin kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
Jun 07 22:50:42 Kevin kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C4AC (len 62, WS 0, PS 0) @ 0xC4C8
Jun 07 22:51:02 Kevin kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
Jun 07 22:51:02 Kevin kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing B3B2 (len 1227, WS 8, PS 8) @ 0xB63A
Jun 07 22:51:02 Kevin kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vce_v3_0> failed -110
Jun 07 22:51:03 Kevin kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -22
Jun 07 22:51:03 Kevin kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jun 07 22:51:03 Kevin kernel: [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Jun 07 22:52:15 Kevin kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
Jun 07 22:52:15 Kevin kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C4AC (len 62, WS 0, PS 0) @ 0xC4C8
Jun 07 22:52:15 Kevin kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing AD42 (len 126, WS 0, PS 8) @ 0xAD5D
Jun 07 22:52:15 Kevin kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -22
Jun 07 22:52:25 Kevin kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=2962, emitted seq=2964
Jun 07 22:52:25 Kevin kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0

It seems your problem matches an ongoing issue on the drm/amd GitLab repository, with the last reply there posted just now.

I suggest visiting this thread and asking for further support there.
