Mt7921e on an ASUS occasionally decides to fail, then many other things fail

To get it out of the way:

$ sudo inxi -Faz
[sudo] password for jul:             
System:
Kernel: 5.18.15-arch1-1 arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux
root=UUID=33a99027-ad59-4bb2-ab88-037f8bba1280 rw rootflags=subvol=@
quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0 loglevel=3
ibt=off
Console: pty pts/0 wm: kwin_x11 DM: SDDM Distro: Garuda Linux
base: Arch Linux
Machine:
Type: Laptop System: ASUSTeK product: ASUS TUF Gaming F15 FX506HEB_FX506HEB
v: 1.0 serial: <filter>
Mobo: ASUSTeK model: FX506HEB v: 1.0 serial: <filter> UEFI: American
Megatrends LLC. v: FX506HEB.305 date: 07/22/2021
Battery:
ID-1: BAT1 charge: 67.7 Wh (90.9%) condition: 74.5/90.2 Wh (82.5%)
volts: 16.2 min: 15.9 model: ASUS A32-K55 type: Li-ion serial: N/A
status: discharging
CPU:
Info: model: 11th Gen Intel Core i7-11800H socket: U3E1 bits: 64
type: MT MCP arch: Tiger Lake gen: core 11 built: 2020 process: Intel 10nm
family: 6 model-id: 0x8D (141) stepping: 1 microcode: 0x3E
Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache:
L1: 640 KiB desc: d-8x48 KiB; i-8x32 KiB L2: 10 MiB desc: 8x1.2 MiB
L3: 24 MiB desc: 1x24 MiB
Speed (MHz): avg: 917 high: 1212 min/max: 800/4600 base/boost: 2277/5000
scaling: driver: intel_pstate governor: performance volts: 0.7 V
ext-clock: 100 MHz cores: 1: 809 2: 801 3: 1212 4: 933 5: 800 6: 1102
7: 801 8: 1039 9: 1105 10: 1119 11: 800 12: 957 13: 800 14: 817 15: 776
16: 802 bogomips: 73744
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Vulnerabilities:
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: mmio_stale_data status: Not affected
Type: retbleed status: Not affected
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
sanitization
Type: spectre_v2 mitigation: Enhanced IBRS, IBPB: conditional, RSB
filling
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: Intel TigerLake-H GT1 [UHD Graphics] vendor: ASUSTeK driver: i915
v: kernel arch: Gen-12.1 process: Intel 10nm built: 2020-21 ports:
active: eDP-1 empty: DP-1,HDMI-A-1,HDMI-A-2 bus-ID: 0000:00:02.0
chip-ID: 8086:9a60 class-ID: 0300
Device-2: NVIDIA GA107M [GeForce RTX 3050 Ti Mobile] vendor: ASUSTeK
driver: nvidia v: 515.57 alternate: nouveau,nvidia_drm non-free: 515.xx+
status: current (as of 2022-07) arch: Ampere code: GAxxx process: TSMC n7
(7nm) built: 2020-22 bus-ID: 0000:01:00.0 chip-ID: 10de:25a0
class-ID: 0300
Device-3: Sonix USB2.0 HD UVC WebCam type: USB driver: uvcvideo
bus-ID: 3-7:4 chip-ID: 322e:202c class-ID: 0e02
Display: x11 server: X.Org v: 21.1.4 with: Xwayland v: 22.1.3
compositor: kwin_x11 driver: X: loaded: modesetting,nvidia
unloaded: nouveau alternate: fbdev,intel,nv,vesa gpu: i915 display-ID: :0
screens: 1
Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.00x11.22")
s-diag: 582mm (22.93")
Monitor-1: eDP-1 model: Najing CEC Panda 0x004d built: 2019
res: 1920x1080 hz: 144 dpi: 142 gamma: 1.2 size: 344x194mm (13.54x7.64")
diag: 395mm (15.5") ratio: 16:9 modes: 1920x1080
OpenGL: renderer: Mesa Intel UHD Graphics (TGL GT1) v: 4.6 Mesa 22.1.4
direct render: Yes
Audio:
Device-1: Intel Tiger Lake-H HD Audio vendor: ASUSTeK driver: snd_hda_intel
v: kernel bus-ID: 3-3:3 alternate: snd_sof_pci_intel_tgl chip-ID: 24ae:7005
class-ID: 0300 bus-ID: 0000:00:1f.3 chip-ID: 8086:43c8 serial: <filter>
class-ID: 0403
Device-2: NVIDIA vendor: ASUSTeK driver: snd_hda_intel v: kernel
bus-ID: 0000:01:00.1 chip-ID: 10de:2291 class-ID: 0403
Device-3: Shenzhen Rapoo Gaming Headset type: USB
driver: hid-generic,snd-usb-audio,usbhid
Sound Server-1: ALSA v: k5.18.15-arch1-1 running: yes
Sound Server-2: PulseAudio v: 16.1 running: no
Sound Server-3: PipeWire v: 0.3.56 running: yes
Network:
Device-1: MEDIATEK MT7921 802.11ax PCI Express Wireless Network Adapter
vendor: AzureWave driver: mt7921e v: kernel bus-ID: 0000:02:00.0
chip-ID: 14c3:7961 class-ID: 0280
IF: wlp2s0 state: up mac: <filter>
Device-2: Realtek vendor: ASUSTeK driver: r8169 v: kernel port: 3000
bus-ID: 0000:03:00.0 chip-ID: 10ec:8162 class-ID: 0200
IF: enp3s0 state: down mac: <filter>
Bluetooth:
Device-1: IMC Networks Wireless_Device type: USB driver: btusb v: 0.8
bus-ID: 3-14:6 chip-ID: 13d3:3563 class-ID: e001 serial: <filter>
Report: bt-adapter ID: hci0 rfk-id: 2 state: down
bt-service: enabled,running rfk-block: hardware: no software: yes
address: N/A
RAID:
Hardware-1: Intel Volume Management Device NVMe RAID Controller driver: vmd
v: 0.6 port: N/A bus-ID: 0000:00:0e.0 chip-ID: 8086:9a0b rev:
class-ID: 0104
Drives:
Local Storage: total: 953.87 GiB used: 281.47 GiB (29.5%)
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: SK Hynix model: HFM001TD3JX013N
size: 953.87 GiB block-size: physical: 512 B logical: 512 B
speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter> rev: 41000C20
temp: 47.9 C scheme: GPT
SMART: yes health: PASSED on: 110d 15h cycles: 420 read-units: 8,865,655
[4.53 TB] written-units: 12,293,296 [6.29 TB]
Partition:
ID-1: / raw-size: 953.57 GiB size: 953.57 GiB (100.00%) used: 281.47 GiB
(29.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme0n1p2 maj-min: 259:2
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%) used: 576 KiB
(0.2%) fs: vfat block-size: 512 B dev: /dev/nvme0n1p1 maj-min: 259:1
ID-3: /home raw-size: 953.57 GiB size: 953.57 GiB (100.00%) used: 281.47
GiB (29.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme0n1p2
maj-min: 259:2
ID-4: /var/log raw-size: 953.57 GiB size: 953.57 GiB (100.00%) used: 281.47
GiB (29.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme0n1p2
maj-min: 259:2
ID-5: /var/tmp raw-size: 953.57 GiB size: 953.57 GiB (100.00%) used: 281.47
GiB (29.5%) fs: btrfs block-size: 4096 B dev: /dev/nvme0n1p2
maj-min: 259:2
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
ID-1: swap-1 type: zram size: 15.35 GiB used: 3.8 MiB (0.0%)
priority: 100 dev: /dev/zram0
Sensors:
System Temperatures: cpu: 48.0 C mobo: N/A
Fan Speeds (RPM): cpu: 0
Info:
Processes: 345 Uptime: 8m wakeups: 675 Memory: 15.35 GiB used: 3.38 GiB
(22.0%) Init: systemd v: 251 default: graphical tool: systemctl
Compilers: gcc: 12.1.0 clang: 14.0.6 Packages: pacman: 2085 lib: 572
Shell: fish (sudo) v: 3.5.1 default: Bash v: 5.1.16 running-in: konsole
inxi: 3.3.20

I should preface this by saying that I know this is a recurring issue with this driver, but I am getting desperate now. I genuinely fear that it will interfere with schoolwork in the future and force me to retreat to a non-Arch distro, or even Windows (though given that it's a long-standing kernel issue, it seems to me that Windows would be the only salvation, barring an actual solution).

Been using Garuda since January of this year. Came from pure Arch, so wasn't really daunted. Decided to make it my daily driver. Around early April I noticed my WiFi would stop working, and then, as I tried to use my phone to search for answers, Garuda would hang. Keyboard did nothing, mouse did nothing, the disk access light would blink infrequently, so I did a hard restart. It only happened every now and again, and (conveniently) never when I needed to be in a Zoom call or do something important, so I ignored it.

I eventually got curious, though, since it was becoming a bit more frequent, and saw the following in journalctl:

Apr 27 21:34:21 *** kernel: INFO: task kworker/1:1:126 blocked for more than 122 seconds.
Apr 27 21:34:21 *** kernel:       Tainted: P        W  OE     5.17.1-zen1-1-zen #1
Apr 27 21:34:21 *** kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 27 21:34:21 *** kernel: task:kworker/1:1     state:D stack:    0 pid:  126 ppid:     2 flags:0x00004000
Apr 27 21:34:21 *** kernel: Workqueue: ipv6_addrconf addrconf_verify_work
Apr 27 21:34:21 *** kernel: Call Trace:
Apr 27 21:34:21 *** kernel:  <TASK>
Apr 27 21:34:21 *** kernel:  __schedule+0xabe/0x12a0
Apr 27 21:34:21 *** kernel:  schedule_preempt_disabled+0x52/0xd0
Apr 27 21:34:21 *** kernel:  __mutex_lock.constprop.0+0x258/0x590
Apr 27 21:34:21 *** kernel:  addrconf_verify_work+0xa/0x20
Apr 27 21:34:21 *** kernel:  process_one_work+0x24d/0x430
Apr 27 21:34:21 *** kernel:  worker_thread+0x51/0x4c0
Apr 27 21:34:21 *** kernel:  ? apply_wqattrs_commit+0x1e0/0x1e0
Apr 27 21:34:21 *** kernel:  kthread+0x13a/0x160
Apr 27 21:34:21 *** kernel:  ? kthread_complete_and_exit+0x20/0x20
Apr 27 21:34:21 *** kernel:  ret_from_fork+0x1f/0x30
Apr 27 21:34:21 *** kernel:  </TASK>

Full log

I decided against posting a question after seeing this, since kworker (which at that time I doubted was the root problem) seemed fairly innocuous and it was probably a coincidence that it got logged around when the issue happened. The pacman stuff at the end of those logs is due to me running an update at the same time, in the hope of solving it.

After the update the problem didn't go away, but it became much less frequent and so ignorable again, so I ignored it again. It did recur around mid-May, and finally, late that month, I checked the logs to find this:

May 21 16:56:56 *** kernel: mt7921e 0000:02:00.0: driver own failed
May 21 16:56:57 *** kernel: mt7921e 0000:02:00.0: driver own failed
May 21 16:56:57 *** kernel: mt7921e 0000:02:00.0: chip reset
May 21 16:56:59 *** kernel: mt7921e 0000:02:00.0: driver own failed
May 21 16:57:00 *** kernel: mt7921e 0000:02:00.0: Timeout for driver own
May 21 16:57:01 *** kernel: mt7921e 0000:02:00.0: driver own failed
May 21 16:57:01 *** kernel: ------------[ cut here ]------------
May 21 16:57:01 *** kernel: WARNING: CPU: 11 PID: 238740 at kernel/kthread.c:660 kthread_park+0x7b/0x90
May 21 16:57:01 *** kernel: Modules linked in: ccm snd_seq_dummy snd_hrtimer snd_seq rfcomm cmac algif_hash algif_skcipher af_alg qrtr bnep snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation vfat soundwire_cadence fat snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_codec_hdmi mt7921e mt7921_common snd_hda_codec_realtek iTCO_wdt mt76_connac_lib snd_hda_codec_generic intel_pmc_bxt iTCO_vendor_support hid_multitouch ledtrig_audio ee1004 mt76 pmt_telemetry intel_tcc_cooling intel_rapl_msr intel_spi_pci pmt_class snd_hda_intel mac80211 uvcvideo snd_usb_audio r8169 intel_spi x86_pkg_temp_thermal videobuf2_vmalloc btusb snd_intel_dspcfg intel_powerclamp asus_nb_wmi snd_usbmidi_lib snd_intel_sdw_acpi btrtl videobuf2_memops realtek processor_thermal_device_pci_legacy videobuf2_v4l2 snd_hda_codec spi_nor asus_wmi
May 21 16:57:01 *** kernel:  coretemp snd_rawmidi btbcm processor_thermal_device libarc4 mdio_devres kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd btintel intel_cstate snd_hda_core btmtk intel_uncore platform_profile snd_hwdep wmi_bmof cfg80211 bluetooth snd_seq_device i2c_i801 videobuf2_common processor_thermal_rfim libphy mtd snd_pcm i2c_smbus processor_thermal_mbox ucsi_acpi videodev intel_lpss_pci snd_timer ecdh_generic processor_thermal_rapl typec_ucsi uinput tpm_crb intel_lpss mc mousedev joydev crc16 nvidia_drm(POE) rfkill idma64 i2c_dev snd intel_rapl_common typec i2c_hid_acpi tpm_tis int3403_thermal nvidia_modeset(POE) soundcore intel_soc_dts_iosf intel_vsec roles i2c_hid nvidia_uvm(POE) int340x_thermal_zone tpm_tis_core mac_hid tpm rng_core intel_hid int3400_thermal asus_wireless acpi_thermal_rel sparse_keymap acpi_pad nvidia(POE) dm_multipath dm_mod ipmi_devintf ipmi_msghandler crypto_user fuse zram bpf_preload ip_tables x_tables
May 21 16:57:01 *** kernel:  usbhid btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq nvme serio_raw atkbd nvme_core libps2 xhci_pci vmd xhci_pci_renesas wmi i8042 serio radeon amdgpu gpu_sched drm_ttm_helper intel_agp crc32c_intel i915 video ttm intel_gtt
May 21 16:57:01 *** kernel: CPU: 11 PID: 238740 Comm: kworker/u32:12 Tainted: P        W  OE     5.17.7-zen1-1-zen #1 5c9a8a59ee80422eba99b42b78b0e6f145543fb9
May 21 16:57:01 *** kernel: Hardware name: ASUSTeK COMPUTER INC. ASUS TUF Gaming F15 FX506HEB_FX506HEB/FX506HEB, BIOS FX506HEB.305 07/22/2021
May 21 16:57:01 *** kernel: Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
May 21 16:57:01 *** kernel: RIP: 0010:kthread_park+0x7b/0x90
May 21 16:57:01 *** kernel: Code: 89 df e8 08 9f 01 00 48 85 c0 74 27 31 c0 5b 5d c3 cc 0f 0b 48 8b ab d0 06 00 00 a8 04 74 af 0f 0b b8 da ff ff ff 5b 5d c3 cc <0f> 0b b8 f0 ff ff ff eb db 0f 0b eb d5 0f 1f 84 00 00 00 00 00 0f
May 21 16:57:01 *** kernel: RSP: 0018:ffffb70381eefdf0 EFLAGS: 00010202
May 21 16:57:01 *** kernel: RAX: 0000000000000004 RBX: ffff99d70b29cd80 RCX: 0000000000004187
May 21 16:57:01 *** kernel: RDX: 0000000000000001 RSI: 00000000fffffe00 RDI: ffff99d70b29cd80
May 21 16:57:01 *** kernel: RBP: ffff99d70a28a980 R08: ffff99d7065e2480 R09: ffffb70381eefda8
May 21 16:57:01 *** kernel: R10: ffffffff96a5abc0 R11: 0000000000000002 R12: ffff99d7065e08e0
May 21 16:57:01 *** kernel: R13: ffff99d7065e20e0 R14: ffff99d7065e7588 R15: ffff99d7065e2410
May 21 16:57:01 *** kernel: FS:  0000000000000000(0000) GS:ffff99da7b6c0000(0000) knlGS:0000000000000000
May 21 16:57:01 *** kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 21 16:57:01 *** kernel: CR2: 0000562204d2e5c8 CR3: 0000000152410004 CR4: 0000000000770ee0
May 21 16:57:01 *** kernel: PKRU: 55555554
May 21 16:57:01 *** kernel: Call Trace:
May 21 16:57:01 *** kernel:  <TASK>
May 21 16:57:01 *** kernel:  mt7921e_mac_reset+0x9e/0x2c0 [mt7921e 45ad91adcb5c4a57a02104ac3269ac3df106cc80]
May 21 16:57:01 *** kernel:  mt7921_mac_reset_work+0x9c/0x14a [mt7921_common 68e0ef2605a025e514f28c7d52b9f86c0f0b8c7d]
May 21 16:57:01 *** kernel:  process_one_work+0x252/0x410
May 21 16:57:01 *** kernel:  worker_thread+0x54/0x4d0
May 21 16:57:01 *** kernel:  ? mod_delayed_work_on+0x120/0x120
May 21 16:57:01 *** kernel:  kthread+0x138/0x160
May 21 16:57:01 *** kernel:  ? kthread_complete_and_exit+0x20/0x20
May 21 16:57:01 *** kernel:  ret_from_fork+0x1f/0x30
May 21 16:57:01 *** kernel:  </TASK>
May 21 16:57:01 *** kernel: ---[ end trace 0000000000000000 ]---

Full log

At this point I was wondering why I hadn't realised it was this damn driver again. It did bug me a bit when I was using pure Arch, but they seemed to keep their promise of fixing it in an update, so I forgot about it. Now it haunts me again, and much worse this time around. Soon after this the problem became very frequent indeed, and somewhat consistent as well (which I can't see any reason for in the logs, including ones I have not shown).

After boot I had about 2 hours, give or take 10 mins, with my laptop before the driver would fail, causing a massive domino effect throughout the system. 30 seconds or so after the failure, applications will no longer open or close (though I can still use them if they're already open) and the sudo command no longer works. About 10 seconds after that, the shell will no longer open (even in a separate session, so basically it just hangs after login, even if I log in as root). At this point only a hard restart can save the system.

Looking into the whole mt7921e issue again, I found a few people in a similar situation, but they were met with the same "it will be fixed eventually" promise. At this point I recalled that Garuda uses the zen kernel and suspected that it was perhaps behind on the update, so I installed the standard linux kernel (well, the Arch kernel, ofc, but that's its package name so whatever). This fixed the issue. I spent the summer until yesterday content that I had solved it myself. But then, of course, yesterday happened. It is not as consistent anymore, but the same domino effect occurs. As I write this the driver could fail at any moment, so I may not be able to reply immediately to your answers. Log:

Aug 01 23:31:22 *** kernel: mt7921e 0000:02:00.0: driver own failed
Aug 01 23:31:24 *** kernel: mt7921e 0000:02:00.0: driver own failed
Aug 01 23:31:24 *** kernel: mt7921e 0000:02:00.0: chip reset
Aug 01 23:31:25 *** kernel: mt7921e 0000:02:00.0: driver own failed
Aug 01 23:31:25 *** latte-dock[6082]: [2022-08-01 23:31:25.701] [6105] (device_info_linux.cc:45): NumberOfDevices
Aug 01 23:31:25 *** latte-dock[6082]: [2022-08-01 23:31:25.815] [6105] (device_info_linux.cc:45): NumberOfDevices
Aug 01 23:31:25 *** latte-dock[6082]: [2022-08-01 23:31:25.815] [6105] (device_info_linux.cc:78): GetDeviceName
Aug 01 23:31:26 *** kernel: mt7921e 0000:02:00.0: Timeout for driver own
Aug 01 23:31:27 *** kernel: mt7921e 0000:02:00.0: driver own failed
Aug 01 23:31:27 *** kernel: ------------[ cut here ]------------
Aug 01 23:31:27 *** kernel: WARNING: CPU: 8 PID: 130 at kernel/kthread.c:659 kthread_park+0x85/0xa0
Aug 01 23:31:27 *** kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq ccm rfcomm cmac algif_hash algif_skcipher af_alg qrtr bnep snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation vfat soundwire_cadence fat snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi intel_tcc_cooling soundwire_bus hid_multitouch x86_pkg_temp_thermal snd_soc_core intel_powerclamp snd_compress iTCO_wdt coretemp spi_nor pmt_telemetry intel_pmc_bxt ac97_bus snd_pcm_dmaengine mtd iTCO_vendor_support intel_rapl_msr pmt_class kvm_intel snd_hda_codec_hdmi ee1004 mt7921e mt7921_common kvm mt76_connac_lib snd_hda_codec_realtek irqbypass uvcvideo asus_nb_wmi mt76 crct10dif_pclmul snd_hda_codec_generic asus_wmi crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ledtrig_audio intel_cstate btusb snd_hda_intel intel_uncore btrtl wmi_bmof videobuf2_vmalloc btbcm snd_usb_audio
Aug 01 23:31:27 *** kernel:  snd_intel_dspcfg platform_profile snd_intel_sdw_acpi r8169 mac80211 videobuf2_memops btintel snd_usbmidi_lib snd_hda_codec videobuf2_v4l2 btmtk processor_thermal_device_pci_legacy snd_hda_core realtek videobuf2_common snd_rawmidi bluetooth processor_thermal_device snd_hwdep mdio_devres spi_intel_pci snd_seq_device i2c_i801 intel_lpss_pci videodev processor_thermal_rfim spi_intel snd_pcm libphy ucsi_acpi i2c_smbus processor_thermal_mbox intel_lpss libarc4 snd_timer typec_ucsi processor_thermal_rapl mc idma64 nvidia_drm(POE) snd typec intel_rapl_common ecdh_generic joydev crc16 mousedev nvidia_modeset(POE) soundcore intel_soc_dts_iosf intel_vsec roles i2c_hid_acpi tpm_crb i2c_hid tpm_tis tpm_tis_core int3403_thermal int340x_thermal_zone mac_hid tpm cfg80211 int3400_thermal asus_wireless rng_core acpi_pad acpi_thermal_rel intel_hid sparse_keymap rfkill uinput i2c_dev nvidia_uvm(POE) nvidia(POE) dm_multipath dm_mod crypto_user fuse zram bpf_preload ip_tables x_tables usbhid
Aug 01 23:31:27 *** kernel:  btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq nvme serio_raw atkbd nvme_core libps2 vivaldi_fmap xhci_pci vmd i8042 wmi xhci_pci_renesas serio radeon amdgpu gpu_sched drm_ttm_helper intel_agp crc32c_intel i915 drm_buddy video ttm drm_dp_helper intel_gtt
Aug 01 23:31:27 *** kernel: CPU: 8 PID: 130 Comm: kworker/u32:2 Tainted: P           OE     5.18.15-arch1-1 #1 9ff3be2e7813d5f2c07119812e1642852fe6c646
Aug 01 23:31:27 *** kernel: Hardware name: ASUSTeK COMPUTER INC. ASUS TUF Gaming F15 FX506HEB_FX506HEB/FX506HEB, BIOS FX506HEB.305 07/22/2021
Aug 01 23:31:27 *** kernel: Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
Aug 01 23:31:27 *** kernel: RIP: 0010:kthread_park+0x85/0xa0
Aug 01 23:31:27 *** kernel: Code: 00 48 85 c0 74 2d 31 c0 5b 5d e9 6a 06 d4 00 0f 0b 48 8b ab d0 06 00 00 a8 04 74 ac 0f 0b b8 da ff ff ff 5b 5d e9 4f 06 d4 00 <0f> 0b b8 f0 ff ff ff eb d5 0f 0b eb cf 66 66 2e 0f 1f 84 00 00 00
Aug 01 23:31:27 *** kernel: RSP: 0018:ffffa86140797e00 EFLAGS: 00010202
Aug 01 23:31:27 *** kernel: RAX: 0000000000000004 RBX: ffff9c4a93424d80 RCX: 0000000000000000
Aug 01 23:31:27 *** kernel: RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff9c4a93424d80
Aug 01 23:31:27 *** kernel: RBP: ffff9c4a703db300 R08: ffff9c4a708324a0 R09: 00000000fffffff0
Aug 01 23:31:27 *** kernel: R10: 0000000000000003 R11: ffffffffa4ccaa08 R12: ffff9c4a708308e0
Aug 01 23:31:27 *** kernel: R13: ffff9c4a708320e0 R14: ffff9c4a70838610 R15: ffff9c4a70832430
Aug 01 23:31:27 *** kernel: FS:  0000000000000000(0000) GS:ffff9c4dbb600000(0000) knlGS:0000000000000000
Aug 01 23:31:27 *** kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 01 23:31:27 *** kernel: CR2: 000025afc8864000 CR3: 00000001de9ac003 CR4: 0000000000770ee0
Aug 01 23:31:27 *** kernel: PKRU: 55555554
Aug 01 23:31:27 *** kernel: Call Trace:
Aug 01 23:31:27 *** kernel:  <TASK>
Aug 01 23:31:27 *** kernel:  mt7921e_mac_reset+0xa2/0x2e0 [mt7921e 5a9113169ca2a583864bad082029c294b6a5e13b]
Aug 01 23:31:27 *** kernel:  mt7921_mac_reset_work+0xa0/0x14e [mt7921_common 8dcfe9dce2c18a75ff1ac08a79e014cc833cd2a3]
Aug 01 23:31:27 *** kernel:  process_one_work+0x1c4/0x380
Aug 01 23:31:27 *** kernel:  worker_thread+0x51/0x390
Aug 01 23:31:27 *** kernel:  ? rescuer_thread+0x3b0/0x3b0
Aug 01 23:31:27 *** kernel:  kthread+0xdb/0x110
Aug 01 23:31:27 *** kernel:  ? kthread_complete_and_exit+0x20/0x20
Aug 01 23:31:27 *** kernel:  ret_from_fork+0x1f/0x30
Aug 01 23:31:27 *** kernel:  </TASK>
Aug 01 23:31:27 *** kernel: ---[ end trace 0000000000000000 ]---

Full log

There's less context in this log because, frankly, it's the same issue, and it's an old issue.
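For anyone trying to spot the same failure in their own journal, the driver's messages can be separated from the surrounding noise. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the log lines have the same shape as the excerpts above (kernel: mt7921e, then the PCI address, then the message).

```shell
# Hypothetical helpers (the names are illustrative, not part of any package).
# Both read a kernel log on stdin.

# Keep only lines logged by the mt7921e driver itself, skipping call-trace
# entries and the "Modules linked in" list that merely mention the module.
mt7921e_messages() {
    grep -E 'kernel: mt7921e [0-9a-f:.]+: '
}

# Count the "driver own failed" messages in the log.
count_driver_own_failed() {
    grep -c 'kernel: mt7921e .*driver own failed'
}
```

Piping journalctl -k -b -1 --no-pager into mt7921e_messages would show the driver messages from the previous boot, assuming the journal is stored persistently and the last entries survived the hard restart.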

I just want some kind of actual solution. A workaround. I'm sure the dev team is hard at work on this and many other issues. But while they're doing that, is there really no other hope for this situation? Am I truly cursed to be haunted by this damn driver for as long as it is in this state?

I have tried the zen and lts kernels, and they fail faster than the linux kernel. Significantly faster than they ever have, as well: they fail in 30-ish mins now. So here I am, stuck between 3 kernels, all of which eventually "halt and catch fire," to use INTERCAL's terminology, and only the linux kernel holds out for any useful amount of time.

I have seen mention of linux-mainline and haven't tried it yet, but given that switching kernels in the past was only a temporary solution, I am doubtful it will work. I will try it soon after posting, though.

Why do some still not provide the requested garuda-inxi???

The mainline kernel is your most likely fix. Unfortunately, there is very little Garuda can do about bugs in Mediatek wifi drivers.

Another avenue you could try is testing older versions of the linux-firmware package. If you are unfamiliar with how to downgrade a package, there is plenty of information on downgrading in the ArchWiki. Try out older firmware packages from around the time you were experiencing the fewest problems with your wifi.
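If older linux-firmware packages are still in the local pacman cache, finding one to test could look roughly like the sketch below. The function name is hypothetical, the default path is pacman's standard cache directory (an assumption if CacheDir has been changed in pacman.conf), and the glob assumes zstd-compressed packages whose version starts with a digit.

```shell
# Hypothetical helper: list cached linux-firmware packages, sorted by name.
# The version is a date, so the oldest package comes first.
# $1 is the cache directory; it defaults to pacman's standard cache location.
cached_firmware() {
    cache="${1:-/var/cache/pacman/pkg}"
    # [0-9] keeps split packages such as linux-firmware-whence out of the list;
    # the .pkg.tar.zst suffix skips detached .sig files.
    ls -1 "$cache"/linux-firmware-[0-9]*.pkg.tar.zst 2>/dev/null | sort
}
```

A file from that list can then be installed with sudo pacman -U followed by its path. Rebooting afterwards should make sure the driver picks up the older firmware.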

Sorry. Old habits and so forth.

System:
Kernel: 5.18.15-arch1-1 arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
parameters: BOOT_IMAGE=/@/boot/vmlinuz-linux
root=UUID=33a99027-ad59-4bb2-ab88-037f8bba1280 rw rootflags=subvol=@
quiet splash rd.udev.log_priority=3 vt.global_cursor_default=0 loglevel=3
ibt=off
Desktop: KDE Plasma v: 5.25.3 tk: Qt v: 5.15.5 info: latte-dock
wm: kwin_x11 vt: 1 dm: SDDM Distro: Garuda Linux base: Arch Linux
Machine:
Type: Laptop System: ASUSTeK product: ASUS TUF Gaming F15 FX506HEB_FX506HEB
v: 1.0 serial: <superuser required>
Mobo: ASUSTeK model: FX506HEB v: 1.0 serial: <superuser required>
UEFI: American Megatrends LLC. v: FX506HEB.305 date: 07/22/2021
Battery:
ID-1: BAT1 charge: 42.1 Wh (56.5%) condition: 74.5/90.2 Wh (82.5%)
volts: 15.0 min: 15.9 model: ASUS A32-K55 type: Li-ion serial: N/A
status: discharging
CPU:
Info: model: 11th Gen Intel Core i7-11800H bits: 64 type: MT MCP
arch: Tiger Lake gen: core 11 built: 2020 process: Intel 10nm family: 6
model-id: 0x8D (141) stepping: 1 microcode: 0x3E
Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache:
L1: 640 KiB desc: d-8x48 KiB; i-8x32 KiB L2: 10 MiB desc: 8x1.2 MiB
L3: 24 MiB desc: 1x24 MiB
Speed (MHz): avg: 1106 high: 1486 min/max: 800/4600 scaling:
driver: intel_pstate governor: performance cores: 1: 923 2: 1128 3: 951
4: 788 5: 1405 6: 1414 7: 874 8: 1486 9: 1142 10: 1045 11: 1249 12: 836
13: 1130 14: 1248 15: 809 16: 1274 bogomips: 73744
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Vulnerabilities:
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: mmio_stale_data status: Not affected
Type: retbleed status: Not affected
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
sanitization
Type: spectre_v2 mitigation: Enhanced IBRS, IBPB: conditional, RSB
filling
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: Intel TigerLake-H GT1 [UHD Graphics] vendor: ASUSTeK driver: i915
v: kernel arch: Gen-12.1 process: Intel 10nm built: 2020-21 ports:
active: eDP-1 empty: DP-1,HDMI-A-1,HDMI-A-2 bus-ID: 0000:00:02.0
chip-ID: 8086:9a60 class-ID: 0300
Device-2: NVIDIA GA107M [GeForce RTX 3050 Ti Mobile] vendor: ASUSTeK
driver: nvidia v: 515.57 alternate: nouveau,nvidia_drm non-free: 515.xx+
status: current (as of 2022-07) arch: Ampere code: GAxxx process: TSMC n7
(7nm) built: 2020-22 bus-ID: 0000:01:00.0 chip-ID: 10de:25a0
class-ID: 0300
Device-3: Sonix USB2.0 HD UVC WebCam type: USB driver: uvcvideo
bus-ID: 3-7:4 chip-ID: 322e:202c class-ID: 0e02
Display: x11 server: X.Org v: 21.1.4 with: Xwayland v: 22.1.3
compositor: kwin_x11 driver: X: loaded: modesetting,nvidia
unloaded: nouveau alternate: fbdev,intel,nv,vesa gpu: i915 display-ID: :0
screens: 1
Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.00x11.22")
s-diag: 582mm (22.93")
Monitor-1: eDP-1 model: Najing CEC Panda 0x004d built: 2019
res: 1920x1080 hz: 144 dpi: 142 gamma: 1.2 size: 344x194mm (13.54x7.64")
diag: 395mm (15.5") ratio: 16:9 modes: 1920x1080
OpenGL: renderer: Mesa Intel UHD Graphics (TGL GT1) v: 4.6 Mesa 22.1.4
direct render: Yes
Audio:
Device-1: Intel Tiger Lake-H HD Audio vendor: ASUSTeK driver: snd_hda_intel
v: kernel bus-ID: 3-3:3 alternate: snd_sof_pci_intel_tgl chip-ID: 24ae:7005
bus-ID: 0000:00:1f.3 class-ID: 0300 serial: <filter> chip-ID: 8086:43c8
class-ID: 0403
Device-2: NVIDIA vendor: ASUSTeK driver: snd_hda_intel v: kernel
bus-ID: 0000:01:00.1 chip-ID: 10de:2291 class-ID: 0403
Device-3: Shenzhen Rapoo Gaming Headset type: USB
driver: hid-generic,snd-usb-audio,usbhid
Sound Server-1: ALSA v: k5.18.15-arch1-1 running: yes
Sound Server-2: PulseAudio v: 16.1 running: no
Sound Server-3: PipeWire v: 0.3.56 running: yes
Network:
Device-1: MEDIATEK MT7921 802.11ax PCI Express Wireless Network Adapter
vendor: AzureWave driver: mt7921e v: kernel bus-ID: 0000:02:00.0
chip-ID: 14c3:7961 class-ID: 0280
IF: wlp2s0 state: up mac: <filter>
Device-2: Realtek vendor: ASUSTeK driver: r8169 v: kernel port: 3000
bus-ID: 0000:03:00.0 chip-ID: 10ec:8162 class-ID: 0200
IF: enp3s0 state: down mac: <filter>
Bluetooth:
Device-1: IMC Networks Wireless_Device type: USB driver: btusb v: 0.8
bus-ID: 3-14:6 chip-ID: 13d3:3563 class-ID: e001 serial: <filter>
Report: bt-adapter ID: hci0 rfk-id: 2 state: down
bt-service: enabled,running rfk-block: hardware: no software: yes
address: N/A
RAID:
Hardware-1: Intel Volume Management Device NVMe RAID Controller driver: vmd
v: 0.6 port: N/A bus-ID: 0000:00:0e.0 chip-ID: 8086:9a0b rev:
class-ID: 0104
Drives:
Local Storage: total: 953.87 GiB used: 281.49 GiB (29.5%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: SK Hynix model: HFM001TD3JX013N
size: 953.87 GiB block-size: physical: 512 B logical: 512 B
speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter> rev: 41000C20
temp: 43.9 C scheme: GPT
Partition:
ID-1: / raw-size: 953.57 GiB size: 953.57 GiB (100.00%) used: 281.49 GiB
(29.5%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%) used: 576 KiB
(0.2%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
ID-3: /home raw-size: 953.57 GiB size: 953.57 GiB (100.00%) used: 281.49
GiB (29.5%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-4: /var/log raw-size: 953.57 GiB size: 953.57 GiB (100.00%) used: 281.49
GiB (29.5%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
ID-5: /var/tmp raw-size: 953.57 GiB size: 953.57 GiB (100.00%) used: 281.49
GiB (29.5%) fs: btrfs dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
Kernel: swappiness: 133 (default 60) cache-pressure: 100 (default)
ID-1: swap-1 type: zram size: 15.35 GiB used: 3.8 MiB (0.0%)
priority: 100 dev: /dev/zram0
Sensors:
System Temperatures: cpu: 49.0 C mobo: N/A
Fan Speeds (RPM): cpu: 0
Info:
Processes: 344 Uptime: 1h 31m wakeups: 17132 Memory: 15.35 GiB used: 3.65
GiB (23.8%) Init: systemd v: 251 default: graphical tool: systemctl
Compilers: gcc: 12.1.0 clang: 14.0.6 Packages: pacman: 2085 lib: 572
Shell: fish v: 3.5.1 default: Bash v: 5.1.16 running-in: konsole
inxi: 3.3.20
Garuda (2.6.5-1):
System install date:     2022-01-17
Last full system update: 2022-07-31 ↻
Is partially upgraded:   No
Relevant software:       NetworkManager
Windows dual boot:       Probably (Run as root to verify)
Snapshots:               Snapper
Failed units:            dev-binderfs.mount shadow.service systemd-guest-user.service systemd-networkd-wait-online.service

First of all, reboot. :)

It is also very important to make sure your BIOS is up to date, as a BIOS update could include important changes for your wifi.

Are you dual booting? Does disabling WiFi on Windows and rebooting to Linux fix your issue? Do you have fast boot enabled on Windows?

I have been using garuda-update to update my system for some time now (every two weeks, and in fact I updated just yesterday). Does this mean it does not meet the requirements of a "full system update"?

I am not dual booting. Sorry, I was rushing a bit and forgot to use sudo.

Using garuda-update does a full system update. The round icon on the right means you haven't rebooted since the last update and should reboot ASAP.

Normally I would suspect Windows as a likely culprit and advise to disable wifi powersaving in the Windows device manager. However, with this Mediatek driver it is more likely the Linux driver is buggy, as it has been plagued with problems since its introduction.

Looks like there is a firmware update. And ASUS has included 2 files, one of which doesn't require Windows. Nice!
https://www.asus.com/US/SupportOnly/FX506HEB/HelpDesk_BIOS/

I have rebooted now. Actually twice, since when it first came back all USB devices had stopped functioning. A second reboot fixed that, so hopefully that's not indicative of anything.

Anyway, I'll update my BIOS and get back to you on this.

Just out of curiosity have you noticed any patterns such as:

Issue manifests itself more often after a warm boot. Does the issue happen less often if you cold boot?

Have you noticed any correlation between the issue arising after the computer has been idle for a while, or when it has resumed from sleeping?

Not a solid correlation. It is more likely to fail in less than 10 mins after a warm boot, but only somewhat more likely. Sleeping and waking up can give a bit more time sometimes, but sometimes it fails immediately after. So no, not a solid correlation there.

I am now on linux-mainline. I'll get back to you on how long it took to fail, if it fails.

30 mins is a fine milestone, but so far no better than zen. Still hasn't failed though so I'll keep going.

Odd thing though (I'm sure this is a simple "reboot and it'll be fine" thing, but I am testing how long it can go without failing so can't do that). The battery indicator in the GUI doesn't work anymore. It still thinks it's charging (which it isn't) and it's stuck at 76%. cat /sys/class/power_supply/BAT1/capacity agrees with that reading. It isn't going down. The charging light still tells me if I've passed 95% when charging so I know that 76% isn't true. Again I'm sure it's nothing and it's unrelated anyway.
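A stuck reading like this is easier to confirm by sampling sysfs over time rather than checking once. A minimal sketch, assuming the battery is BAT1 as in the path above and that the kernel exposes the usual capacity and status files next to each other; the function name is made up for illustration.

```shell
# Hypothetical helper: print "<capacity>% <status>" for one battery.
# $1 is the power-supply directory; the default matches the BAT1 path above.
battery_report() {
    dir="${1:-/sys/class/power_supply/BAT1}"
    printf '%s%% %s\n' "$(cat "$dir/capacity")" "$(cat "$dir/status")"
}
```

Running it in a loop while on battery, for example while sleep 60; do battery_report; done, would show whether the percentage and the charging state ever change.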

I hope some of the suggestions put forth here help with your issue, but I just thought I'd add my perspective. I realize you are a student and most students have very little extra money to spare. However, my recommendation would be to purchase a cheap USB wifi dongle for the interim. This Mediatek wifi adapter has been broken on a regular basis since its inception. If wifi is essential for your schoolwork, I would suggest purchasing a backup wifi dongle with good in-kernel support.

The best-supported wifi adapters are usually older (and cheaper), with more reliable drivers. Old 802.11n adapters are slower, but they are usually rock solid if you go with, say, an Atheros ath9k-based adapter. The ath9k driver has been around for a long time, has good in-kernel support, and is relatively fast (though definitely far slower than very recent models) and usually very reliable.

Unfortunately, newer is not necessarily better when it comes to Linux hardware support, as you've now discovered. New Wifi drivers even from well respected companies like Intel have been prone to many issues and breakages with the newest models. Frankly your Mediatek adapter has proved to be a steaming pile of dung, and my feeling is the only reason Asus included it in their recent offerings was because of supply chain issues brought on by Covid.

Get yourself a backup USB dongle to avoid these issues until your Mediatek driver matures. Sooner or later Mediatek will work the bugs out with its Linux driver, but who knows how long that will take. Just sidestep these problems by getting yourself an alternate adapter for the interim.

Just my 2 bits FWIW.

I've been considering that since the very beginning of the issue, and, for the money reasons you've brought up, I have only ever considered it. I really hoped this was solvable without spending any extra money, but since your opinion lines up with the discourse around this driver that I have already seen, I suppose I have no other options. linux-mainline is working fine so far, so hopefully this will delay the problem for the school year (or at least the semester, so I can deal with it during the break). I will certainly come back to this if it recurs under linux-mainline, since perhaps raising awareness of how bad this driver is will incentivise further development of it, though perhaps I'm being a bit optimistic there. Thank you for your help.

Last night linux-mainline was fine. Cold boot today and it lasted only 20 mins. The battery reading was still inaccurate and stuck, and I only found an old 2018 thread in the ArchWiki about it. I am now testing every kernel I have, and am currently on lts. When warm booting mainline after that first failure, I discovered that the mt7921e driver wasn't even loaded when I ran sudo rmmod mt7921e. I then rebooted into the lts kernel, and I'll see how long it can go.
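Whether the driver is loaded can also be checked directly, without attempting to remove it. This is a sketch only: the helper name is hypothetical, and it relies on /proc/modules listing one loaded module per line with the module name in the first column.

```shell
# Hypothetical helper: succeed if the named module appears in the module list.
# $1 is the module name, $2 the list to read (defaults to /proc/modules,
# where the first whitespace-separated column is the module name).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

For example, module_loaded mt7921e && echo loaded || echo "not loaded" right after boot would show whether the kernel picked the driver up at all.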

Warm boot lts now at 1hr and 18 mins. That's something. lts has never gone this long in the last 3 months. Perhaps some update has indeed occurred.
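To compare kernels on something firmer than memory, the time from boot to the first driver failure can be read from the journal. A minimal sketch, assuming journalctl's short-monotonic output format (which prefixes each line with the seconds since boot in square brackets) and the same "driver own failed" message seen in the earlier logs; the function name is made up.

```shell
# Hypothetical helper: read a kernel log with monotonic timestamps on stdin and
# print the whole seconds since boot at the first "driver own failed" message.
# Prints nothing if the message never appears.
first_failure_seconds() {
    awk '/mt7921e .*driver own failed/ {
        if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
            ts = substr($0, RSTART + 1, RLENGTH - 2)   # drop the brackets
            gsub(/ /, "", ts)                          # drop padding spaces
            printf "%d\n", ts                          # truncate to seconds
            exit
        }
    }'
}
```

Piping journalctl -k -b -1 -o short-monotonic --no-pager into first_failure_seconds after each crash would give one number per boot, which could then be noted next to the kernel that was running.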