Below is a first draft of a guide I'm compiling to help in these types of situations.
Hope it helps.
Troubleshooting System Stutter, Freezes, Lags, and Hangs:
Important forum threads regarding this topic to read:
Disk IO reaches 100%, causing system hangs
Temporary system hangs/freezes when updating
Load over the CPU is too High | SLOW Response | Freezing | Hang | Crash
If you use Docker:
[btrfs] Docker and Subvolumes
Sluggishnes of system and weird journal errors
The list of possible causes of system lags/freezes/hangs is very extensive. The causes can range from faulty hardware to hundreds of software/firmware/kernel/bios possibilities. It is often best to eliminate bios/kernel possibilities first, as those are two of the most promising (and least time-consuming) leads to pursue. Be sure to test at least 3 or 4 of the most recommended kernels. Also ensure your bios (and your system) is fully up to date.
Search the forum first (and then the internet) for possible causes of, and fixes for, your issue.
Start by running a forum search on variations of the following terms:
hard lock up
hard power down
hard power off
force power down
Any variations of those terms should return lots of hits to comb through.
Here are some further tips to aid with your search efforts:
How to search for solutions the right way
Next, start writing a list of all suggestions for fixes you uncover and record all your efforts. Record all inputs and outputs of any commands run. Report in full detail every fix you attempt and post any relevant logs. Regularly post the results of your progress. There have been many suggestions already given on the forum (and internet), to correct freezing issues. It is your job to sift through all the possible fixes already posted to increase your probability of finding a solution.
If you are thorough and precise about reporting all your troubleshooting efforts, there is a good chance a solution will be found to your issue. The more proactive you are in this respect, the more likely you are to receive assistance from forum experts to find a solution. The less information and documentation you provide, the less likely you are to ever find a resolution to your issue. Optics are important if you desire assistance: the more of an effort you make, the more likely others will make extra efforts to help you.
The first step you need to take to diagnose your issue is to start monitoring your resource usage. This may help determine what might be causing your issues. Install and learn how to use monitoring utilities such as iotop to help pin down a cause.
The following diagnostic commands may help identify a cause:
sudo ps_mem -S -w 5
sudo dmesg | grep oom-killer
top -o '%MEM'
while true; do top -b -n 1 | tee -a ~/top.log; sleep 5; done
The first command requires ps_mem to be installed.
The second command needs no extra packages, as dmesg and grep are part of a standard install.
The last command appends its output to the log file named in the command itself.
Post the outputs that aren't excessively long on the forum if you require assistance with your problem. Very long outputs may better be posted through a pastebin/hastebin type service. The forum also has its own PrivateBin service for this.
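If you want each logged snapshot to carry a timestamp (handy for matching a freeze to what was running at the time), something along these lines may help. This is only a rough sketch: the function name log_snapshot and the file name top-snapshots.log are made up for illustration and are not part of any package.

```shell
# Append one timestamped snapshot of a command's output to a log file.
# Usage: log_snapshot <logfile> <command> [args...]
log_snapshot() {
    logfile="$1"
    shift
    {
        # Header line so each snapshot can be matched to a time.
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        "$@"
    } >> "$logfile" 2>&1
}

# Example (not run here): record top every 5 seconds until you press Ctrl+C.
# The log file name is a placeholder.
#   while true; do log_snapshot "$HOME/top-snapshots.log" top -b -n 1; sleep 5; done
```

After a freeze and reboot, the last few snapshots in the log are usually the ones worth posting.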
Further troubleshooting suggestions:
Hardware related troubleshooting steps:
Use smartmontools to run a check and report on the health of all your storage drives. However, software testing might not identify problematic hardware with 100% accuracy. I have encountered several hard drives that caused lockups in the past (even though they passed smartmon testing). Another option you can try to verify drive health is the whdd hard drive diagnostic utility. To fully eliminate the possibility of errant test results, it would be best to disconnect any attached drives.
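As a rough sketch of a quick health check with smartmontools: smartctl -H prints an overall verdict, and the small helper below just pulls that verdict out of the output. The helper name smart_verdict and the device /dev/sda are placeholders, and the sketch assumes the ATA/NVMe style output line ("SMART overall-health self-assessment test result: ..."); other drive types word this differently.

```shell
# Print only the overall health verdict from `smartctl -H` output
# read on standard input (assumes the ATA/NVMe style result line).
smart_verdict() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Examples (not run here); /dev/sda is a placeholder for your drive:
#   sudo smartctl -H /dev/sda | smart_verdict
#   sudo smartctl -t long /dev/sda     # start the long self-test
#   sudo smartctl -a /dev/sda          # full report, including self-test results
```

Remember that a PASSED verdict does not rule the drive out, for the reasons given above.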
There is another method to eliminate the possibility of software testing not identifying problematic hardware correctly. Rather than checking each individual component via software, you can remove hardware from the equation through a process of elimination. Power off, and then disconnect the computer's power plug. Remove or disconnect as much internal hardware as possible. Disconnect any HDD or SSD. Remove PCIe add-in cards from their slots. This includes any add-in GPU, but only if you have onboard graphics (switch to onboard in the bios). Disable in the bios any devices that cannot be removed, such as onboard network adapters. Leave only one RAM stick inserted. Boot from a live disk to check for any lockups. Eliminate hardware possibilities by gradually adding back devices. If the issue returns after adding back any piece of hardware that was removed, you have found your culprit.
Do you think your power supply could possibly be faulty or underpowered? The only sure way to know is to swap out the power supply (if you have a spare). Cleaning the vents and fan on the PSU with compressed air is a good habit to get into.
Try using an alternate GPU: onboard graphics if you have them, or a different add-in graphics card (if available). If using the open source Nvidia driver, switch to the proprietary driver, and vice versa.
Check that your system temperatures are not high, as this can result in an unstable system. The CPU temperature can be a major factor in contributing to freezes. Have you cleaned the exhaust ports, power supply vents, fans, and heat sinks inside your computer recently? Install the lm-sensors package, issue the sudo sensors-detect command, and then run sensors to read the temperatures. If the CPU is running in excess of 75 deg C you should be concerned. If your CPU temps are still higher than recommended after cleaning your computer thoroughly, you may need to re-seat your CPU heatsink with new thermal paste.
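To spot hot readings quickly, the output of sensors can be filtered. The helper below is only a sketch: hot_sensors is a made-up name, and it assumes the usual lm-sensors layout where a reading looks like "Core 0:  +82.0°C  (high = ...)". The 75 degree default simply mirrors the figure mentioned above.

```shell
# Print sensor lines whose first temperature reading exceeds a limit.
# Reads `sensors` output on standard input; limit defaults to 75 (deg C).
hot_sensors() {
    limit="${1:-75}"
    awk -v limit="$limit" '
        # Only consider temperature lines, e.g. "Core 0:  +82.0°C  (high = ...)"
        index($0, "°C") && match($0, /:[ \t]+\+[0-9]+(\.[0-9]+)?/) {
            reading = substr($0, RSTART, RLENGTH)
            sub(/^:[ \t]+\+/, "", reading)      # strip the label separator and plus sign
            if (reading + 0 > limit) print
        }
    '
}

# Examples (not run here):
#   sensors | hot_sensors          # anything above 75
#   sensors | hot_sensors 85       # use a different limit
```

Run it while the system is under load, since idle temperatures tell you very little.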
Remove/disconnect all peripherals.
Check that all internal cables are seated properly and undamaged.
Check all USB cables for damage as pets often like to chew cables and this can cause lockups.
Disconnect all but your main monitor.
Disconnect all peripherals like USB hubs, USB Hard drives, printers, web cams, etc etc as a test to see if the freezes still occur.
Reboot into your bios and (if possible) temporarily disable your Ethernet and WiFi as a test.
If you are using a wireless mouse or KB, replace them with a wired version for troubleshooting purposes.
Remove all ram sticks, and then reinsert them, making sure all are seated properly.
Are your ram sticks all the same matching type recommended by your mobo manufacturer?
Run the long memory test with memtest86. Sometimes memory sticks can pass memtest86, but can still crash Linux. To discount this possibility, remove all ram sticks except one. Cycle through testing each individual ram stick, allowing sufficient time to see if the freezes continue or end.
Software, firmware, kernel, scheduler, bios, troubleshooting suggestions:
Check your logs for any instances of kernel panic, as this is a definite indicator that something is amiss with your kernel.
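One way to comb the kernel log from the boot that froze is to filter it for the usual suspects. This is just a sketch: the helper name suspicious_kernel_lines is made up, and the list of patterns is my own guess at what is worth flagging, not an exhaustive list.

```shell
# Keep only kernel log lines that commonly accompany freezes and crashes.
# Reads log text on standard input; matching is case-insensitive.
suspicious_kernel_lines() {
    grep -iE 'kernel panic|oops|oom-killer|out of memory|i/o error|hung task|soft lockup'
}

# Examples (not run here):
#   journalctl -k -b -1 --no-pager | suspicious_kernel_lines   # previous boot
#   sudo dmesg | suspicious_kernel_lines                       # current boot
```

If the filter returns nothing, post the unfiltered log from the previous boot anyway, as the cause may be something these patterns do not cover.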
Changing kernels is one of the easiest troubleshooting steps you can perform, and it resolves far more issues than you'd ever expect. Whenever you start to experience unusual issues with your system, the first thing you should do is test at least three alternate kernels. For those experiencing severe system freezes/crashes, testing alternate kernels should always be your first step.
You can install various kernels via the terminal with the following commands:
sudo pacman -Syu linux-lts linux-lts-headers
sudo pacman -Syu linux linux-headers
sudo pacman -Syu linux-mainline linux-mainline-headers
sudo pacman -Syu linux-cacule linux-cacule-headers
sudo pacman -Syu linux-xanmod linux-xanmod-headers
sudo pacman -Syu linux-hardened linux-hardened-headers
I would suggest starting at the top of the kernel list and working your way down, (if your issue hasn't improved).
You can switch to a newly installed kernel after a reboot via the grub boot menu at startup. Simply choose the kernel you wish to boot into from the kernel choices listed in the menu. Also, be sure to test the "fallback" version of each installed kernel from the grub boot menu as sometimes this can correct severe issues. After installing a new kernel it is best practice not to immediately uninstall your old kernel. It is always best to have at least two kernels installed in case one kernel experiences an issue booting. The LTS kernel is the recommended choice to keep installed as a backup kernel.
Test a Different Scheduler:
Sometimes a kernel change alone will not resolve some stubborn freezing problems. For those who have tested multiple different kernels and are still experiencing freezes, it is a good idea to also test different I/O schedulers. Some kernel versions (such as cacule) come preconfigured with different I/O schedulers; otherwise you must change schedulers manually.
Try monitoring your disk I/O activity with the iotop utility. Excessive I/O activity can lead to slowdowns or freezes. If this appears to be happening you might want to try changing your I/O scheduler. This is an especially worthwhile troubleshooting step if you find any I/O errors in your logs. Test out different schedulers to see if there is any performance improvement.
To identify the scheduler in use for all drives, run:
grep . /sys/class/block/*/queue/scheduler
Switching the default scheduler in use may seem a little confusing if you've never attempted it before, however it is actually quite simple.
See the Archwiki documentation for information on:
Changing the I/O scheduler
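For a quick, temporary test, the scheduler can be switched at runtime through sysfs. The sketch below assumes a drive named sda and that bfq is among the schedulers your kernel offers; both are placeholders, and active_scheduler is a made-up helper name. The active scheduler is the one shown in square brackets.

```shell
# Extract the active scheduler (the name in square brackets) from a line
# such as "mq-deadline kyber [bfq] none", read on standard input.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Examples (not run here); "sda" and "bfq" are placeholders:
#   active_scheduler < /sys/block/sda/queue/scheduler
#   echo bfq | sudo tee /sys/block/sda/queue/scheduler   # lasts until reboot
```

A change made this way is lost at reboot, which makes it a safe way to experiment before making anything permanent as described in the wiki.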
Investigate and test various kernel parameters
There are a lot of kernel parameters that may help your issue. Unfortunately, the boot parameters are usually very specific to the type of hardware in question. It is very hard to recommend exactly which kernel parameters to test, as there are so many of them. It may require a fair bit of searching on your specific hardware to find parameters that may help.
The most likely way to locate what you might need is to search for:
Arch Linux freezes kernel parameter "your motherboard model"
Arch Linux freezes kernel parameter "your laptop model"
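Once you have found a parameter worth testing, it helps to confirm whether it is actually in effect. The sketch below checks the running kernel's command line; has_kernel_param is a made-up helper name. It assumes you boot with GRUB, in which case a parameter can be tried once by pressing e at the boot menu and appending it to the line beginning with linux, or made permanent via GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub followed by regenerating the grub configuration.

```shell
# Return success if a kernel parameter (bare, or in name=value form) is
# present. Checks the second argument if given, else /proc/cmdline.
has_kernel_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    for word in $cmdline; do
        case "$word" in
            "$param"|"$param"=*) return 0 ;;
        esac
    done
    return 1
}

# Examples (not run here); the parameter name is only an illustration:
#   cat /proc/cmdline
#   has_kernel_param nomodeset && echo "nomodeset is active"
```

Test one parameter at a time, and note each one (and its result) in your forum thread.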
Is your Bios up to date?
Most default BIOS settings are intended for Windows. Depending on your hardware, you may need to modify your bios settings for your system to be stable when using Linux.
Rectify Freezes on Ryzen 9 mobos:
Ryzen 9 - Freezing, crashing
Disable BTRFS Quota (qgroups)
Garuda and other distributions have numerous reports linking system freezes with btrfs quotas being enabled. Disabling btrfs quotas would be a logical step if you experience freezes during balancing operations. If you do experience a freeze during a balancing operation, try waiting as long as possible to let things hopefully resolve on their own.
Read the link below for information on how to disable qgroups:
BTRFS quota is automatically re-enabled if I disable it
To disable BTRFS quotas, run:
sudo btrfs quota disable /
Disabling qgroups will impact timeshift's ability to gauge the remaining space left for creating snapshots. However, with qgroups enabled your computer might feel sluggish, or even grind completely to a halt. It has also been reported that the more snapshots you have, the worse this issue can become.
Documentation regarding quota support in BTRFS:
BTRFS Quota support
There has been some discussion about making qgroups disabled by default with Garuda. At the time of writing this, I believe BTRFS quotas are still enabled by default in all editions.
Unfortunately, it seems some updates may re-enable qgroups even though they were manually disabled. Therefore, you will need to check regularly to be sure they stay disabled.
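A quick way to check whether quotas have crept back on is to ask btrfs to list the qgroups. The sketch below relies on my understanding that btrfs qgroup show exits with an error when quotas are disabled; treat that as an assumption. The helper name quota_state and the BTRFS_CMD override are made up for illustration.

```shell
# Report whether btrfs quotas look enabled on a mount point (default: /).
# Assumption: `btrfs qgroup show` exits non-zero when quotas are disabled.
# Note: "disabled" is also reported if btrfs is missing or the path is not btrfs.
# BTRFS_CMD is a hypothetical override, useful mainly for testing.
quota_state() {
    mountpoint="${1:-/}"
    if ${BTRFS_CMD:-sudo btrfs} qgroup show "$mountpoint" >/dev/null 2>&1; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Example (not run here): check after each large update.
#   quota_state /
```

If it reports enabled again after an update, simply repeat the disable command given earlier.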
Troubleshoot Garuda's performance tuning packages:
To test if any of Garuda's performance enhancements are causing issues on your system, you may want to try disabling/masking some of these services one at a time. The performance tuning packages Garuda installs by default have changed over time. Depending on how old your install is, you could have a few of the older services that are no longer in current use running on your system. If you suspect any of these services are causing issues on your system, you can temporarily disable them via masking to test for improvements.
You can find out if any of these services are installed and running on your system with the following command:
systemctl status prelockd auto-cpufreq ananicy-cpp irqbalance preload memavaild
To stop/disable/mask any individual service that is running on your system, execute:
sudo systemctl disable --now prelockd.service && sudo systemctl mask prelockd.service && sudo systemctl daemon-reload
sudo systemctl disable --now auto-cpufreq.service && sudo systemctl mask auto-cpufreq.service && sudo systemctl daemon-reload
sudo systemctl disable --now ananicy-cpp.service && sudo systemctl mask ananicy-cpp.service && sudo systemctl daemon-reload
sudo systemctl disable --now irqbalance.service && sudo systemctl mask irqbalance.service && sudo systemctl daemon-reload
sudo systemctl disable --now preload.service && sudo systemctl mask preload.service && sudo systemctl daemon-reload
sudo systemctl disable --now memavaild.service && sudo systemctl mask memavaild.service && sudo systemctl daemon-reload
The service's state should be automatically refreshed by the included sudo systemctl daemon-reload command.
After testing your system's performance with a service masked, the service can easily be made operational again if you wish. To reinitialize any of the service(s) you masked, repeat the above command(s), substituting "unmask" in place of "mask" and "enable" in place of "disable", as in the example below:
sudo systemctl unmask irqbalance.service && sudo systemctl enable --now irqbalance.service && sudo systemctl daemon-reload
In some instances you may need to reboot to fully initialize the service, as reloading may not be sufficient in all cases.
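To keep track of which services are currently running while you test them one at a time, a short summary like the one below can be pasted into your forum thread. It is only a sketch: service_states is a made-up helper name, and "unknown" is printed when systemctl gives no answer at all.

```shell
# Print one "name: state" line per service, using `systemctl is-active`.
service_states() {
    for svc in "$@"; do
        # is-active exits non-zero for inactive units, so ignore its status.
        state=$(systemctl is-active "$svc" 2>/dev/null) || true
        [ -n "$state" ] || state="unknown"
        printf '%s: %s\n' "$svc" "$state"
    done
}

# Example (not run here), using the services named above:
#   service_states prelockd auto-cpufreq ananicy-cpp irqbalance preload memavaild
```

Run it before and after each change so you have a record of exactly what was active when a freeze did or did not occur.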
BTRFS Balancing Tips:
Something that sometimes helps correct stutters and freezes is deleting all your snapshots and then performing a btrfs balance. The more snapshots you have on your system the worse it seems to affect some systems. Generally I delete all my snapshots and perform a btrfs balance after I have 5 or so snapshots stored.
If you are experiencing lagging/freezing issues it may be helpful to disable btrfs quotas along with deleting your system snapshots. Oh and of course, be sure to make a fresh timeshift snapshot after you've wiped the old ones.
If you are experiencing a system slowdown after doing a large update or deleting files that took up a lot of space, performing a balance can sometimes help greatly. Be sure to reboot after the balancing is complete.
The handy command below will launch a 60% balance operation on / (root) and will also give continual updates on how far along the process is to completion:
bash -c "sudo btrfs balance start -musage=60 -dusage=60 / & sudo watch -t -n5 btrfs balance status / && fg"
There are numerous threads on the forum dealing with freezing issues, with many different suggestions posted on how to hopefully correct the issue. Please search the forum and report in detail on every fix you attempt and post relevant logs and command outputs. To troubleshoot any issue effectively, forum assistants must know all the troubleshooting steps that have been tested to have any chance of finding a solution. Threads already covering this topic on the forum should provide plenty of information on the steps you need to take to troubleshoot this issue.
If a thorough search of the Garuda forum doesn't turn up a solution, then searching other Arch-based forums is usually the next step. If you can't turn up what you need on the forums of Arch derivative distros, then a general internet search is your next move.
Feedback you should provide:
Have you tested multiple alternate kernels?
Is your system fully up to date?
Have you checked your resource usage with htop, iotop, etc?
Is this a fresh install, or did this start recently after an update?
Have you tried disabling the baloo file indexer temporarily?
If you press the caps or NUMLOCK key, does your KB state light change?
Is your caps lock LED blinking, (possible kernel panic)?
Can you move your mouse cursor?
Do you have full keyboard functionality?
Does pressing CTRL+T open a terminal?
Is this a complete freeze up with no keyboard or mouse responsiveness?
Does pressing CTRL+ALT+F3 take you to the TTY?
Have you tried restarting your system from the terminal or TTY?
Can you use the Magic SysRq keys to restart/shutdown?
Have you tried restarting (KDE) plasma from the terminal?
Is there a specific program or action that often triggers a freeze?
Does it happen out of the blue, completely at random? How frequently?
If a freeze does occur, does it resolve on its own if you wait a long time?
Have you tried changing your compositor settings?
Have you tried disabling your compositor entirely?
Have you tried removing all plasma widgets you've installed?
Have you checked (and posted) your logs for errors/crash dumps?
Have you tried installing linux-firmware-git and rebooting?
Have you made any overclocking/undervolting or similar modifications?
Do you use full disk encryption?
Do you use a swap partition?
Have you installed multiple Desktop Environments/WMs?
If this started recently, have you tried performing a rollback via a snapshot?
Have you merged any/all pacnew files?
How many applications are generally running when the freezing occurs?
Do the freezes happen even if the system is idle?
Which applications have you installed from the AUR or Chaotic repo?
Have you experienced similar issues with this hardware on other OSes?
Did, (or does) freezing also occur on Windows?
Did, (or does) freezing also occur on other Linux distros?
Have you tried booting live disks with other Garuda DEs or other distros?
Does freezing also occur in live environments, (which ones)?
Does freezing also occur if you create a new user account?
Does the system get progressively slower before freezing?
Does sound continue playing during a freeze?
Are you using tlp? If so, uninstall it.
If your computer has a disk activity LED, is it blinking during the freeze?
Please answer as many of the above questions as possible. We need to have the complete picture in order to help resolve your issue. Also, please provide feedback on all suggestions put to you (whether you feel they are relevant or not).