Troubleshooting System Stutter, Lags, Freezes, and Hangs

tbg · 12 March 2022 06:34

Index of FAQ contents:

Prologue
Primary Diagnostic Steps
Hardware Troubleshooting
BTRFS Specific Tips
Kernel Parameters
I/O Schedulers
Performance Tuning Packages
Troubleshooting Installed Packages
Required Information
X11 vs Wayland?

Troubleshooting Prologue:

Important forum threads regarding this topic to read:

Disk IO reaches 100%, causing system hangs

Temporary system hangs/freezes when updating

Load over the CPU is too High | SLOW Response | Freezing | Hang | Crash

If you use Docker:

[btrfs] Docker and Subvolumes

Sluggishnes of system and weird journal errors

Tips to help diagnose your issue:

If you own a laptop and you are experiencing issues, the first thing to do is search the ArchWiki for your specific make and model of laptop. The ArchWiki often has very detailed tips on how to set up many models of laptops to perform best with Linux.

System lags, freezes, and hangs have a vast range of possible causes, from faulty hardware to hundreds of software, firmware, driver, kernel, and BIOS possibilities. It is often best to eliminate BIOS and kernel possibilities first, as those are two of the most effective (and least time consuming) leads to pursue. Be sure to test at least 3 or 4 of the most recommended kernels. Ensure your BIOS (and your system) is fully up to date.

Search the forum first (and then the internet at large) for possible causes of, and fixes for, your issue. Search your journalctl logs for any error messages that may be related to your issue. Thoroughly research any error messages you’ve found in your logs online to uncover the cause of, and hopefully the solution to, your issue. Unfortunately, when a complete system lockup/freeze occurs, the logs often won’t contain any indication as to what caused the event. That is one of the reasons that diagnosing sporadic freezes is so difficult.
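As a rough illustration of searching the logs, the sketch below filters a log stream for messages that often accompany freezes. The function name and the pattern list are my own assumptions for illustration; the list is not exhaustive, and the previous boot's log is only available if your journal is stored persistently.

```shell
# Sketch: filter a log stream (stdin) for messages that often accompany freezes.
# The pattern list is an illustrative assumption, not a complete catalogue.
scan_freeze_hints() {
    grep -i -E 'kernel panic|out of memory|oom-killer|blocked for more than|i/o error|machine check|gpu hang'
}

# Typical use after a forced power-off (-b -1 selects the previous boot):
# journalctl -b -1 --no-pager | scan_freeze_hints
```

If nothing is printed, that does not prove the logs are clean; it only means none of these particular phrases appeared.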

Run an extensive search on variations of, froze, frozen, freeze, crash, hang, freezing, system crash, system freeze, system hang, lock up, locked up, hard lock, hard lock up, hard power down, hard power off, force power down, system unresponsive, & computer unresponsive. Any variations of those terms should return lots of hits to comb through.

Begin compiling a list of all suggestions for fixes you uncover and record all your troubleshooting efforts. Record all inputs and outputs of any commands run. Report in full detail every fix you attempt and the outcome. Post any relevant logs or errors uncovered. Regularly post the results of your progress. There have been many suggestions already given on the forum (and internet), to correct freezing issues. It is your job to sift through all the possible fixes already posted online. Attempt to find cases similar to your own that were already solved that may possibly be applied to remedy your issue.

If you are thorough and precise about reporting all your troubleshooting efforts, there is a good chance a solution will be found to your issue. The more proactive you are in this respect, the more likely you are to receive assistance from forum experts to find a solution. The less information and documentation you provide, the less likely you are to receive assistance and find a solution to your issue. Optics are important if you desire assistance: the more effort you make, the more likely others are to make an extra effort to help you.

The first step you need to take to diagnose your issue is to start monitoring your resource usage. This may help determine what might be causing your issues. Install and learn how to use monitoring utilities such as top, htop, iotop, ps_mem, or other system monitoring utilities to help pin down a cause.

The following diagnostic commands may help identify a cause:

sudo ps_mem -S -w 10 
sensors
journalctl -b -p3 --no-hostname --no-pager
sudo dmesg | grep oom-killer
swapon --show
cat /proc/sys/vm/swappiness
cat /proc/meminfo
top -o '%MEM'
free -h
while true; do top -b -n 1 | tee -a ~/top.log; sleep 5; done

The first command requires ps_mem to be installed.
The second command requires lm-sensors (packaged as lm_sensors on Arch-based systems) to be installed and configured.
The last command appends one snapshot of top to the log file at ~/top.log every 5 seconds until you stop it with Ctrl+C.
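To make the /proc/meminfo output easier to read at a glance, here is a minimal sketch that reports available memory as a percentage of total memory. The function name and the optional file argument are assumptions added for illustration; the MemTotal and MemAvailable fields are the ones the kernel reports in /proc/meminfo.

```shell
# Sketch: print MemAvailable as a whole percentage of MemTotal.
# The optional argument lets you point the function at a saved copy of meminfo.
mem_available_percent() {
    awk '/^MemTotal:/ {total = $2} /^MemAvailable:/ {avail = $2}
         END {if (total > 0) printf "%d\n", avail * 100 / total}' "${1:-/proc/meminfo}"
}

# Usage:
# mem_available_percent
```

A value that stays very low while the system stutters would be one hint that memory pressure deserves a closer look.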

Post the outputs that aren’t excessively long on the forum if you require assistance with your problem. Very long outputs may better be posted through a pastebin/hastebin type service. The forum also has its own PrivateBin service for this.

If you intend to open a help request, be sure to post your system specs from running the following command:

garuda-inxi

If you intend to open a help request, also include a copy of the questionnaire (near the end of the tutorial) and answer as many questions from the checklist as possible:

Freezing Questionnaire/Checklist:

tbg · 12 March 2022 06:36

Primary Diagnostic Steps.

Testing Alternate Kernels:

Changing kernels is one of the easiest diagnostic steps you can perform and it resolves far more issues than you’d ever expect. Whenever you start to experience unusual issues with your system, the first thing you should do is test at least three alternate kernels. For those experiencing severe system freezes/crashes testing out alternate kernels should always be your first step. Check your logs for any instances of kernel panic, as this is a definite indicator that something is amiss with your kernel. If your hardware is extremely new, crashes/freezes could be happening because your hardware is not fully supported in the kernel yet.

You can install various kernels via the terminal with the following commands:

sudo pacman -Syu linux-lts linux-lts-headers
sudo pacman -Syu linux linux-headers
sudo pacman -Syu linux-mainline linux-mainline-headers
sudo pacman -Syu linux-cacule linux-cacule-headers
sudo pacman -Syu linux-xanmod linux-xanmod-headers
sudo pacman -Syu linux-hardened linux-hardened-headers

I would suggest starting at the top of the kernel list and working your way down if your issue hasn’t improved. However, if you have just purchased brand new hardware you might instead want to start with installing the linux-mainline kernel, as it has better support for newly released hardware.

If you prefer to avoid the terminal, Garuda also includes a GUI utility for easily managing different kernel versions. You can access Garuda’s GUI kernel utility through your Application Menu. You can open the utility by traversing the menu through Settings ----> Garuda Settings Manager ----> Kernel. Once open, the kernel management utility will display an extensive list of kernel choices that are available. You can easily add a kernel you wish to test out by pressing the Install button to the right of the kernel you wish to test. Similarly, you can uninstall any kernel you no longer wish to keep by pressing the Remove button. Reboot after installing a new kernel to take it for a test spin.

You can switch to a newly installed kernel after a reboot via the grub advanced options menu at startup. Simply choose the kernel you wish to boot into from the kernel choices listed in the menu. Also, be sure to test the “fallback” version of each installed kernel from the grub boot menu if you are experiencing boot issues. After installing a new kernel it is best practice not to immediately uninstall your old kernel. It is always best to have at least two kernels installed in case one kernel experiences an issue booting. The LTS kernel is the recommended choice to keep installed as a backup kernel.

BIOS - Is yours up to date?

Next to swapping kernels, the BIOS/UEFI is one of the most important things to check during the primary diagnostic stage. An outdated BIOS is one of the most common, and yet most overlooked, causes of system freezes. Having an up to date BIOS is of utmost importance (especially on portable computers). Even if you consider this an unnecessary step, the BIOS must be updated before proceeding with more in depth troubleshooting procedures. Failure to do so can lead to a massive amount of wasted time and effort. Skipping this step because it seems like “too much trouble” may result in assistants refusing to help you, as it must be performed before proceeding with the secondary stages of testing.

Pressing the F2, F10, F12, or Delete key during boot up is the most common way to enter your BIOS setup utility. If those keys do not work to enter your BIOS/UEFI, then check your manufacturer’s documentation, as they may use a different key (or a sequence of keys). Before resorting to updating your BIOS, you may want to test resetting your current BIOS back to the factory defaults. Resetting (or updating) your BIOS will return your BIOS settings to the original state they were in when purchased. Most default BIOS settings are intended for Windows. Depending on your hardware, you will likely need to modify your BIOS settings for use with Linux after resetting your BIOS. When running Linux be sure both secure boot and fast boot are disabled in the BIOS. Also be sure your storage controller is set to AHCI mode in your BIOS (not RAID, Optane, or RST). Also, look to change the OS type setting to “Other OS” from the default of “Windows”.

Changing the following BIOS settings may possibly help with freezing issues. Disable any nonessential hardware in the BIOS such as onboard sound, or onboard networking, as a temporary troubleshooting step. If you don’t use virtualization technologies, then disable them in the BIOS. Also try disabling any power management options for the CPU in the BIOS (if present). Do not change your BIOS clock settings/timings to achieve overclocking. This often causes freezing issues, so be sure to use the manufacturer’s recommended clock settings.

Check your computer manufacturer’s website for your model of laptop or exact motherboard model to see if a BIOS update is available for your system. Be extremely careful to only download and flash your BIOS with an update intended for your exact model of hardware. Some manufacturers such as HP and Lenovo have made great strides in making BIOS updates user friendly with Linux. With other computer makers it is more complicated, so be sure to do your research.

If by chance you already updated your BIOS just before your current issues began, then there is a possibility that you received a bad BIOS update. This happens very rarely, but there is still always the off chance the new BIOS release was faulty. If you suspect this as a possibility, be sure to check if there was another recent BIOS released to correct this problem. If not, then you will need to decide if you want to re-flash your BIOS back to the older version (if your manufacturer allows this).

For further information see:

ArchWiki - Flashing BIOS from Linux

One tip that I forgot to mention earlier, (as it is so basic it is often overlooked) is to be sure to reboot your computer before opening a help request. People often forget to reboot after performing an update which can lead to all manner of strange performance issues. Linux can usually go a very long time without requiring a reboot. However, sometimes it can really help to correct strange performance issues. If you are running KDE you may get away with simply restarting plasma if you start to notice system lag. Although it is usually best to perform a full system restart if you are noticing issues and you haven’t rebooted in a while. As I already stated, this troubleshooting step is so basic that it is often overlooked, so when in doubt try a reboot.
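One way to tell whether a reboot is overdue after a kernel update is sketched below. It relies on an assumption about Arch-based systems: a kernel upgrade normally replaces the module directory under /usr/lib/modules, so the directory for the running kernel disappears until you reboot. The function name and its optional arguments are hypothetical, and the check will not fire if something on your system preserves old module directories.

```shell
# Sketch: succeed (exit 0) when the running kernel's module directory is gone,
# which on Arch-based systems usually means the kernel was upgraded since boot.
# Both arguments are optional and exist only to make the function easy to try out.
reboot_pending() {
    release="${1:-$(uname -r)}"
    modules_root="${2:-/usr/lib/modules}"
    [ ! -d "$modules_root/$release" ]
}

# Usage:
# if reboot_pending; then echo "Running kernel was replaced on disk, reboot first"; fi
```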

tbg · 12 March 2022 06:36

Hardware Troubleshooting:

If the preceding preliminary steps proved fruitless, then it is time to run diagnostic tests to evaluate your hardware’s health. You should also perform stress testing to assess your system’s stability. Another very important step is to test your system with various live boot disks. Use live boot disks from distros such as Ubuntu, Linux Mint, or others (that aren’t Arch derivatives). If the freezing also occurs in the live environments (or in Windows), then this strongly points to a hardware problem. If this is the case, in depth hardware troubleshooting tests will likely be required to eliminate individual hardware components from the equation.

ArchWiki information on stress testing your CPU & RAM.

Install lm-sensors to monitor your temperatures and fan speeds.

Install smartmontools to run diagnostics on the health of your drives.

After running your software diagnostic tests you may still need to get your hands dirty inside your computer, as these types of tests are not 100% reliable. If you live in an area prone to static electricity buildup, then use an anti static wrist band while working inside your computer.

Some of the troubleshooting steps recommended below will not be suitable for laptop computers as their internal components are not as readily accessible or replaceable. Some unscrupulous manufacturers of portable computing devices are now in the nasty habit of permanently soldering their components directly to the mainboard. Sadly, with these types of devices you have limited options if you have a faulty hardware component and your device is no longer under warranty. Fortunately with desktop computers there are many more options available for users comfortable working with their computer’s internal components.

Overheating:

Caution:

Keeping a desktop computer in an enclosed space like a cabinet with little airflow can contribute to overly high computer temperatures. Likewise, using a laptop in bed with blankets blocking the air intake ports can also lead to excessively high temperatures. Temperatures in excess of 80 deg C could be the cause of random freezing.

One of the most common causes of system freezes is the overheating of hardware components. Overheating can not only cause freezes, it can also lead to premature hardware failure. Therefore, you need to ensure your system temps are within a safe operating range. You can check your hardware temperature and fan speeds using lm-sensors, or alternatively look in your BIOS to check your values. If your CPU is running in excess of 75 deg C you should be concerned. If your system is running at 80+ deg C then the lifespan of your components is likely going to be reduced substantially. At 95+ deg C your components are probably going to be damaged. Once the temperature reaches 100 deg C it is almost certain your components will suffer damage. Your computer’s failsafe should hopefully shut down your computer before it can ever reach these dangerously high levels.
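The temperature bands above can be sketched as a small helper. The band labels and function names are my own, chosen for illustration; the thresholds are the ones given in this section, and the readings come from the kernel's thermal zones in sysfs, which report millidegrees Celsius. Not every sensor shows up as a thermal zone, so lm-sensors remains the more complete tool.

```shell
# Sketch: map a temperature in whole degrees Celsius to the bands described above.
classify_temp() {
    t="$1"
    if   [ "$t" -ge 100 ]; then echo "critical"
    elif [ "$t" -ge 95 ];  then echo "damage likely"
    elif [ "$t" -ge 80 ];  then echo "reduced lifespan"
    elif [ "$t" -gt 75 ];  then echo "concerning"
    else                        echo "ok"
    fi
}

# Walk every kernel thermal zone; unreadable zones are skipped.
report_thermal_zones() {
    for zone in /sys/class/thermal/thermal_zone*/temp; do
        raw=$(cat "$zone" 2>/dev/null) || continue
        c=$(( raw / 1000 ))
        echo "$zone: ${c} C ($(classify_temp "$c"))"
    done
}

# Usage:
# report_thermal_zones
```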

If your temperatures are consistently elevated, you will need to clean all exhaust ports and filters, power supply vents, heat sinks, and fans inside your computer. This is difficult on a laptop because of accessibility issues, so you will need to use compressed air to try and clean out accumulated dust as best you can. If your CPU temps are still higher than recommended after cleaning and this is a desktop computer you have more options available to lower your internal temperatures. Sometimes simply shuffling internal components and cabling around can improve internal airflow considerably.

If any of your computer fans are making noises upon startup, now is the time to replace them while you have your computer already apart. There is no point in waiting, as once the fan bearings start making noise, it won’t be too long before they seize up. If you do suffer a fan failure, then your temps could rise to dangerous levels. If space permits, adding extra (or larger) fans or watercooling is the most effective way to lower your CPU temp in a desktop computer.

If your CPU temperature is still too high after your best efforts to reduce it, you may need to reseat your CPU heatsink with new thermal paste. This is not difficult to do, but it is not for the faint of heart. Research this procedure carefully before attempting this, or refer this job to a knowledgeable computer expert.

RAM:

If software testing of your RAM did not determine it to be faulty, it is still best to do further RAM troubleshooting, because RAM can sometimes pass software testing and still crash your system. If you are a heavy multi-tasker and you only have 4 GB of RAM, your freezes may simply be a result of insufficient RAM. While 4 GB of RAM may meet Garuda’s minimum requirements, your system may not run very smoothly if using one of the distro’s heavier DEs such as KDE Dragonized.

To troubleshoot your RAM further, shut down the computer and remove/disconnect all power sources. Then carefully remove all RAM sticks from the motherboard (after opening their locking tabs). Only handle the RAM modules by their edges, and avoid touching the contacts with your fingertips. If the internals of your computer were quite dirty, your RAM should be carefully cleaned of dust. Your contacts might benefit from a light cleaning with rubbing alcohol and a lint free cloth if your computer was extremely dusty inside.

After removal and cleaning reinsert all RAM sticks again. Make extra certain that they are all seated correctly and locked securely in place. Restart, then try to ascertain if the freezes are still occurring after re-seating all RAM sticks. Be sure to double check in your BIOS that your RAM’s voltage and timings are set correctly according to the manufacturer’s recommendations.

If the freezes were still occurring when all RAM sticks were reinstalled correctly, then it is time to determine whether any one RAM stick is actually defective. This can be accomplished through a process of elimination. Remove all RAM sticks except one from the motherboard. Make sure the single remaining stick of RAM is seated correctly in the primary slot. Cycle through testing each individual RAM stick one at a time, allowing sufficient time to see if the freezes continue or end. If the computer only freezes with one particular RAM stick, then you have likely identified the cause. If the freezes occur with all sticks then you must continue testing other hardware to eliminate hardware components as a cause.

Be sure to double check your manufacturer’s documentation to confirm that all of your RAM sticks are of the correct type recommended for use with your motherboard (matching pairs are best). This is very important if you bought the computer used or it was gifted to you, as you have no idea if the previous owner simply threw whatever RAM they had on hand in the box without checking on compatibility.

CPU:

Your CPU is a highly sensitive piece of electronic hardware and problems may ensue if your CPU timings aren’t set just right, or its operating temperature exceeds the recommended levels. Overclocking or undervolting your CPU in Linux could cause major issues, as Linux may not respond the same way to modifications that worked in Windows. Best keep your CPU clock/voltage settings as per the manufacturer’s recommendations or you could experience severe side effects. You may also want to test changing the C-States (CPU power saving states) in your BIOS. C-States can alter CPU voltage and clocking to save on power.

Stress-test your CPU for an indication of whether it is related to your issue. You can install a utility such as stress or linpack (for Intel CPUs) to give your system a serious working over.

Graphics adapter(s):

Visual stuttering, glitches and onscreen artifacts are all signs that your video card may be malfunctioning or the driver may be incorrectly configured. The graphics card (or the video driver) is one of the most prevalent causes of freezes. If using an add in graphics adapter, be sure your graphics card is seated properly and securely locked in place. Also be sure that any secondary power connections are securely attached. If your video freezes but your audio still plays, then this is indicative of a possible graphics driver issue. Is the cooling fan (or fans) on your graphics card functioning well enough to keep your GPU temperature at a reasonable level?

If using the proprietary Nvidia driver, try switching to the open source Nvidia driver, and vice versa. Try testing an alternate GPU if possible, or you could try testing your graphics card in another computer. If you happen to have onboard graphics, switch over to test the onboard video. If you are running an Nvidia adapter and you have, or could possibly borrow an AMD graphics card this would be very useful. Swapping Nvidia with an AMD card could eliminate both the Nvidia card and the graphics driver from the suspect list at once. Unfortunately, not that many people have an extra graphics card available to eliminate that from the list of possibilities.

Storage drives:

Ensure that all your storage drives have been updated to their latest firmware version. Be sure your drives (especially mechanical drives) aren’t operating at an excessive temperature, as this can substantially lessen their lifespan. Traditional mechanical hard drives are much slower than an SSD. If your system is older and it feels laggy or sluggish you should definitely consider upgrading from an HDD to an SSD. Any clicking sound coming from the insides of your system may be a warning sign of an impending mechanical hard drive failure.

Run a S.M.A.R.T. diagnosis on your drives’ health, then carefully check over the detailed report on your drive(s) status. A SMART test failure would be a good indicator that your drive could be responsible for your freezing issues. However, passing a SMART test is not conclusive proof that a drive is not responsible. It is possible that software testing might not identify problematic hardware with 100% accuracy. I have encountered several hard drives that caused lockups in the past even though they passed SMART testing. To fully eliminate the possibility of dubious test results, it is best to disconnect any attached drives.
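For reference, a SMART check with smartmontools might look like the sketch below. The helper name is hypothetical, /dev/sda is only a placeholder for your own device, and the parsing assumes the "SMART overall-health self-assessment test result" line that smartctl prints for ATA and NVMe drives; other drive types may word their verdict differently.

```shell
# Sketch: pull the overall verdict out of `smartctl -H` output read from stdin.
# Assumes the ATA/NVMe wording of the health line; prints nothing if it is absent.
smart_verdict() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Usage (replace /dev/sda with your drive; list drives with lsblk):
# sudo smartctl -H /dev/sda | smart_verdict
# sudo smartctl -t short /dev/sda    # start a short self-test
# sudo smartctl -a /dev/sda          # full report, including the self-test log
```

As noted above, a passing verdict is a data point rather than proof, so read the full report as well.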

In addition, be sure to check that your system drive is not running out of free space, as this can also cause serious issues. To scan for errors in the file system of a BTRFS drive use btrfs check. To scan for errors in the file system of an ext4 drive use fsck. Run either check only on an unmounted file system, for example from a live disk. Drives with Windows based file systems are best scanned from within Windows, or formatted to a Linux native file system for best compatibility.
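A quick free space check could be sketched as follows. The function names and the 90% threshold are arbitrary choices for illustration. Note that on BTRFS the figure from df is only an approximation; `sudo btrfs filesystem usage /` gives a more detailed breakdown.

```shell
# Sketch: print how full the file system holding a path is, as a whole percentage.
fs_used_percent() {
    df -P "${1:-/}" | awk 'NR == 2 {sub("%", "", $5); print $5}'
}

# Print a warning when usage reaches a threshold (default 90, an example value).
warn_if_full() {
    used=$(fs_used_percent "${1:-/}")
    if [ "$used" -ge "${2:-90}" ]; then
        echo "WARNING: ${1:-/} is ${used}% full"
    fi
}

# Usage:
# warn_if_full /
```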

Power supply:

A faulty power supply is another common cause of system instability. Unfortunately, there are no software tests available to diagnose PSU issues. It is possible to use a multimeter to test the PSU for voltage fluctuations. However, this is not something the average person is equipped to do. The surest way to find out if the PSU is faulty is to swap out the power supply, (if you have another computer, or a spare PSU). Be sure to clean the vents and fan on the PSU with compressed air on a regular basis to avoid problems with the PSU. Also make sure your PSU has a sufficient power rating to run all the components you may have added into your computer since you purchased it. You can use an online Power Supply Calculator to determine if your PSU is insufficient for your current hardware.

Motherboard:

Diagnosing a motherboard as faulty is extremely time consuming and difficult because you have to rule out all other hardware to make that determination. Luckily a faulty motherboard is relatively rare, but it is still possible that it could be the cause of your issues. Motherboards are generally considered one of the more reliable computer components, but if you have been overclocking you might want to consider the possibility that your motherboard is failing.

While you are inside your computer case it would be a good idea to closely inspect your motherboard for any visible abnormalities, defects, or damage. Check your motherboard for any cracks on the printed circuit board, or bulging or leaking capacitors. If you find any defective capacitors, then your motherboard is likely reaching its end of days. While it is possible to replace a capacitor, this requires top notch soldering skills probably beyond the average person’s abilities.

Unfortunately, if it is not visually apparent that your mobo is defective it is very difficult to know for certain if your motherboard is the cause of your system instability. If some of your hardware is not being recognized this can be symptomatic of a faulty motherboard. If you can rule out almost all other hardware components using a live disk (as described below), then the mobo is a top suspect. To definitively rule out the motherboard, you would need to have a complete second set of components, (CPU, RAM, HDD, and PSU) to swap in. Of course, if your motherboard is still under warranty you should hopefully be able to RMA it to get a replacement.

Hardware troubleshooting using a live disk:

Rather than relying on software testing, you can more reliably discount hardware as a factor through a process of elimination. Power off, and then disconnect the computer’s power plug (and battery, if equipped). Disconnect any secondary monitors, if you use more than one. Remove or disconnect as much internal hardware as possible. Disconnect any HDD, SSD, MMC reader, or optical drives. Remove any add in cards from their motherboard slots. This includes an add in GPU if you have onboard graphics (switch to onboard in the BIOS). Disable any devices such as onboard network adapters, onboard sound, parallel ports, and Firewire, or any other non-essential hardware that can be switched off via the BIOS. Leave only one RAM stick inserted in the primary socket. Replace any wireless keyboards and mice with USB or PS/2 versions during troubleshooting.

After disconnecting, removing, or disabling as much hardware as possible, boot from a live disk and run some stress tests to check for lockups. You can eliminate hardware possibilities by gradually adding back components one piece at a time. Always make sure that the computer’s power cord is unplugged before reinstalling any component. If the issue returns after adding back any particular piece of hardware that was removed you have likely found your culprit. However, as you add back more and more components this also increases the electrical load on your PSU. This means there is still the possibility a weak/flaky PSU could be part of your problem. The only way to know for sure is to swap suspect components with known good working hardware.

Other hardware troubleshooting recommendations:

Check that all internal cables are seated properly and undamaged. If a cable is getting old and/or the connection to the socket seems loose, it might be a good idea to replace it with a new one.
Also check all external cables for damage or a loose/sloppy fitting connection. Replace suspect cables with alternates if available. Cables running under carpet can be compromised if crushed. Pets also like to chew cables and this can cause lockups.
Disconnect all other monitors except your main monitor. Try a different type of cable if your hardware supports using multiple standards such as DVI, HDMI, or Display Port.
Disconnect all peripherals such as USB hubs, USB Hard drives, printers, web cams, etc, etc, as a test to see if the freezes still occur.
Reboot into your BIOS and (if possible) disable your Ethernet and WiFi in your BIOS temporarily as a test. Also disable any non-essential hardware that can be shut down via the BIOS.
If you are using a wireless keyboard or mouse, try to replace them with wired versions for troubleshooting purposes.

tbg · 12 March 2022 06:36

Disable BTRFS Quota (qgroups)

Garuda and other distributions have seen reports of system slowdowns and freezes happening with btrfs quotas enabled. Disabling btrfs quotas is a logical step if you experience freezes during BTRFS maintenance operations. If you seem to experience a freeze during a balancing operation, try waiting as long as possible to let things resolve on their own. Balancing operations can sometimes take a very long time to complete. Never shut down your computer when a balancing operation is in progress.

To disable BTRFS quotas run:

sudo btrfs quota disable /

Disabling qgroups will impact the system’s ability to gauge the remaining disk space left for creating snapshots. If you disable qgroups you must be vigilant in ensuring you have adequate free space left for new snapshots. You do not want auto-snapshots to completely fill your drive, as this is a serious issue. Even though this requires more manual scrutiny on the user’s part, for systems that are severely impacted by freezes this seems an adequate trade off. You must be the judge of whether the benefits of having qgroups enabled outweigh any negative side-effects you are experiencing.

Information on BTRFS quota support.

There has been some discussion about disabling qgroups by default on Garuda. At the time of writing, I believe BTRFS quotas are still enabled in all editions. As it seems that only a small minority of systems are affected by this issue, (and qgroups are a useful feature) BTRFS quotas may remain the default.

Edit:

Since switching to using snapper from timeshift BTRFS quotas are no longer enabled by default. If you are still using timeshift for your system snapshots, then BTRFS quotas are likely enabled on your system.

It has been reported that some updates may re-enable qgroups even though they were manually disabled. Therefore, you will need to check if quotas have been re-enabled if you begin experiencing the same issues again.
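A rough way to check whether quotas are currently active is sketched below. It rests on an assumption: that `btrfs qgroup show` exits with an error when quotas are not enabled. The same command also fails on a path that is not BTRFS or when run without root privileges, so treat a "not enabled" result as a hint rather than proof. The function name is hypothetical.

```shell
# Sketch: report whether BTRFS quotas (qgroups) appear to be enabled on a mount point.
# Assumption: `btrfs qgroup show` fails when quotas are off. It also fails on
# non-BTRFS paths or without root, so "not enabled" is only a hint.
btrfs_quota_state() {
    if btrfs qgroup show "${1:-/}" >/dev/null 2>&1; then
        echo "enabled"
    else
        echo "not enabled"
    fi
}

# Usage (from a root shell, so the query itself is permitted):
# btrfs_quota_state /
```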

Read the link below for information on how to permanently disable qgroups if using timeshift for creating snapshots:

BTRFS quota is automatically re-enabled if I disable it

BTRFS Balancing Tips:

Garuda uses the BTRFS filesystem which is quite different from the old standard ext4 used by many distros. Unlike ext4, BTRFS requires regular maintenance to be performed. Services are employed to perform these maintenance tasks on a regular schedule. There are however times when the user may want to perform some tasks manually.

It has been reported that if many snapshots have been created, lags and freezes may occur on some systems. If you are experiencing lags or freezing issues it may be helpful to disable BTRFS quotas and then delete all stored snapshots. After making these changes and performing a BTRFS balance performance is sometimes improved quite substantially. I personally usually manually delete all my snapshots and perform a BTRFS balance after I have accumulated 5 or more snapshots.

After doing a very large update or deleting large amounts of data, performance degradation may occur on some systems. After those operations it is often beneficial to perform a BTRFS balance to ensure your performance does not suffer. Numerous people have reported dramatic improvements in performance after performing a BTRFS balancing as some systems seem to require this more than others. Be sure to reboot after the balancing is complete. Also be sure to create a new system snapshot after you've completed all those operations.

The command below will start a balancing operation on / (root) that is limited to data and metadata block groups which are less than 60% full, and will also provide updates on how far along the process is to completion:

sudo btrfs balance start --bg -musage=60 -dusage=60 / && sudo watch -t -n5 btrfs balance status /

BTRFS balancing operations can sometimes take a very long time to complete. Never shut down your computer when a balancing operation is in progress.

tbg · 13 March 2022 10:03

Test various kernel parameters:

Many times the surest fix to correcting an issue that creates an unresponsive system involves the kernel. If changing the kernel itself does not fix your issue, there is a fairly good chance that changing some of the kernel parameters loaded at boot time could help. Garuda uses grub as its boot loader. Kernel parameters can be set temporarily by editing the boot entry at the grub boot selection menu. Kernel parameters can be set permanently by editing grub’s configuration file at /etc/default/grub. Be sure to make a backup before editing the grub configuration file.
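To confirm which parameters the running kernel was actually booted with, you can inspect /proc/cmdline. The helper below is a minimal sketch; its name and the optional file argument are assumptions for illustration. The commented steps outline the permanent route through /etc/default/grub, where the parameters normally live in the GRUB_CMDLINE_LINUX_DEFAULT line.

```shell
# Sketch: succeed when a parameter is present on the kernel command line.
# Matches a bare flag (e.g. quiet) or the key of a key=value pair (e.g. root).
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" |
        awk -v key="$1" -F= '$1 == key {found = 1} END {exit !found}'
}

# Usage:
# has_kernel_param quiet && echo "quiet is set"

# Permanent change (back up first, then regenerate the GRUB configuration):
# sudo cp /etc/default/grub /etc/default/grub.bak
# ...edit GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub...
# sudo grub-mkconfig -o /boot/grub/grub.cfg
```

After rebooting, running the check again is a simple way to verify that the new parameter took effect before judging whether it helped.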

There are many kernel parameters and only a select few may possibly help correct issues with system crashes on your particular hardware. Unfortunately, many kernel parameters are specific to only a certain type of hardware and there is no reference database to locate exactly which kernel parameter(s) may be required for your hardware. Generally, it is very hard to recommend exactly which kernel parameters to test as there are so many parameters and multitudes of varied hardware. It usually requires a lot of searching and trial and error to find parameters that may help with your issue.

Try using variations of search terms similar to below to locate pertinent info:

Arch Linux fix freezes kernel parameter “your motherboard model”

Or:

Arch Linux fix freezes kernel parameter “your laptop model”

tbg · 13 March 2022 21:11

Test a Different I/O Scheduler:

Sometimes a kernel change alone will not resolve a stubborn freezing problem. If you have tested multiple kernels and are still experiencing freezes, it is a good idea to also test different I/O schedulers. Some kernels (such as cacule) come preconfigured with a different I/O scheduler; otherwise you must change schedulers manually.

Try monitoring your disk I/O activity with the iotop utility. Excessive I/O activity can lead to system slowdowns or freezes. If this appears to be happening you might want to try changing your I/O scheduler. This is an especially worthwhile troubleshooting step if you find any I/O errors in your logs. Test out different schedulers to see if there is any performance improvement.

To identify the scheduler in use for all drives (the active scheduler is the one shown in square brackets), run:

grep . /sys/class/block/*/queue/scheduler

Switching the default scheduler in use may seem a little daunting if you've never attempted it before, however it is actually quite simple.
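To give an idea of how little is involved, here is a minimal sketch. The function name active_scheduler is hypothetical, and sda is only a placeholder device name. The function pulls the bracketed (active) scheduler out of a line from the scheduler file, and the comments show a temporary switch that lasts until the next reboot.

```shell
# Hypothetical helper: prints the active scheduler (the bracketed name)
# from a line such as "mq-deadline kyber [bfq] none" read on stdin.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Example (sda is a placeholder device name):
#   active_scheduler < /sys/class/block/sda/queue/scheduler
# Temporary switch until the next reboot (needs root, not run here):
#   echo mq-deadline | sudo tee /sys/class/block/sda/queue/scheduler
```

Only schedulers listed in the scheduler file for that device can be selected. Making the change permanent is covered in the ArchWiki pages listed here.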

For further information, reference the ArchWiki:

Input/Output schedulers

Changing the I/O scheduler

Tuning the I/O scheduler

Storage I/O scheduling with ionice

You may also want to investigate related sysctl tuning parameters:

Sysctl - Virtual memory

Kernel docs - sysctl virtual memory
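Before changing any of these, it helps to record the current values. The sketch below prints a handful of virtual memory tunables. The selection of keys is my own assumption about what is worth looking at first, and the function name show_vm_tunables is hypothetical.

```shell
# Hypothetical helper: prints a few virtual-memory tunables in the same
# "name = value" form that `sysctl` uses. The directory argument exists only
# so the function can be tried against sample files; it defaults to
# /proc/sys/vm.
show_vm_tunables() {
    dir="${1:-/proc/sys/vm}"
    for key in swappiness vfs_cache_pressure dirty_ratio dirty_background_ratio; do
        if [ -r "$dir/$key" ]; then
            printf 'vm.%s = %s\n' "$key" "$(cat "$dir/$key")"
        fi
    done
}

# Example:
#   show_vm_tunables
```

Keep a copy of the output so you can return to the original values if a change makes things worse.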

tbg · 14 March 2022 02:47

Troubleshoot Garuda’s performance tuning packages:

Edit:

Most of this section is now outdated, as Garuda Assistant now allows the user to select which tuning packages they wish to enable/disable through the GUI.

To test whether any of Garuda’s performance tuning enhancements are causing issues on your system, you may want to try disabling/masking these services one at a time. The performance tuning packages Garuda installs by default have changed over time. Depending on how old your install is, you could still have a few of the older, no longer used packages installed and running on your system. You can mask any installed service to determine if it is causing issues on your system.

You can find out if any of these services are installed and running on your system with the following command:

systemctl status ananicy-cpp irqbalance preload prelockd auto-cpufreq memavaild

To stop/disable/mask any individual service that is running on your system, execute:

sudo systemctl disable --now ananicy-cpp.service && sudo systemctl mask ananicy-cpp.service && sudo systemctl daemon-reload

sudo systemctl disable --now irqbalance.service && sudo systemctl mask irqbalance.service && sudo systemctl daemon-reload

sudo systemctl disable --now preload.service && sudo systemctl mask preload.service && sudo systemctl daemon-reload

sudo systemctl disable --now prelockd.service && sudo systemctl mask prelockd.service && sudo systemctl daemon-reload

sudo systemctl disable --now auto-cpufreq.service && sudo systemctl mask auto-cpufreq.service && sudo systemctl daemon-reload

sudo systemctl disable --now memavaild.service && sudo systemctl mask memavaild.service && sudo systemctl daemon-reload

The service’s state should be automatically refreshed by the included sudo systemctl daemon-reload command.

After testing your system’s performance with a service masked, the service can easily be made operational again if you wish. To reinitialize any of the service(s) you masked, repeat the above command(s) substituting “unmask” in place of “mask” and “enable” in place of “disable”, as in the examples below:

sudo systemctl unmask ananicy-cpp.service && sudo systemctl enable --now ananicy-cpp.service && sudo systemctl daemon-reload

sudo systemctl unmask irqbalance.service && sudo systemctl enable --now irqbalance.service && sudo systemctl daemon-reload

sudo systemctl unmask preload.service && sudo systemctl enable --now preload.service && sudo systemctl daemon-reload

In some instances you may need to reboot to fully initialize the service, as simply reloading may not be sufficient in all cases.
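If you would rather review the commands before running anything, a dry-run helper along these lines may be handy. This is only a sketch. The function name service_toggle_cmds is made up, and it prints the systemctl commands from this section instead of executing them.

```shell
# Hypothetical dry-run helper: prints the commands for masking or unmasking
# one service instead of running them, so they can be reviewed first.
service_toggle_cmds() {
    action="$1"          # "mask" or "unmask"
    unit="$2.service"
    case "$action" in
        mask)
            echo "sudo systemctl disable --now $unit"
            echo "sudo systemctl mask $unit"
            ;;
        unmask)
            echo "sudo systemctl unmask $unit"
            echo "sudo systemctl enable --now $unit"
            ;;
        *)
            echo "usage: service_toggle_cmds mask|unmask NAME" >&2
            return 1
            ;;
    esac
}

# Print the mask commands for every tuning service named in this section.
for svc in ananicy-cpp irqbalance preload prelockd auto-cpufreq memavaild; do
    service_toggle_cmds mask "$svc"
done
```

Run the printed commands for one service at a time and test in between, otherwise you will not know which service made the difference.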

tbg · 14 March 2022 03:25

Troubleshooting Installed Packages:

Always be sure to note the exact time and date when a performance issue first appears. Knowing the date of the first occurrence can be crucial in helping to narrow down the package(s) responsible for causing the problem. If you update on a daily basis and know when the issue first started, finding the offending package should be relatively simple. This is because the list of newly installed packages should be small, and therefore far easier to troubleshoot. If you haven’t updated for a month, the package list will be massive, and the package responsible will be far more difficult to track down.

Software troubleshooting often involves removing or downgrading package installations that coincided with the time your issue first started to appear. Check your /var/log/pacman.log for packages installed just before the time that your issue first presented. The following command will help make it easier to identify the list of potential problem packages, (if you know the exact date the update was performed):

grep '2023-11-30' /var/log/pacman.log | grep -E 'installed|upgraded|removed'

You must substitute the date you performed the problematic update for the date of Nov 30th, 2023 in the above command.
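If you expect to check several dates, the same filter can be wrapped in a function. This is a sketch under a few assumptions: the name pacman_changes_on is hypothetical, the log lines start with a bracketed timestamp such as [2023-11-30T09:12:45-0500], and the package action follows the [ALPM] tag.

```shell
# Hypothetical helper: lists packages installed, upgraded, downgraded,
# reinstalled, or removed on a given date (YYYY-MM-DD). The log path
# argument defaults to the real pacman log.
pacman_changes_on() {
    day="$1"
    log="${2:-/var/log/pacman.log}"
    grep -F "[$day" "$log" \
        | grep -E '\] (installed|reinstalled|upgraded|downgraded|removed) ' \
        || true
}

# Example:
#   pacman_changes_on 2023-11-30
```

Matching on the action word directly after the tag keeps lines such as "transaction started" out of the list.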

Drivers or firmware packages updated around the time the problem began are prime candidates for causing major issues. Downgrade any driver or firmware packages that were updated recently to an older version. The video driver not being suitable for the video card in use is one of the most common causes of display abnormalities. Selectively downgrade one package at a time, (starting with the most likely first). Hopefully through the process of elimination you can identify fairly quickly exactly which package update was responsible for creating your issue. It is best to reboot after downgrading a package, (especially if a driver or firmware downgrade was performed).

If you have a fairly extensive list of packages that you suspect, be sure to search thoroughly online for recent bug reports that sound similar to what you are experiencing. If you identify a package that may be problematic, test your suspicion by downgrading that package to the previous version. If a downgrade solves your problem, you will need to hold the package version to prevent the problematic version from being automatically reinstalled during your next update. Any bug of a serious nature you discover should be reported to the upstream project to hopefully get a quick resolution. Hold the offending package at the old version for a week or two and then test the newer version again to see if a fix has been applied.
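Older versions are often still sitting in pacman's package cache. The sketch below lists the cached files for one package. The function name cached_versions is made up for illustration, and it assumes the usual name-version-release-arch.pkg.tar.* file naming with a version that starts with a digit. The downgrade and hold steps are shown only as comments, with mesa as a placeholder package name.

```shell
# Hypothetical helper: lists cached package files for one package name,
# sorted by modification time (oldest first). The cache directory argument
# defaults to pacman's cache.
cached_versions() {
    pkg="$1"
    cache="${2:-/var/cache/pacman/pkg}"
    ls -1tr "$cache"/"$pkg"-[0-9]*.pkg.tar.* 2>/dev/null | grep -v '\.sig$' || true
}

# Example (mesa is a placeholder package name):
#   cached_versions mesa
# Downgrade to a file from that list (needs root, not run here):
#   sudo pacman -U /var/cache/pacman/pkg/<file from the list>
# Hold the package by adding this line to the [options] section of
# /etc/pacman.conf:
#   IgnorePkg = mesa
```

Remember to remove the IgnorePkg entry once you have confirmed that a fixed version is available.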

When experiencing performance issues, it can help to keep a system monitoring utility such as htop continuously running to keep an eye on your resource usage. Hopefully this can help identify if any specific program or process may be contributing to your issue. Track how much memory is in use, and try to identify any process that exhibits constantly rising memory usage. This is indicative of a program with a memory leak that will eventually crash your system if it is not prevented from consuming all your available RAM. If the offending package/program is non-essential, stop using it, or uninstall it. If you just can’t do without this problematic software, you will need to file a bug report to help the upstream software developers resolve the issue in a timely manner.
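If you suspect one particular process, you can also sample its memory use over time rather than watching htop. The sketch below reads the VmRSS field (resident memory in kB) from the process status file under /proc. The function name rss_kb and the PID 1234 in the example are placeholders.

```shell
# Hypothetical helper: prints the resident memory (VmRSS, in kB) of a
# process. The optional second argument overrides the status file so the
# function can be tried on a sample file.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "${2:-/proc/$1/status}"
}

# Example: log a timestamped sample every 60 seconds for PID 1234
# (a placeholder PID), stop with Ctrl+C:
#   while sleep 60; do echo "$(date +%T) $(rss_kb 1234)"; done
```

A value that keeps climbing over hours while the program sits idle is the pattern to look for.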

An often overlooked source of system instability is the addition of Gnome extensions or Plasma widgets. If you have added any extensions or widgets, be sure to remove these addons to eliminate them from the list of possibilities. I once had a memory leak that totally baffled me, as the leaking process was not identifiable with a monitoring utility. I’d almost given up on identifying the leak when I remembered I’d installed a few widgets a while back. As soon as I removed these recently added widgets my problem went away, so don’t forget about these tiny programs, as they have the potential to cause big problems.

A user requested I add Baloo (KDE’s search engine) to the list of problematic software. Baloo can at times consume excessive amounts of CPU or RAM. The Garuda defaults for Baloo were changed quite some time ago, so this hasn’t been much of a problem of late. However, if you enable full content indexing of all drives (especially very large storage drives) this can impact performance quite dramatically at times. If your machine is older with limited RAM Baloo can at times almost grind your system to a halt during heavy indexing operations. KDE will not permit uninstallation of Baloo, but it can be disabled so that it will not negatively impact your performance. See the KDE Baloo troubleshooting guide for further details:

Baloo - Debugging

Full system reinstallation (in stages):

If your issue only started recently and was not present when testing in a live boot environment, then a recent update to a program, driver, firmware, bios, or kernel has likely caused this. If your issue started a while ago, but you have no idea exactly when it began, a reinstallation may help identify the package(s) causing your problems. If you only just installed your system, then following the instructions below would likely be profitless.

If you feel a reinstallation of Garuda is warranted, then perform the installation in stages to help narrow down any package(s) possibly causing your issue. First perform an offline install with no internet connection. Once your installation is complete, do not install any extra packages or update your system just yet. Stress your system to determine with 100% certainty if the freezes are no longer happening. Once this has been determined, do your system updates.

If the freezes start again after updating (when the system was good before), then you will need to figure out which package(s) in the update list are the trigger. Be sure to allow plenty of time after each stage to ascertain whether your freezing issue returns. If there are still no freezes after updating your system, then start slowly installing your normal software only one package at a time, (AUR packages last). Stress your system after each new package installation to determine if freezing occurs. If the system freezes are triggered after installing a particular package, then you have likely located the cause of your problem.

Further software related suggestions:

Try disabling hardware acceleration in your web browser(s), as this has been known to cause freezing with some hardware.

Try creating a new user account. If the new user account is problem free, then the issue originates in your original user’s home settings.

Power saving software such as tlp can sometimes contribute to freezing, so you may want to temporarily disable or uninstall it.

Uninstall or disable all Plasma widgets (plasmoids) that you installed yourself.

Uninstall or disable all Gnome extensions you’ve installed yourself.

Using an aggressive system cleaner such as BleachBit can result in system instability if great care is not taken during cleaning operations.

Firefox has been well known for issues relating to memory leaks or causing freezes in the past. Install and use an alternate browser to see if your system stability improves.

Questions relating to your installed software:

Is your system fully updated?

Have you tried resetting your default configs with Garuda Assistant?

Have you tried creating a new user account?

Did you install any packages from the AUR around the time when the issue began?

Did you notice any driver package updates around the time when your issue began?

Was there a major update to your Desktop Environment about the same time your issue began?

Are all your packages from the AUR fully updated?

Have you checked upstream for any outstanding bug reports in any software you think could be responsible for your issue?

X11 vs Wayland:

I have recently seen an increasing trend that I thought I’d better add to my list of causes for poor system performance. As installation and usage of Wayland has become more commonplace, reports of poor system performance have also increased. The Wayland display server has made great strides, and will likely become the default windowing system on many Linux machines in the near future. However, Wayland still has its share of bugs, and although much improved it still causes issues on some systems, (more so if using the KDE desktop).

The following issue has affected systems using Nvidia graphics adapters on KDE installs running Wayland. Many of these systems are missing the kernel parameter for Direct Rendering Manager (DRM):
nvidia-drm.modeset=1
This parameter needs to be added for Nvidia to work correctly in a Wayland session.

Therefore, my suggestion would be to switch to an X11 session if encountering any performance issues when using Wayland, (if Wayland is your default). Conversely, switching to a Wayland session may in some cases of poor system performance result in a great improvement. To sum things up, make sure you test out both types of sessions if experiencing performance issues using either Wayland or X11.

tbg · 14 March 2022 06:35

Feedback you should provide:

There are numerous threads on the forum dealing with freezing issues with many different suggestions posted on how to hopefully correct the issue. Please search the forum and report in detail on every fix you attempt and post relevant logs and command outputs. To troubleshoot any issue effectively forum assistants must know all the troubleshooting steps that have been tested to have any chance of finding a solution. Threads already covering this topic on the forum should provide plenty of information on the steps you need to take to troubleshoot this issue.

If a thorough search of the Garuda forum doesn't turn up a solution, then searching other Arch based forums is usually the next step. If you can't find what you need on the forums of other Arch derivative distros, then cast a wider net with an internet-wide search.

Freezing Questionnaire/Checklist:

Have you posted the output of the garuda-inxi command?

Have you provided a full history of fixes attempted?

Have you checked for errors/segfaults/crash dumps, and posted your logs?

Have you tested at least 3 alternate kernels? Which ones?

Have you fully updated your system?

Is your BIOS up to date?

Have you checked your resource usage with htop, iotop, etc?

Are you getting close to maxing out your ram at any time?

Have you checked your system temperatures?

Did this issue start recently after an update?

If this started recently, have you tried performing a rollback via a snapshot?

Can you recall making any config changes about when this issue began?

Have you tried disabling the baloo file indexer temporarily?

Have you tried disabling all network adapters temporarily?

Have you tried disabling hardware acceleration in your browser?

Have you tried disabling BTRFS Quotas (qgroups)?

Have you run a BTRFS balancing operation?

If you press the CAPS or NUMLOCK key, does your KB state light change?

Is your CAPS or NUMLOCK LED blinking? (kernel panic indicator)

Can you move your mouse cursor?

Can you move your mouse cursor, but clicking has no effect?

Do you have full keyboard functionality?

Is this a complete freeze up with no keyboard or mouse responsiveness?

Does pressing CTRL+T open a terminal?

Does pressing CTRL+ALT+F2 get you to a TTY?

Have you tried restarting your system from the terminal or TTY?

Have you tried to remote in from another computer via ssh?

Have you tried to ping your machine from another computer?

Can you use the Magic SysRq key to restart/shutdown?

Have you tried restarting (KDE) plasmashell or kwin from the terminal?

Is there a specific program or action that often triggers a freeze?

How many applications are generally running when the freezing occurs?

Are the freezes completely random?

Is there any pattern to the freezes?

How often do the freezes usually occur?

How long have these freezes been happening on this install?

Do the freezes seem to be getting more frequent over time?

What is your longest time without a freeze?

Do freezes only happen while doing something CPU intensive?

Do the freezes happen even if the system is completely idle?

If a freeze does occur, does it resolve on its own if you wait a long time?

Have you tried both the proprietary and free Nvidia drivers, (if present)?

Have you tried changing your compositor settings?

Have you tried disabling your compositor entirely?

Have you followed all the steps in the hardware troubleshooting tutorial?

Have you followed all the steps in the software troubleshooting tutorial?

Have you tried removing all plasma widgets (plasmoids) you've installed?

Have you tried removing all Gnome extensions you've installed?

Have you used Garuda Assistant to reset your default config?

Have you tried installing linux-firmware-git and rebooting?

Have you made any overclocking/undervolting or similar modifications?

Do you use full disk encryption?

Do you use a swap partition?

Have you installed multiple Desktop Environments or WM's together?

Are you running Garuda in a Virtual Machine?

Do you have only 4GB of RAM (or less) with shared video?

Does your electrical grid experience power fluctuations?

Is your home wiring very old, do your lights flicker?

Have you installed any extra apps (which) from the AUR/Chaotic repos?

Have you fully updated all apps from the AUR/Chaotic repos?

Have you experienced similar issues with this hardware on other OS's?

Did, (or does) freezing also occur on Windows?

Did, (or does) freezing also occur on other Linux distros?

Have you booted live disks of other Garuda DE's or other distros?

Does freezing also occur in live environments, (which ones)?

Does freezing also occur if you create a new user account?

Does the system get progressively slower before freezing?

Has your computer ever shut off on its own, without you initiating it?

Does sound continue playing, or loop during a freeze?

Are you using tlp? If so, disable or uninstall it temporarily.

If your computer has a disk activity LED, is it blinking during the freeze?

Please answer as many of the above questions as possible. We need to have a complete picture in order to help resolve your issue. You can copy the above list of questions in its entirety into any help request you open on this subject. Below each question on the checklist, provide as complete an answer as you can.
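To save some typing when answering the log and resource questions, a small collection script can help. The sketch below is only a starting point. The function name collect_freeze_info and the report file name are hypothetical, and each tool is skipped if it is not installed. It does not replace posting your garuda-inxi output.

```shell
# Hypothetical helper: writes a small report to attach to a help request.
collect_freeze_info() {
    out="${1:-freeze-report.txt}"
    {
        echo "== kernel =="
        uname -r
        echo "== errors logged during the previous boot =="
        if command -v journalctl >/dev/null 2>&1; then
            # -b -1 selects the previous boot, -p 3 limits to errors and worse.
            journalctl -b -1 -p 3 --no-pager 2>&1 || true
        else
            echo "journalctl not found"
        fi
        echo "== memory =="
        if command -v free >/dev/null 2>&1; then
            free -h 2>&1 || true
        else
            echo "free not found"
        fi
    } > "$out"
    echo "report written to $out"
}

# Example:
#   collect_freeze_info ~/freeze-report.txt
```

Run it after rebooting from a freeze, so that the previous boot in the journal is the one where the freeze happened.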

tbg · 19 March 2022 14:57

X11 vs Wayland ?

I have recently noticed a trend that I thought I’d better add to my list of causes for poor system performance/instability. As the usage of Wayland becomes more commonplace, so unfortunately have reports of many problems related to Wayland. Wayland has made great strides and will likely become the default display server on most Linux machines in the near future. Sadly though, even as we approach 2024 Wayland is still having its fair share of teething pains. Although much improved, Wayland is still causing major issues on some systems. I’m not trying to paint Wayland in a bad light; most users rate the experience as a positive one, and many report much improved performance.

Unfortunately, KDE users are still reporting a myriad of negative side effects when attempting to transition to Wayland. Recently quite a few KDE users have reported being unable to log in under Wayland, while logging in under an X session is fine. Other KDE users have recently reported frame drops/stutters and 70-90% CPU usage with kwin_wayland. As well, some KDE users have been unable to get their secondary monitors recognized or working under Wayland. Nvidia users on KDE with Wayland have been experiencing a lot of graphics problems as well. The most common fix has been to enable DRM (Direct Rendering Manager) for Nvidia with the kernel parameter nvidia-drm.modeset=1. Enabling DRM is covered at NVIDIA - ArchWiki.

If you are uncomfortable editing grub’s configuration file, then you can use the command below to automatically add the nvidia-drm.modeset=1 kernel parameter to /etc/default/grub:

sudo cp /etc/default/grub /etc/default/grub.bak && sudo sed '/^GRUB_CMDLINE_LINUX_DEFAULT=/s/"$/ nvidia-drm.modeset=1 "/g' -i /etc/default/grub && sudo update-grub

You may need to press Enter again at the update-grub stage of the above string of commands. The command first backs up your grub config before editing. Next, it adds the nvidia-drm.modeset=1 kernel parameter to the /etc/default/grub config file. Lastly, it executes the update-grub command to regenerate grub’s configuration. It can take a fair bit of time to regenerate grub, so don’t worry if it takes a while to finish. When the process has completed, reboot your computer.
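One caveat with the one liner is that running it twice adds the parameter twice. As a sketch of a more cautious variant, the function below only appends the parameter if it is not already on the GRUB_CMDLINE_LINUX_DEFAULT line. The name add_grub_param is made up. It assumes the line ends with a double quote and that the parameter contains no slashes or ampersands.

```shell
# Hypothetical helper: appends a parameter to GRUB_CMDLINE_LINUX_DEFAULT in
# the given file unless it is already present. Keeps a .bak copy.
# Assumes the line ends with a double quote and that the parameter has no
# "/" or "&" characters (both are special to sed).
add_grub_param() {
    param="$1"
    file="${2:-/etc/default/grub}"
    if grep -E '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        echo "$param already present in $file"
        return 0
    fi
    cp "$file" "$file.bak"
    # Rewrite from the backup so no in-place editing option is needed.
    sed "/^GRUB_CMDLINE_LINUX_DEFAULT=/s/\"\$/ $param\"/" "$file.bak" > "$file"
}

# Example on a copy of the config, to inspect the result before touching
# the real file:
#   cp /etc/default/grub /tmp/grub.test
#   add_grub_param nvidia-drm.modeset=1 /tmp/grub.test
```

Editing the real /etc/default/grub requires root privileges, and update-grub still has to be run afterwards for the change to take effect.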

X11 has been in use since September 1987, so it is considered far more mature and stable than the newcomer Wayland. For this reason, severe bugs are usually encountered far less frequently with X11. However, X11’s days are definitely numbered, as Wayland is faster, more secure, and in heavy development by the big corporate Linux players. Despite X’s long-standing track record, it can also create major system malfunctions at times. In some cases switching to a Wayland session from X11 may cure system instability/performance issues.

To sum things up, I would now consider it best practice to switch session types whenever experiencing any performance/stability issues on either Wayland or X11.

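Edited to update recent developments: as reported later in this thread, the nvidia-drm.modeset=1 kernel parameter is no longer needed on systems running nvidia-utils 560.35.03-5 or later, but it is still needed with older drivers.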

tbg · 19 March 2022 14:58

If you have any further troubleshooting suggestions, feel free to add your comments below. However, please keep all comments strictly on topic, as this is an FAQ tutorial, not a discussion thread. Comments that do not contain useful suggestions to help troubleshoot performance related issues will be purged from this thread.

Please do not post any requests for assistance on this thread. Open your own help request if you are experiencing performance related problems.

magnus-ISU · 13 December 2022 20:17

I had a lot of lag due to the baloo indexer. It had ballooned its index size to 10GB. This is because index file contents was on in KDE settings.

Is that the default? I have to imagine the number of people using it is small, and maybe Garuda should just have it off on install. I also installed a long time ago, so it's possible it has since been changed or even that I turned it on myself. But if it is default, I would consider changing it.

tbg · 19 November 2023 18:22

When I first wrote this tutorial the use of Wayland was not that common. As time has marched on Wayland is becoming more the norm on many systems. I have added an update to my tutorial discussing issues that can occur if using either Wayland or X11.

Anyone with further suggestions they feel should be included in this performance troubleshooting guide feel free to make suggestions (but please keep any posts in this tutorial thread on topic).

Kayo · 20 November 2023 23:49

Yes, I am seeing a lot of Wayland related posts lately too. A lot of systems that are using Nvidia are missing the kernel parameter for Direct Rendering Manager (DRM)
nvidia-drm.modeset=1
Needs to be added for Nvidia in a Wayland session to work correctly.

more info here:

Bro · 20 November 2023 23:56

Could not agree more with you two! As simple as Arch/KDE can get with a 9th Gen Intel/Intel rig, and I couldn’t wait to get out of Wayland after trying a session yesterday. I was just hoping it wasn’t causing system damage.

But it makes sense in a way; GNOME has gotten better Wayland support so far because it’s a Red Hat creature.

tbg · 21 November 2023 16:49

I have added your info on adding the required Nvidia kernel parameter in the new section on X11 vs Wayland. Thanks for pointing that out.

I also added a one liner to back up /etc/default/grub, edit the grub config file by adding the required Nvidia kernel parameter, and then lastly regenerate grub’s config.

I figured I’d make it a little easier for newbs to add the kernel param (all in one command) as those unfamiliar with the process often end up breaking their boot.

Kayo · 10 October 2024 08:44

Just wanted to add that nvidia-drm.modeset=1 is no longer needed on systems running nvidia-utils 560.35.03-5 or later. It still will be needed with older drivers however.
Sources:

NVIDIA - ArchWiki (under “1.2 DRM kernel mode setting”)

tbg · 10 October 2024 08:55

I updated the information I had posted earlier in the X11 vs Wayland section regarding the nvidia-drm.modeset=1 kernel param usage.

Thank you @Kayo for posting back here to keep everyone apprised of the recent nvidia driver developments.