Hardware Troubleshooting:
If the preceding preliminary steps proved fruitless, then it is time to run diagnostic tests to evaluate your hardware’s health. You should also perform stress testing to assess your system’s stability. Another very important step is to test your system with various live boot disks. Use live boot disks from distros that aren’t Arch derivatives, such as Ubuntu or Linux Mint. If the freezing also occurs in the live environments (or in Windows), then this strongly points to a hardware problem. If this is the case, in-depth hardware troubleshooting will likely be required to eliminate individual hardware components from the equation.
ArchWiki information on stress testing your CPU & RAM.
Install lm-sensors to monitor your temperatures and fan speeds.
Install smartmontools to run diagnostics on the health of your drives.
After running your software diagnostic tests you may still need to get your hands dirty inside your computer, as these types of tests are not 100% reliable. If you live in an area prone to static electricity buildup, then use an anti-static wrist strap while working inside your computer.
Some of the troubleshooting steps recommended below will not be suitable for laptop computers as their internal components are not as readily accessible or replaceable. Some unscrupulous manufacturers of portable computing devices are now in the nasty habit of permanently soldering their components directly to the mainboard. Sadly, with these types of devices you have limited options if you have a faulty hardware component and your device is no longer under warranty. Fortunately with desktop computers there are many more options available for users comfortable working with their computer’s internal components.
Overheating:
Caution:
Keeping a desktop computer in an enclosed space like a cabinet with little airflow can contribute to overly high computer temperatures. Likewise, using a laptop in bed with blankets blocking the air intake ports can also lead to excessively high temperatures. Temperatures in excess of 80 deg C could be the cause of random freezing.
One of the most common causes of system freezes is the overheating of hardware components. Overheating can not only cause freezes, it can also lead to premature hardware failure. Therefore, you need to ensure your system temps are within a safe operating range. You can check your hardware temperatures and fan speeds using lm-sensors, or alternatively look in your BIOS to check your values. If your CPU is running in excess of 75 deg C you should be concerned. If your system is running at 80+ deg C then the lifespan of your components is likely going to be reduced substantially. At 95+ deg C your components are probably going to be damaged. Once the temperature reaches 100 deg C it is almost certain your components will suffer damage. Exact limits vary by CPU model, so treat these figures as rough guidance. Your computer’s failsafe should hopefully shut down your computer before it can ever reach these dangerously high levels.
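As a rough illustration, the temperature bands above can be written down as a small Python helper. This is only a sketch: the band labels are this guide's rules of thumb rather than manufacturer limits, and the nested layout of `sample` is an assumption about what `sensors -j` (the JSON output mode of lm-sensors) prints on a typical Intel system. The chip and sensor names are made up for the example.

```python
import re

def classify_temp(celsius):
    """Map a CPU temperature in deg C to the guide's rule-of-thumb bands."""
    if celsius >= 100:
        return "damage almost certain"
    if celsius >= 95:
        return "damage probable"
    if celsius >= 80:
        return "reduced lifespan likely"
    if celsius > 75:
        return "cause for concern"
    return "ok"

def collect_temps(node, prefix=""):
    """Walk a nested dict and yield (sensor name, deg C) for temp*_input keys."""
    for key, value in node.items():
        if isinstance(value, dict):
            yield from collect_temps(value, f"{prefix}{key}/")
        elif re.fullmatch(r"temp\d+_input", key) and isinstance(value, (int, float)):
            yield prefix.rstrip("/"), float(value)

# Hypothetical data in the assumed `sensors -j` layout (not real readings).
sample = {
    "coretemp-isa-0000": {
        "Adapter": "ISA adapter",
        "Package id 0": {"temp1_input": 82.0, "temp1_max": 100.0},
        "Core 0": {"temp2_input": 61.0},
    }
}

for name, temp in collect_temps(sample):
    print(f"{name}: {temp:.0f} deg C -> {classify_temp(temp)}")
```

To use it on a real machine, you could pass the parsed output of `sensors -j` (for example via `json.loads`) to `collect_temps` in place of `sample`.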
If your temperatures are consistently elevated, you will need to clean all exhaust ports and filters, power supply vents, heat sinks, and fans inside your computer. This is difficult on a laptop because of accessibility issues, so you will need to use compressed air to clean out accumulated dust as best you can. If your CPU temps are still higher than recommended after cleaning, a desktop computer gives you more options for lowering internal temperatures. Sometimes simply shuffling internal components and cabling around can improve internal airflow considerably.
If any of your computer fans are making noises upon startup, now is the time to replace them while you already have your computer apart. There is no point in waiting, as once the fan bearings start making noise, it won’t be long before they seize up. If you do suffer a fan failure, then your temps could rise to dangerous levels. If space permits, adding extra (or larger) fans or water cooling is one of the most effective ways to lower your CPU temp in a desktop computer.
If your CPU temperature is still too high after your best efforts to reduce it, you may need to reseat your CPU heatsink with new thermal paste. This is not difficult to do, but it is not for the faint of heart. Research this procedure carefully before attempting this, or refer this job to a knowledgeable computer expert.
RAM:
If software testing of your RAM did not determine it to be faulty, it is still best to do further RAM troubleshooting, because RAM can sometimes pass software testing and still crash your system. If you are a heavy multi-tasker and you only have 4 GB of RAM, your freezes may simply be a result of insufficient RAM. While 4 GB of RAM may meet Garuda’s minimum requirements, your system may not run very smoothly if you are using one of the distro’s heavier desktop environments, such as KDE Dragonized.
To troubleshoot your RAM further, shut down the computer and remove/disconnect all power sources. Then open the locking tabs and carefully remove all RAM sticks from the motherboard. Only handle the RAM modules by their edges, and avoid touching the contacts with your fingertips. If the internals of your computer were quite dirty, your RAM should be carefully cleaned of dust. If your computer was extremely dusty inside, the contacts might benefit from a light cleaning with rubbing alcohol on a lint-free cloth.
After removal and cleaning reinsert all RAM sticks again. Make extra certain that they are all seated correctly and locked securely in place. Restart, then try to ascertain if the freezes are still occurring after re-seating all RAM sticks. Be sure to double check in your BIOS that your RAM’s voltage and timings are set correctly according to the manufacturer’s recommendations.
If the freezes are still occurring with all RAM sticks reinstalled correctly, then it is time to find out whether any one RAM stick is actually defective. This can be accomplished through a process of elimination. Remove all RAM sticks except one from the motherboard. Make sure the single remaining stick of RAM is seated correctly in the primary slot. Cycle through testing each individual RAM stick one at a time, allowing sufficient time to see if the freezes continue or end. If the computer only freezes with one particular RAM stick, then you have likely identified the cause. If the freezes occur with all sticks, then you must continue testing other hardware to eliminate other components as a cause.
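The decision at the end of this one-stick-at-a-time test can be sketched in a few lines of Python. The function name, the stick labels, and the "inconclusive" verdict for mixed results are illustrative assumptions and not part of any tool; the other two verdicts follow the reasoning above.

```python
def interpret_stick_tests(froze_with):
    """Interpret single-stick test results.

    froze_with maps a stick label to True if the system froze while
    that stick was the only one installed.
    """
    if not froze_with:
        raise ValueError("no test results supplied")
    bad = [stick for stick, froze in froze_with.items() if froze]
    if len(bad) == len(froze_with):
        # Freezes with every stick: the RAM is probably not the cause.
        return ("test other hardware", bad)
    if len(bad) == 1:
        return ("likely faulty stick", bad)
    # No stick, or several but not all, reproduced the freeze (assumed verdict).
    return ("inconclusive", bad)

# Hypothetical results for four sticks labelled A to D.
print(interpret_stick_tests({"A": False, "B": True, "C": False, "D": False}))
```

Writing the results down in this form is mostly useful as a record, since each single-stick test can take hours before a freeze shows up.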
Be sure to double check your manufacturer’s documentation to confirm that all of your RAM sticks are of the correct type recommended for use with your motherboard (matching pairs are best). This is very important if you bought the computer used or it was gifted to you, as you have no idea if the previous owner simply threw whatever RAM they had on hand in the box without checking compatibility.
CPU:
Your CPU is a highly sensitive piece of electronic hardware and problems may ensue if your CPU timings aren’t set just right, or its operating temperature exceeds the recommended levels. Overclocking or undervolting your CPU in Linux could cause major issues, as Linux may not respond the same way Windows does to similar modifications. It is best to keep your CPU clock/voltage settings as per the manufacturer’s recommendations, or you could experience severe side effects. You may also want to test changing the C-States (CPU power states) in your BIOS. C-States can alter CPU voltage and clock speed to save power.
Stress-test your CPU to get an indication of whether it is related to your issue. You can install a utility such as stress or linpack (for Intel CPUs) to give your system a serious working over.
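As a minimal sketch, a timed CPU load with the stress utility can be launched from Python as shown here. It assumes stress is installed and uses its `--cpu` and `--timeout` options; the helper names and the 10 minute duration in the usage note are arbitrary choices for illustration.

```python
import os
import shutil
import subprocess

def build_stress_command(seconds, workers=None):
    """Build a `stress` command line that loads the CPU for a fixed time."""
    workers = workers or os.cpu_count() or 1
    return ["stress", "--cpu", str(workers), "--timeout", f"{seconds}s"]

def run_stress(seconds, workers=None):
    """Run the stress test and return the exit code of `stress`."""
    if shutil.which("stress") is None:
        raise FileNotFoundError("the stress utility is not installed")
    return subprocess.run(build_stress_command(seconds, workers), check=False).returncode

print(" ".join(build_stress_command(600, workers=4)))
```

Calling `run_stress(600)` would load every core for 10 minutes. Keep an eye on your temperatures while it runs, and stop the test if they climb into the ranges discussed in the overheating section.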
Graphics adapter(s):
Visual stuttering, glitches and onscreen artifacts are all signs that your video card may be malfunctioning or the driver may be incorrectly configured. The graphics card (or the video driver) is one of the most prevalent causes of freezes. If using an add-in graphics adapter, be sure your graphics card is seated properly and securely locked in place. Also be sure that any secondary power connections are securely attached. If your video freezes but your audio still plays, then this is indicative of a possible graphics driver issue. Also check that the cooling fan (or fans) on your graphics card is working well enough to keep your GPU temperature at a reasonable level.
If using the proprietary Nvidia driver, try switching to the open source Nvidia driver, and vice versa. Try testing an alternate GPU if possible, or you could try testing your graphics card in another computer. If you happen to have onboard graphics, switch over to test the onboard video. If you are running an Nvidia adapter and you have (or can borrow) an AMD graphics card, this would be very useful. Swapping the Nvidia card for an AMD card could eliminate both the Nvidia card and the graphics driver from the suspect list at once. Unfortunately, not many people have an extra graphics card available for this kind of test.
Storage drives:
Ensure that all your storage drives have been updated to their latest firmware version. Be sure your drives (especially mechanical drives) aren’t operating at an excessive temperature, as this can substantially lessen their lifespan. Traditional mechanical hard drives are much slower than an SSD. If your system is older and it feels laggy or sluggish, you should definitely consider upgrading from an HDD to an SSD. Any clicking sound coming from the insides of your system may be a warning sign of an impending mechanical hard drive failure.
Run a S.M.A.R.T. diagnosis on your drive’s health, then carefully check over the detailed report on your drive’s status. A SMART test failure would be a good indicator that your drive could be responsible for your freezing issues. However, passing a SMART test is not conclusive proof that a drive is not responsible, because software testing cannot identify problematic hardware with 100% accuracy. I have encountered several hard drives that caused lockups in the past even though they passed SMART testing. To fully eliminate the possibility of dubious test results, it is best to disconnect any attached drives.
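For illustration, here is a small Python sketch that reads the overall health verdict from smartmontools' JSON output. It assumes a smartmontools version that supports `--json` (`-j`) and that the report carries a `smart_status.passed` field; the sample strings are trimmed stand-ins, not real reports.

```python
import json

def smart_passed(report_text):
    """Return True or False from a smartctl JSON report, or None if unknown."""
    report = json.loads(report_text)
    status = report.get("smart_status")
    if not isinstance(status, dict):
        return None  # the drive or controller did not report a verdict
    return status.get("passed")

# Trimmed, hypothetical stand-ins for `smartctl -H -j <device>` output.
healthy = '{"smart_status": {"passed": true}}'
failing = '{"smart_status": {"passed": false}}'

print(smart_passed(healthy), smart_passed(failing))
```

A real report could be captured with something like `smartctl -H -j /dev/sda` run as root, where the device path is only a placeholder. As noted above, a result of `True` does not clear the drive of suspicion.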
In addition, be sure to check that your system drive is not running out of free space, as this can also cause serious issues. To scan for errors in the file system of a Btrfs drive, use btrfs check. To scan for errors in the file system of an ext4 drive, use fsck. Both should be run against an unmounted file system, for example from a live environment. Drives with Windows-based file systems are best scanned from within Windows, or formatted to a Linux native file system for best compatibility.
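The free-space check is easy to script. The sketch below uses `shutil.disk_usage` from Python's standard library; the 10% warning threshold is an arbitrary assumption for the example, not a documented limit.

```python
import shutil

def free_percent(path="/"):
    """Return the percentage of free space on the file system holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # pseudo file systems can report a size of zero
    return 100.0 * usage.free / usage.total

def is_low_on_space(path="/", min_free_percent=10.0):
    """True if free space is below the (assumed) warning threshold."""
    return free_percent(path) < min_free_percent

print(f"/ has {free_percent('/'):.1f}% free; low on space: {is_low_on_space('/')}")
```

Note that on Btrfs the figures reported this way can be misleading, so `btrfs filesystem usage` is a better source for that file system.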
Power supply:
A faulty power supply is another common cause of system instability. Unfortunately, there are no software tests available to diagnose PSU issues. It is possible to use a multimeter to test the PSU for voltage fluctuations. However, this is not something the average person is equipped to do. The surest way to find out if the PSU is faulty is to swap out the power supply, (if you have another computer, or a spare PSU). Be sure to clean the vents and fan on the PSU with compressed air on a regular basis to avoid problems with the PSU. Also make sure your PSU has a sufficient power rating to run all the components you may have added into your computer since you purchased it. You can use an online Power Supply Calculator to determine if your PSU is insufficient for your current hardware.
Motherboard:
Diagnosing a motherboard as faulty is extremely time consuming and difficult because you have to rule out all other hardware to make that determination. Luckily a faulty motherboard is relatively rare, but it is still possible that it could be the cause of your issues. Motherboards are generally considered one of the more reliable computer components, but if you have been overclocking, you might want to consider the possibility that your motherboard is failing.
While you are inside your computer case it would be a good idea to closely inspect your motherboard for any visible abnormalities, defects, or damage. Check your motherboard for any cracks on the printed circuit board, or bulging or leaking capacitors. If you find any defective capacitors, then your motherboard is likely reaching the end of its days. While it is possible to replace a capacitor, this requires top-notch soldering skills that are probably beyond the average person’s abilities.
Unfortunately, if it is not visually apparent that your mobo is defective it is very difficult to know for certain if your motherboard is the cause of your system instability. If some of your hardware is not being recognized this can be symptomatic of a faulty motherboard. If you can rule out almost all other hardware components using a live disk (as described below), then the mobo is a top suspect. To definitively rule out the motherboard, you would need to have a complete second set of components, (CPU, RAM, HDD, and PSU) to swap in. Of course, if your motherboard is still under warranty you should hopefully be able to RMA it to get a replacement.
Hardware troubleshooting using a live disk:
Rather than relying on software testing, you can more reliably discount hardware as a factor through a process of elimination. Power off, and then disconnect the computer’s power cord (and battery, if equipped). Disconnect any secondary monitors, if you use more than one. Remove or disconnect as much internal hardware as possible. Disconnect any HDD, SSD, MMC reader, or optical drives. Remove any add-in cards from their motherboard slots. This includes an add-in GPU if you have onboard graphics (switch to onboard in the BIOS). Disable any devices such as onboard network adapters, onboard sound, parallel ports, and FireWire, or any other non-essential hardware that can be switched off via the BIOS. Leave only one RAM stick inserted in the primary socket. Replace any wireless keyboards and mice with USB or PS/2 versions during troubleshooting.
After disconnecting, removing, or disabling as much hardware as possible, boot from a live disk and run some stress tests to check for lockups. You can eliminate hardware possibilities by gradually adding back components one piece at a time. Always make sure that the computer’s power cord is unplugged before reinstalling any component. If the issue returns after adding back any particular piece of hardware that was removed you have likely found your culprit. However, as you add back more and more components this also increases the electrical load on your PSU. This means there is still the possibility a weak/flaky PSU could be part of your problem. The only way to know for sure is to swap suspect components with known good working hardware.
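The add-back procedure is essentially a linear search, which the following Python sketch spells out. Everything in it is hypothetical: `freezes_with` stands in for you manually running stress tests from the live disk with a given set of parts installed, and the component names are examples only.

```python
BASELINE = "<minimal configuration>"

def find_culprit(components, freezes_with):
    """Add components back one at a time and report where freezes return.

    components   -- parts in the order you plan to reinstall them
    freezes_with -- callback taking a tuple of installed parts and
                    returning True if the system froze under test
    """
    if freezes_with(()):
        return BASELINE  # the stripped-down system itself is unstable
    installed = []
    for part in components:
        installed.append(part)
        if freezes_with(tuple(installed)):
            return part
    return None  # nothing reproduced the freeze

# Hypothetical example: the system freezes whenever the sound card is in.
parts = ["ssd", "gpu", "sound card", "wifi card"]
print(find_culprit(parts, lambda installed: "sound card" in installed))
```

The sketch shares the caveat described above: a part flagged this way may only be the one that pushed a weak PSU over its limit, so confirm the result by swapping in known good hardware.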
Other hardware troubleshooting recommendations:
- Check that all internal cables are seated properly and undamaged. If a cable is getting old and/or the connection to the socket seems loose, it might be a good idea to replace it with a new one.
- Also check all external cables for damage or a loose/sloppy fitting connection. Replace suspect cables with alternates if available. Cables running under carpet can be compromised if crushed. Pets also like to chew cables, and this can cause lockups.
- Disconnect all other monitors except your main monitor. Try a different type of cable if your hardware supports multiple standards such as DVI, HDMI, or DisplayPort.
- Disconnect all peripherals such as USB hubs, USB hard drives, printers, webcams, etc., as a test to see if the freezes still occur.
- Reboot into your BIOS and (if possible) temporarily disable your Ethernet and WiFi as a test. Also disable any non-essential hardware that can be shut down via the BIOS.
- If you are using a wireless keyboard or mouse, try to replace them with wired versions for troubleshooting purposes.