Linux hardware monitoring tools

Whether you’re a home user or a system/network administrator at a large site, monitoring your Linux system helps you in many ways – possibly more than you currently know.

For example, you have important work-related documents on your laptop and one fine day the hard drive decides to die on you without even saying goodbye. Since most users don’t make backups, you’ll have to call your boss and tell him the latest financial reports are gone. Not nice. But if you use disk monitoring and reporting software that is started regularly (at boot or with cron), such as smartd, it will tell you when your drive(s) start to become weary. Between us, though, a hard drive may decide to go belly up without warning anyway, so back up your data.

This tutorial will deal with everything related to system monitoring, whether it’s network, disk or temperature. The subject could easily fill a book, but we will try to give you only the most important information to get you started, or, depending on your experience, to have all the info in one place. You are expected to know your hardware and have basic sysadmin skills, but regardless of where you’re coming from, you’ll find something useful here.

In this tutorial you will learn:

  • How to monitor system temperature
  • How to monitor hard disk and input/output
  • How to use network monitoring tools
  • How to monitor system CPU, memory, and running processes
Software Requirements and Linux Command Line Conventions

Category      Requirements, Conventions or Software Version Used
System        Any Linux system
Software      N/A
Other         Privileged access to your Linux system as root or via the sudo command.
Conventions   # – requires given Linux commands to be executed with root privileges, either directly as the root user or by use of the sudo command
              $ – requires given Linux commands to be executed as a regular non-privileged user

Linux Hardware Monitoring Tools




The following sections will introduce tools for different jobs, such as the monitoring of temperature, hard drive status, network, system processes, etc. We’ll show you the commands to install these tools on Debian and Ubuntu based systems. For others following along, you should use your package manager to search for the proper package name, as sometimes they go by slightly different names.

Temperature monitoring

The best tool for the job, when it comes to monitoring your system’s temperatures, is a package called sensors. Some distros may come with the software already preinstalled. On other systems, you may need to install it. On Debian or a derivative you can simply execute:

$ sudo apt install lm-sensors

On openSUSE the package is simply named sensors, while on Fedora you will find it under the name lm_sensors. You can use the search function of your package manager to locate it, since most distributions offer it.
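
For reference, the installation commands on those distributions should look something like this, using the package names mentioned above:

$ sudo dnf install lm_sensors
$ sudo zypper install sensors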

Now, as long as you have relatively modern hardware, you will probably have temperature monitoring capability, and most desktop distributions ship with kernel hardware monitoring support already enabled.

To get started, first run sensors-detect in a terminal; it probes your system for available sensor chips and offers to load the appropriate kernel modules:

$ sudo sensors-detect

Once detection is complete, run the sensors command to display the current readings. Here is a snippet of the output produced on our test system:

k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp:   +32.0°C  
Core0 Temp:   +33.0°C  
Core1 Temp:   +29.0°C  
Core1 Temp:   +25.0°C  

nouveau-pci-0200
Adapter: PCI adapter
temp1:        +58.0°C  (high = +100.0°C, crit = +120.0°C)
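
If you want to keep an eye on the readings as they change, one simple way is to combine sensors with the watch command, refreshing every two seconds, for example:

$ watch -n 2 sensors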

Your BIOS might have (most do) a temperature failsafe option: if the temperature reaches a certain threshold, the system will shut down in order to prevent damage to the hardware. On the other hand, while the sensors command might not seem very useful on a regular desktop, on server machines located perhaps hundreds of kilometers away such a tool can make all the difference in the world.

If you’re the administrator of such systems, we recommend you write a short script that mails you hourly reports, and perhaps statistics, about system temperature; a minimal sketch follows below.
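
A minimal sketch of such a setup, assuming a working local mail command and using a purely hypothetical script path and mail address:

#!/bin/sh
# /usr/local/bin/temp-report.sh (hypothetical path) - mail the current sensor readings
sensors | mail -s "Temperature report for $(hostname)" admin@example.com

# crontab entry (edit with crontab -e) to run the report every hour:
0 * * * * /usr/local/bin/temp-report.sh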

Disk and I/O

In this part we will refer to hardware status monitoring first, then go to the I/O section which will deal with detection of bottlenecks, reads/writes and the like. Let’s start with how to get disk health reports from your hard drives.

S.M.A.R.T.

S.M.A.R.T., which stands for Self-Monitoring, Analysis and Reporting Technology, is a capability offered by modern hard drives that lets the administrator efficiently monitor disk health. The package to install is usually named smartmontools; it provides a daemon, started via an init or systemd service, which can write its findings to syslog at regular intervals.

It can be installed with this command:

$ sudo apt install smartmontools

The daemon is called smartd and you configure it by editing /etc/smartd.conf, where you specify which disks to monitor and on what schedule; a short configuration sketch follows below. This suite of S.M.A.R.T. tools works on Linux, the BSDs, Solaris, Darwin and even OS/2.
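
As a rough sketch, a line in /etc/smartd.conf might look like this (the device, test schedule and mail address are illustrative assumptions):

# /etc/smartd.conf (sketch)
# Watch /dev/sda with all SMART checks enabled (-a), run a short self-test
# every day at 02:00 and a long one every Saturday at 03:00, and mail
# admin@example.com when a problem is detected.
/dev/sda -a -s (S/../.././02|L/../../6/03) -m admin@example.com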

Distributions offer graphical front ends to smartctl, the main application to use when you want to see how your drives are doing, but we will focus on the command line utility. Pass -a (all info) together with a device such as /dev/sda to get a detailed report on the status of the first drive installed on the system. Here’s what we get:



$ sudo smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA
Device Model:     WDC WD5000AAKS-00WWPA0
Serial Number:    WD-WCAYU6160626
LU WWN Device Id: 5 0014ee 158641699
Firmware Version: 01.03B01
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Oct 19 19:01:08 2011 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[snip]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
3 Spin_Up_Time            0x0027   138   138   021    Pre-fail  Always       -       4083
4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       369
5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4186
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       366
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       347
194 Temperature_Celsius     0x0022   105   098   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

What we can get from this output is, basically, that no errors are reported and that all values are within normal margins. When it comes to temperature, if you have a laptop and you see abnormally high values, consider cleaning the insides of your machine for better air flow.

The platters may get deformed because of excessive heat and you certainly don’t want that. If you use a desktop machine, you can get a hard drive cooler quite cheaply. In any case, if your BIOS has the capability, it will warn you at POST time if the drive is about to fail.

smartctl offers a suite of tests one can perform: you can select what test you want to run with the -t flag:

$ sudo smartctl -t long /dev/sda

Depending on the size of the disk and the test you chose, this operation can take quite some time. Some people recommend running tests when the system does not have any significant disk activity, and others even recommend using a live CD. Of course this is common sense advice, but in the end it all depends on the situation. Please refer to the smartctl manual page for more useful command-line flags.
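
The test runs in the background; once it has finished, you can review the results with the -l flag, for example:

$ sudo smartctl -l selftest /dev/sda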

I/O

If you are working with computers that do lots of read/write operations, like a busy database server, for instance, you will need to check disk activity. Or perhaps you want to test the performance your disk(s) offer, regardless of the purpose of the computer. For the first task we will use iostat, and for the second we’ll have a look at bonnie++. These are just two of the applications one can use, but they’re popular and do their job quite well, so we felt no need to look elsewhere.

If you don’t find iostat on your system, your distribution might have it included in the sysstat package, which offers lots of tools for the Linux administrator, and we’ll talk about them a little later.

Install it with the following command:

$ sudo apt install sysstat

You can run iostat with no arguments, which will give you something like this:

Linux 5.13.0-27-generic (linuxconfig) 	02/02/2022 	_x86_64_	(1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.10    2.22    8.89    1.67    0.00   70.11

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sda              42.29      1120.77       528.22         0.00    1115925     525937          0

If you want iostat to report repeatedly, just append an interval and a count (the -d flag limits the output to the device report):

$ iostat -d 1 10

This command will print the report 10 times at a one-second interval. Read the manual page for the rest of the options; it will be worth it, you’ll see. After looking at the available flags, a common iostat invocation may look like this:

$ iostat -d 1 -x -h 

Here -x stands for extended statistics and -h makes the output human-readable.
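
You can also restrict the report to a particular device by naming it on the command line; assuming the disk of interest is sda, something along these lines should work:

$ iostat -dxh sda 2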

bonnie++

bonnie++’s name (the incremented part) hints at its heritage: it is the successor to the classic bonnie benchmarking program. It supports lots of hard disk and filesystem tests that stress the machine by writing and reading lots of files. It can be found in most Linux distributions under exactly that name: bonnie++. Here’s how to install it:

$ sudo apt install bonnie++

You can simply run the command with no extra options:

$ bonnie++

Here’s the output we received from bonnie++ on our test machine.



Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linuxconfig      4G  122k  96 69.5m  24  125m  55  446k  89  262m  48  3895 204
Latency               179ms     726ms   42985us   65391us   44430us   14077us
Version  1.98       ------Sequential Create------ --------Random Create--------
linuxconfig         -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     0  39 +++++ +++ 724249376  35     0  37 +++++ +++ 724249376  38
Latency             44098us   13586us   12148us   16141us   12212us   12207us
1.98,1.98,linuxconfig,1,1643851174,4G,,8192,5,122,96,71185,24,128266,55,446,89,268335,48,3895,204,16,,,,,7366,39,+++++,+++,11979,35,7939,37,+++++,+++,13108,38,179ms,726ms,42985us,65391us,44430us,14077us,44098us,13586us,12148us,16141us,12212us,12207us

Please bear in mind that running bonnie++ will stress your machine, so it’s a good idea to do this when the system isn’t as busy as usual. You can choose the output format (CSV, text, HTML), the destination directory and the test file size; an example follows below.
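
For instance, a run with a few common options might look like this (the target directory, test size and machine label are illustrative assumptions):

$ bonnie++ -d /mnt/test -s 4g -m testbox -q

Here -d selects the directory to test in, -s the size of the test file(s), -m a machine name for the report and -q quiet mode; the CSV line printed at the end can be fed to the bundled bon_csv2html utility to produce an HTML report.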

Again, read the manual, because these programs depend on the underlying hardware and its usage. Only you know best what you want to get from bonnie++.

Network monitoring

Before we start, you should know that we will not deal with network monitoring from a security standpoint, but from a performance and troubleshooting standpoint, although the tools are sometimes the same (wireshark, iptraf, etc.). When you’re getting a file at 10 kbps from the NFS server in the other building, you might think about checking your network for bottlenecks.

This is a large subject, since it depends on a plethora of factors, like hardware, cables, topology and so on. We will approach the matter in a unified way, meaning you will be shown how to install and how to use the tools, instead of classifying them and getting you all confused with unnecessary theory. We won’t include every tool ever written for Linux network monitoring, just what is considered important.

Before we get to the more complex tools, let’s start with the simple ones. Here, the “trouble” part of troubleshooting refers to network connectivity problems; other tools, as you will see, lean more towards attack prevention. Again, the subject of network security alone has spawned plenty of material of its own, so this will be as short as it can be.

These simple tools are ping, traceroute, ip and friends. They are usually shipped in packages such as iputils, inetutils, net-tools or iproute2 (the exact split varies by distribution) and are very probably already installed on your system.

dnsutils is also a package worth installing, as it contains popular applications like dig and nslookup. If you don’t already know what these commands do, we recommend you do some reading, as they are essential to any Linux user, regardless of the purpose of the computer they use.

tcpdump

No chapter in any network troubleshooting/monitoring guide will ever be complete without a section on tcpdump. It is a fairly complex and useful network monitoring tool, whether you’re on a small LAN or on a big corporate network.

What tcpdump does, basically, is packet monitoring, also known as packet sniffing. You will need root privileges to run it, because tcpdump needs to put the physical interface into promiscuous mode, which isn’t the default operating mode of an Ethernet card. Promiscuous mode means that the NIC will receive all traffic on the network, rather than only the traffic intended for it.

In case it’s not already installed, you can use the following command in terminal.

$ sudo apt install tcpdump

We will not try to replace tcpdump’s well written manual page, we’ll leave that to you. But before we go on, we recommend you learn some basic networking concepts in order to make sense of tcpdump, like TCP/UDP, payload, packet, header and so on.

One cool feature of tcpdump is the ability to practically capture web pages, by printing packet contents in ASCII with the -A flag. Try starting tcpdump like so:

$ sudo tcpdump -vv -A

Now, go to a web page, then come back to the terminal window where tcpdump is running. You’ll see many interesting things about that website, like what OS the web server is running or what PHP version was used to create the page (note that this only works for plain HTTP; HTTPS traffic is encrypted and unreadable to the sniffer).

Use -i to specify the interface to listen on (like eth0, eth1 and so on) or -p to avoid putting the NIC in promiscuous mode, which is useful in some situations. You can save the output to a file with -w $file if you need to examine it later (remember that the file will contain raw packet data, not plain text). So an example of tcpdump usage based on what you have read above would be:

$ sudo tcpdump -vv -A -i eth0 -w outputfile
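
A capture saved with -w can later be read back and decoded with the -r flag:

$ sudo tcpdump -r outputfile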

We must remind you that this tool and others, like nmap, snort or wireshark, while useful for monitoring your network for rogue applications and users, can also be useful to those very same rogue users. Please don’t use such tools for malicious purposes.

If you prefer a friendlier interface to a packet sniffer/analyzer, you might try iptraf (console) or wireshark (graphical). We will not discuss them in more detail, because the functionality they offer is similar to tcpdump’s. We recommend tcpdump, though, because it’s almost certain you’ll find it installed regardless of distribution, and it gives you the chance to learn.

netstat




netstat is another useful tool for examining live connections, local and remote, and it prints its output in an organized, table-like manner. The package is usually named simply net-tools and most distributions offer it.

To install the software:

$ sudo apt install net-tools

If you start netstat without arguments, it will print a list of open sockets and then exit. But since it’s a versatile tool, you can control what to see depending on what you need. First of all, -c will help you if you need continuous output, similar to tcpdump.

From here on, every aspect of the Linux networking subsystem can be included in netstat’s output: routes with -r, interfaces with -i, protocols (--protocol=$family for certain choices, like unix, inet, ipx…), -l if you want only listening sockets or -e for extended info.

Depending on the flags used, the columns displayed include the protocol, receive and send queues, local and foreign addresses, connection state, user, PID/program name, socket type, socket state or path. These are only the most interesting pieces of information netstat displays, but not the only ones. As usual, refer to the manual page.
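
A combination you will often see recommended, as a rough example, lists all listening TCP and UDP sockets numerically, along with the owning program:

$ sudo netstat -tulpn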

nmap

The last utility we’ll talk about in the network section is nmap. Its name comes from Network Mapper and it’s useful as a network/port scanner, invaluable for network audits. It can be used on remote hosts as well as on local ones. If you want to see which hosts are alive on a class C network, you will simply type:

$ nmap 192.168.0.0/24

nmap will return some output like this:

Nmap scan report for 192.168.0.1
Host is up (0.0065s latency).
Not shown: 998 closed ports
PORT   STATE SERVICE
23/tcp open  telnet
80/tcp open  http

Nmap scan report for 192.168.0.102
Host is up (0.00046s latency).
Not shown: 999 closed ports
PORT   STATE SERVICE
22/tcp open  ssh

Nmap scan report for 192.168.0.103
Host is up (0.00049s latency).
Not shown: 999 closed ports
PORT   STATE SERVICE
22/tcp open  ssh

What we can learn from this short example: nmap supports CIDR notation for scanning entire (sub)networks, it’s fast, and by default it displays the IP address and open ports of every host found. If we had wanted to scan just a portion of the network, say the hosts from .20 to .30, we would have written:

$ nmap 192.168.0.20-30

This is the simplest possible use of nmap. With -A it can detect the operating system and service versions, run scripts and perform a traceroute, and it supports different scanning techniques such as UDP, TCP SYN or TCP ACK scans. It can also try to evade firewalls or IDS, do MAC spoofing and all kinds of other neat tricks.
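
For example, a TCP SYN scan of a single host (raw packet scans require root privileges) would look like this, with the target address taken from the scan above:

$ sudo nmap -sS 192.168.0.102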




There are lots of things this tool can do, and all of them are documented in the manual page. Please remember that some (most) administrators don’t like it very much when someone is scanning their network, so don’t get yourself in trouble. The nmap developers have put up a host, scanme.nmap.org, with the sole purpose of testing various options. Let’s try to find what OS it’s running in a verbose manner (for advanced options you’ll need root):

$ nmap -A -v scanme.nmap.org

Example output:

NSE: Script Scanning completed.
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.21s latency).
Not shown: 995 closed ports
PORT    STATE    SERVICE      VERSION
22/tcp  open     ssh          OpenSSH 5.3p1 Debian 3ubuntu7 (protocol 2.0)
| ssh-hostkey: 1024 8d:60:f1:7c:ca:b7:3d:0a:d6:67:54:9d:69:d9:b9:dd (DSA)
|_2048 79:f8:09:ac:d4:e2:32:42:10:49:d3:bd:20:82:85:ec (RSA)
80/tcp  open     http         Apache httpd 2.2.14 ((Ubuntu))
|_html-title: Go ahead and ScanMe!
135/tcp filtered msrpc
139/tcp filtered netbios-ssn
445/tcp filtered microsoft-ds
OS fingerprint not ideal because: Host distance (14 network hops) is greater than five
No OS matches for host
Uptime guess: 19.574 days (since Fri Sep 30 08:34:53 2011)
Network Distance: 14 hops
TCP Sequence Prediction: Difficulty=205 (Good luck!)
IP ID Sequence Generation: All zeros
Service Info: OS: Linux
[traceroute output suppressed]

We recommend you also take a look at netcat, snort or aircrack-ng. Like we said, our list is by no means exhaustive.

For more help, see our other guide on How to monitor network activity on a Linux system.

System monitoring

Let’s say you see your system is starting to have intense HDD activity and you’re only playing Nethack on it. You’ll probably want to see what’s happening. Or maybe you installed a new web server and you want to see how well it fares. This part is for you.

Just like in the networking section, there are lots of tools, graphical or CLI, that will help you keep in touch with the state of the machines you’re administering. We will not talk about graphical tools like gnome-system-monitor, because installing X on a server, which is where these tools are needed most, doesn’t really make sense.

The first system monitoring utility is a personal favorite and a small utility used by sysadmins around the world. It’s called top and should be installed by default. But, if not, execute this command:

$ sudo apt install procps

You can execute the command with no options:

$ top
top command on Linux

top is a process viewer (there is also htop, a more eye-pleasing variant) and, as you can see, it gives you all the information you need to see what’s running on your system: process name, PID, user, state, time, CPU usage and so on.

We usually start top with -d 1, which makes it refresh every second (running top without options sets the delay to three seconds). Once top is started, pressing certain keys lets you sort the data in various ways: pressing 1 shows the usage of each CPU (provided you have an SMP machine and kernel), P sorts the listed processes by CPU usage, M by memory usage, and so on. If you want top to run only a specific number of iterations, use -n $number. The man page will give you all the options, of course.
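
top can also be used non-interactively, for instance from a script, by combining batch mode with an iteration count; a small sketch:

$ top -b -n 1 | head -n 20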

To install htop instead, use this command:

$ sudo apt install htop

To exit the screen from either application, use the Q key on your keyboard.

While top helps you monitor the memory usage of the system, there are other applications specifically written for this purpose. Two of those are free and vmstat (virtual memory status). We usually use free only with the -m flag (megabytes), and its output looks like this:

               total       used       free     shared    buffers     cached
Mem:          2012       1913         98          0          9        679
-/+ buffers/cache:       1224        787
Swap:         2440        256       2184
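
vmstat, on the other hand, takes an optional interval and count; for example, the following prints a fresh line of statistics every two seconds, five times in total:

$ vmstat 2 5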

vmstat output is more complete, as it will also show you I/O and CPU statistics, among others. Both free and vmstat are also part of the procps package, at least on Debian and Ubuntu systems. But when it comes to process monitoring, the most used tool is ps, part of the procps package as well.

It can be complemented with pstree, part of psmisc, which shows all processes in a tree-like structure. Some of ps’ most used BSD-style options include a (all processes with a tty), x (its complement; see the manual page for the BSD syntax), u (user-oriented format) and f (forest-like output). Note that in BSD style these are written without a leading dash, as they are format modifiers rather than options in the classical sense. Here the use of the man page is mandatory, because ps is a tool you will use often.

$ ps aux
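
A sketch of a common combination, sorting the listing by memory usage and keeping only the top entries:

$ ps aux --sort=-%mem | head -n 10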

Other system monitoring tools include uptime (the name is kind of self explanatory), who (for a listing of the logged-in users), lsof (list open files) or sar, part of the sysstat package, for listing activity counters.
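
For instance, sar can report CPU utilization at a chosen interval; a quick illustrative run, printing three samples one second apart:

$ sar -u 1 3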

Conclusion

As we said before, the list of utilities presented here is by no means exhaustive. Our intention was to put together an article covering the major monitoring tools for everyday use. It will not replace reading the documentation and working with real-life systems to gain a complete understanding of the matter.


