Whether you’re a home user or a system/network administrator at a large site, monitoring your Linux system helps you in many ways – possibly more than you currently know.
For example, say you keep important work-related documents on your laptop, and one fine day the hard drive decides to die on you without even saying goodbye. Since most users don't make backups, you'll have to call your boss and explain that the latest financial reports are gone. Not nice. But if you run disk monitoring and reporting software regularly (at boot or with cron), such as smartd, it can warn you when your drive(s) start to show signs of failure. Between us, though, a hard drive may decide to go belly up without warning, so back up your data.
This tutorial will deal with everything related to system monitoring, whether it's network, disk or temperature. This subject could easily fill a book, but we will try to give you only the most important information to get you started or, depending on your experience, to have all the info in one place. You are expected to know your hardware and have basic sysadmin skills, but regardless of where you're coming from, you'll find something useful here.
In this tutorial you will learn:
- How to monitor system temperature
- How to monitor hard disk and input/output
- How to use network monitoring tools
- How to monitor system CPU, memory, and running processes
| Category | Requirements, Conventions or Software Version Used |
| --- | --- |
| System | Any Linux system |
| Other | Privileged access to your Linux system as root or via the `sudo` command |
| Conventions | `#` – requires given linux commands to be executed with root privileges either directly as a root user or by use of the `sudo` command; `$` – requires given linux commands to be executed as a regular non-privileged user |
Linux Hardware Monitoring Tools
The following sections will introduce tools for different jobs, such as the monitoring of temperature, hard drive status, network, system processes, etc. We’ll show you the commands to install these tools on Debian and Ubuntu based systems. For others following along, you should use your package manager to search for the proper package name, as sometimes they go by slightly different names.
The best tool for the job, when it comes to monitoring your system’s temperatures, is a package called sensors. Some distros may come with the software already preinstalled. On other systems, you may need to install it. On Debian or a derivative you can simply execute:
$ sudo apt install lm-sensors
On OpenSUSE systems the package is named simply
sensors, while on Fedora you can find it under the name
lm_sensors. You can use the search function of your package manager to find sensors, since most distributions offer it.
Now, as long as you have relatively modern hardware, you will probably have temperature monitoring capability. If you use a desktop distribution, you will have hardware monitoring support enabled.
To get started, detect the available sensor chips by running the following command in a terminal and answering its prompts:
$ sudo sensors-detect
Once detection is complete, run the sensors command to display the current readings. Here is a snippet of output that was produced on our test system:
k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp:  +32.0°C
Core0 Temp:  +33.0°C
Core1 Temp:  +29.0°C
Core1 Temp:  +25.0°C

nouveau-pci-0200
Adapter: PCI adapter
temp1:       +58.0°C  (high = +100.0°C, crit = +120.0°C)
Your BIOS might have (most do) a temperature failsafe option: if the temperature reaches a certain threshold, the system will shutdown in order to prevent damage to the hardware. On the other hand, while on a regular desktop the sensors command might not seem very useful, on server machines located maybe hundreds of kilometers away such a tool might make every difference in the world.
If you’re the administrator of such systems, we recommend you write a short script that will mail you hourly, for example, with reports and maybe statistics about system temperature.
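Such a script can stay very small. Below is a sketch of the idea, not a finished tool: it assumes the sensors command and a configured mail command exist on the host, admin@example.com is a placeholder address, and we write the script to /tmp only for the demonstration.

```shell
# temp-report.sh - hypothetical hourly temperature report script.
# Assumes `sensors` (lm-sensors) and a working `mail` command; the
# recipient address below is a placeholder.
cat > /tmp/temp-report.sh <<'EOF'
#!/bin/sh
# Gather current sensor readings and mail them to the administrator.
sensors 2>&1 | mail -s "Temperature report: $(hostname)" admin@example.com
EOF
chmod +x /tmp/temp-report.sh
# Sanity-check the script's syntax before scheduling it:
sh -n /tmp/temp-report.sh && echo "syntax OK"
# → syntax OK
```

An hourly crontab entry for it would look like `0 * * * * /tmp/temp-report.sh` (on a real system, place the script wherever your local scripts live, not in /tmp).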
Disk and I/O
In this part we will refer to hardware status monitoring first, then go to the I/O section which will deal with detection of bottlenecks, reads/writes and the like. Let’s start with how to get disk health reports from your hard drives.
S.M.A.R.T., which stands for Self-Monitoring, Analysis and Reporting Technology, is a capability offered by modern hard drives that lets the administrator efficiently monitor disk health. The package to install is usually named
smartmontools, which includes a daemon, plus service scripts, for regular logging to syslog.
It can be installed with this command:
$ sudo apt install smartmontools
The daemon's name is
smartd and you can configure it by editing
/etc/smartd.conf, specifying which disks to monitor and when to monitor them. This suite of S.M.A.R.T. tools works on Linux, the BSDs, Solaris, Darwin and even OS/2.
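As an illustration (the schedule and address here are assumptions, not recommendations), a minimal /etc/smartd.conf entry might monitor one disk with all checks, run a short self-test every Sunday at 2 AM, and mail warnings to root:

```
# Monitor /dev/sda with all checks (-a), schedule a short self-test (S)
# every Sunday (day-of-week 7) at 02:00, and mail warnings to root
/dev/sda -a -s S/../../7/02 -m root
```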
Distributions offer graphical front ends to smartctl, the main application to use when you want to see how your drives are doing, but we will focus on the command-line utility. Use the -a flag (all info) with a device such as
/dev/sda as an argument to get a detailed report on the status of the first drive installed on the system. Here's what we get:
$ sudo smartctl -a /dev/sda
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA
Device Model:     WDC WD5000AAKS-00WWPA0
Serial Number:    WD-WCAYU6160626
LU WWN Device Id: 5 0014ee 158641699
Firmware Version: 01.03B01
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Oct 19 19:01:08 2011 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[snip]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time            0x0027 138   138   021    Pre-fail Always  -           4083
  4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           369
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e 200   200   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 095   095   000    Old_age  Always  -           4186
 10 Spin_Retry_Count        0x0032 100   100   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032 100   100   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           366
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           21
193 Load_Cycle_Count        0x0032 200   200   000    Old_age  Always  -           347
194 Temperature_Celsius     0x0022 105   098   000    Old_age  Always  -           38
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0030 200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008 200   200   000    Old_age  Offline -           0
What we can get from this output is, basically, that no errors are reported and that all values are within normal margins. When it comes to temperature, if you have a laptop and you see abnormally high values, consider cleaning the insides of your machine for better air flow.
The platters may get deformed because of excessive heat and you certainly don’t want that. If you use a desktop machine, you can get a hard drive cooler for a cheap price. Anyway, if your BIOS has that capability, when POSTing it will warn you if the drive is about to fail.
smartctl offers a suite of tests one can perform: you can select which test to run with the -t flag:
$ sudo smartctl -t long /dev/sda
Depending on the size of the disk and the test you chose, this operation can take quite some time. Some people recommend running tests when the system has no significant disk activity; others even recommend using a live CD. Of course this is common-sense advice, but in the end it all depends on the situation. Please refer to the smartctl manual page for more useful command-line flags.
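If you want to script a periodic health check, the overall self-assessment line is the simplest thing to grep for. The sketch below inlines the sample line from the output above so the logic can be tried anywhere; on a real host you would substitute `sudo smartctl -H /dev/sda` for the sample variable.

```shell
# Minimal health-check sketch: alert when smartctl's overall
# self-assessment is anything other than PASSED. The sample line stands
# in for real output; replace it with: sudo smartctl -H /dev/sda
result='SMART overall-health self-assessment test result: PASSED'
if echo "$result" | grep -q 'PASSED'; then
    status="sda healthy"
else
    status="sda FAILING - check it now"
fi
echo "$status"
# → sda healthy
```

In a cron job, the failing branch would be the place to send mail instead of just printing.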
If you are working with computers that perform lots of read/write operations, like a busy database server, you will need to check disk activity. Or perhaps you want to test the performance your disk(s) offer, regardless of the purpose of the computer. For the first task we will use
iostat; for the second we'll have a look at
bonnie++. These are just two of the applications one can use, but they're popular and do their job quite well, so we felt no need to look elsewhere.
If you don’t find iostat on your system, your distribution might have it included in the sysstat package, which offers lots of tools for the Linux administrator, and we’ll talk about them a little later.
Install it with the following command:
$ sudo apt install sysstat
You can run
iostat with no arguments, which will give you something like this:
Linux 5.13.0-27-generic (linuxconfig)  02/02/2022  _x86_64_  (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.10    2.22    8.89    1.67    0.00   70.11

Device    tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sda       42.29  1120.77      528.22       0.00         1115925    525937     0
If you want iostat to run continuously, add
-d (device report) followed by an interval in seconds and an optional count:
$ iostat -d 1 10
This command will run iostat 10 times at a one-second interval. Read the manual page for the rest of the options; it will be worth it, you'll see. After looking at the available flags, a common iostat invocation might be:
$ iostat -d 1 -x -h
Here -x stands for extended statistics and
-h produces human-readable output.
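Once you have iostat's output, a short awk filter can pull out a single figure, such as the I/O wait percentage. The sketch below feeds in the saved sample shown earlier so it can be tried without sysstat installed; on a live machine you would pipe iostat itself into awk.

```shell
# Extract the %iowait figure from iostat's avg-cpu block.
# The sample is saved output; on a real system, replace the echo with
# a pipe from `iostat` itself.
sample='avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.10    2.22    8.89    1.67    0.00   70.11'
iowait=$(echo "$sample" | awk '/avg-cpu/ {getline; print $4}')
echo "iowait: $iowait"
# → iowait: 1.67
```

A consistently high %iowait is one of the clearest signs that the disk, not the CPU, is your bottleneck.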
bonnie++'s name (the incremented part) comes from its heritage: the classic bonnie benchmarking program. It supports lots of hard disk and filesystem tests that stress the machine by writing and reading lots of files. It can be found on most Linux distributions under exactly that name: bonnie++. Here's how to install it:
$ sudo apt install bonnie++
You can simply run the command with no extra options:
$ bonnie++
Here's the output we received from bonnie++ on our test machine.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linuxconfig      4G  122k  96 69.5m  24  125m  55  446k  89  262m  48  3895 204
Latency             179ms    726ms    42985us   65391us   44430us   14077us
Version  1.98       ------Sequential Create------ --------Random Create--------
linuxconfig         -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16      0  39 +++++ +++ 724249376  35      0  37 +++++ +++ 724249376  38
Latency             44098us   13586us   12148us   16141us   12212us   12207us
1.98,1.98,linuxconfig,1,1643851174,4G,,8192,5,122,96,71185,24,128266,55,446,89,268335,48,3895,204,16,,,,,7366,39,+++++,+++,11979,35,7939,37,+++++,+++,13108,38,179ms,726ms,42985us,65391us,44430us,14077us,44098us,13586us,12148us,16141us,12212us,12207us
Please bear in mind that running bonnie++ will stress your machine, so it’s a good idea to do this when the system isn’t as busy as usual. You can choose the output format (CSV, text, HTML), the destination directory or file size.
Again, read the manual, because these programs depend on the underlying hardware and its usage. Only you know best what you want to get from bonnie++.
Before we start, you should know that we will not deal with network monitoring from a security standpoint, but from a performance and troubleshooting standpoint, although the tools are sometimes the same (wireshark, iptraf, etc.). When you're getting a file at 10 kbps from the NFS server in the other building, you might think about checking your network for bottlenecks.
This is a large subject, since it depends on a plethora of factors, like hardware, cables, topology and so on. We will approach the matter in a unified way, meaning you will be shown how to install and how to use the tools, instead of classifying them and getting you all confused with unnecessary theory. We won’t include every tool ever written for Linux network monitoring, just what is considered important.
Before we start talking about complex tools, let's begin with the simple ones. Here, the "trouble" part of troubleshooting refers to network connectivity problems; other tools, as you will see, lean toward attack prevention. Again, network security alone has spawned whole books, so this will be as short as it can be.
These simple tools are
ip and friends. The ip utility comes from the
iproute2 package, while classic tools like ifconfig belong to net-tools (package names may vary depending on the distribution); they are very probably already installed on your system.
dnsutils is a package worth installing, as it contains popular applications like
dig and nslookup. If you don't already know what these commands do, we recommend you do some reading, as they are essential to any Linux user, regardless of the purpose of their computer.
No such chapter in any network troubleshooting/monitoring guide will ever be complete without a part on
tcpdump. It is a pretty complex and useful network monitoring tool, whether you’re on a small LAN or on a big corporate network.
What tcpdump does, basically, is packet monitoring, also known as packet sniffing. You will need root privileges to run it, because tcpdump needs the physical interface to run in promiscuous mode, which isn't the default running mode of an Ethernet card. Promiscuous mode means that the NIC will receive all traffic on the network, rather than only the traffic intended for it.
In case it’s not already installed, you can use the following command in terminal.
$ sudo apt install tcpdump
We will not try to replace tcpdump’s well written manual page, we’ll leave that to you. But before we go on, we recommend you learn some basic networking concepts in order to make sense of tcpdump, like TCP/UDP, payload, packet, header and so on.
One cool feature of tcpdump is the ability to practically capture web pages, using the
-A flag, which prints each packet's payload in ASCII. Try starting tcpdump like so:
$ sudo tcpdump -vv -A
Now, go to a webpage. Then come back to the terminal window where
tcpdump is executing. You’ll see many interesting things about that website, like what OS the webserver is running or what PHP version was used to create the page.
Other useful flags include
-i to specify the interface to listen on (like
eth0, eth1, and so on) or
-p to avoid putting the NIC in promiscuous mode, which is useful in some situations. You can save the output to a file with
-w $file if you need to check on it later (remember that the file will contain raw output). So an example of tcpdump usage based on what you read above would be:
$ sudo tcpdump -vv -A -i eth0 -w outputfile
We must remind you that this tool and others, like
wireshark, while useful for monitoring your network for rogue applications and users, can also be useful to rogue users. Please don't use such tools for malicious purposes.
If you need a cooler interface to a sniffing/analyzing program, you might try iptraf (CLI) or wireshark (GTK). We will not discuss them in more detail, because the functionality they offer is similar to tcpdump. We recommend tcpdump, though, because it’s almost certain you’ll find it installed regardless of distribution, and it will give you the chance to learn.
netstat is another useful tool for viewing live remote and local connections, which prints its output in a more organized, table-like manner. The package will usually be named simply
net-tools, and most distributions offer it.
To install the software:
$ sudo apt install net-tools
If you start netstat without arguments, it will print a list of open sockets and then exit. But since it’s a versatile tool, you can control what to see depending on what you need. First of all,
-c will help you if you need continuous output, similar to tcpdump.
From here on, every aspect of the Linux networking subsystem can be included in netstat's output: routes with
-r, interfaces with
-i, protocols (--protocol=$family for certain choices, like unix, inet, ipx…),
-l if you want only listening sockets, or
-e for extended info.
The default columns displayed are active connections, receive queue, send queue, local and foreign addresses, state, user, PID/name, socket type, socket state and path. These are only the most interesting pieces of information netstat displays, but not the only ones. As usual, refer to the manual page.
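netstat output also lends itself to quick one-liners. As a hedged sketch, the filter below counts listening TCP sockets; the sample lines are inlined (hypothetical hosts) so it can be tried without net-tools installed, and on a real system you would pipe netstat's output in instead of echoing a variable.

```shell
# Count listening TCP sockets in netstat-style output. The sample stands
# in for real output from something like: netstat -tln
sample='tcp  0  0 0.0.0.0:22      0.0.0.0:*            LISTEN
tcp  0  0 127.0.0.1:631   0.0.0.0:*            LISTEN
tcp  0  0 192.168.0.5:22  192.168.0.7:51224    ESTABLISHED'
listening=$(echo "$sample" | awk '$NF == "LISTEN"' | wc -l)
echo "$listening listening sockets"
# → 2 listening sockets
```

An unexpected jump in that count on a server is worth investigating.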
The last utility we’ll talk about in the network section is
nmap. Its name comes from Network Mapper and it’s useful as a network/port scanner, invaluable for network audits. It can be used on remote hosts as well as on local ones. If you want to see which hosts are alive on a class C network, you will simply type:
$ nmap 192.168.0.0/24
nmap will return some output like this:
Nmap scan report for 192.168.0.1
Host is up (0.0065s latency).
Not shown: 998 closed ports
PORT   STATE SERVICE
23/tcp open  telnet
80/tcp open  http

Nmap scan report for 192.168.0.102
Host is up (0.00046s latency).
Not shown: 999 closed ports
PORT   STATE SERVICE
22/tcp open  ssh

Nmap scan report for 192.168.0.103
Host is up (0.00049s latency).
Not shown: 999 closed ports
PORT   STATE SERVICE
22/tcp open  ssh
What we can learn from this short example: nmap supports CIDR notation for scanning entire (sub)networks, it's fast, and by default it displays the IP address and any open ports of every host. If we had wanted to scan just a portion of the network, say IPs from 20 to 30, we would have written:
$ nmap 192.168.0.20-30
This is the simplest possible use of nmap. It can detect operating system versions, run scripts and traceroute against hosts (with
-A), or use different scanning techniques, like UDP, TCP SYN or ACK scans. It can also try to get past firewalls or IDSes, do MAC spoofing and all kinds of other neat tricks.
There are lots of things this tool can do, and all of them are documented in the manual page. Please remember that some (most) administrators don't like it very much when someone scans their network, so don't get yourself in trouble. The nmap developers have put up a host,
scanme.nmap.org, with the sole purpose of testing various options. Let's try to find out what OS it's running, in verbose mode (for advanced options you'll need root):
$ nmap -A -v scanme.nmap.org
NSE: Script Scanning completed.
Nmap scan report for scanme.nmap.org (18.104.22.168)
Host is up (0.21s latency).
Not shown: 995 closed ports
PORT    STATE    SERVICE      VERSION
22/tcp  open     ssh          OpenSSH 5.3p1 Debian 3ubuntu7 (protocol 2.0)
| ssh-hostkey: 1024 8d:60:f1:7c:ca:b7:3d:0a:d6:67:54:9d:69:d9:b9:dd (DSA)
|_2048 79:f8:09:ac:d4:e2:32:42:10:49:d3:bd:20:82:85:ec (RSA)
80/tcp  open     http         Apache httpd 2.2.14 ((Ubuntu))
|_html-title: Go ahead and ScanMe!
135/tcp filtered msrpc
139/tcp filtered netbios-ssn
445/tcp filtered microsoft-ds
OS fingerprint not ideal because: Host distance (14 network hops) is greater than five
No OS matches for host
Uptime guess: 19.574 days (since Fri Sep 30 08:34:53 2011)
Network Distance: 14 hops
TCP Sequence Prediction: Difficulty=205 (Good luck!)
IP ID Sequence Generation: All zeros
Service Info: OS: Linux
[traceroute output suppressed]
We recommend you also take a look at
aircrack-ng. Like we said, our list is by no means exhaustive.
For more help, see our other guide on How to monitor network activity on a Linux system.
Let’s say you see your system is starting to have intense HDD activity and you’re only playing Nethack on it. You’ll probably want to see what’s happening. Or maybe you installed a new web server and you want to see how well it fares. This part is for you.
Just like in the networking section, there are lots of tools, graphical or CLI, that will help you keep in touch with the state of the machines you're administering. We will not talk about the graphical tools, like gnome-system-monitor, because running X on a server, where these tools are most often needed, doesn't really make sense.
The first system monitoring utility is a personal favorite and a small utility used by sysadmins around the world. It’s called
top and should be installed by default. But, if not, execute this command:
$ sudo apt install procps
You can execute the command with no options:
$ top
top is a process viewer (there is also
htop, a more eye-pleasing variant) and, as you can see, it gives you all the information you need when you want to see what's running on your system: process name, PID, user, state, time, CPU usage and so on.
We usually start top with
-d 1, which means that it should run and refresh every second (running top without options sets the delay to three seconds). Once top is started, pressing certain keys will help you order the data in various ways: pressing 1 shows the usage of all CPUs (provided you use an SMP machine and kernel), P orders the listed processes by CPU usage, M by memory usage, and so on. If you want to run top a specific number of times, use
-n $number. The manual page will give you access to all the options, of course.
To install htop instead, use this command:
$ sudo apt install htop
To exit the screen from either application, use the Q key on your keyboard.
While top helps you monitor the system's memory usage, there are other applications written specifically for this purpose. Two of those are
free and vmstat (virtual memory statistics). We usually use
free only with the
-m flag (megabytes), and its output looks like this:
             total       used       free     shared    buffers     cached
Mem:          2012       1913         98          0          9        679
-/+ buffers/cache:       1224        787
Swap:         2440        256       2184
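Those columns are easy to turn into a single number for scripts or alerts. The sketch below computes the percentage of RAM in use from the Mem: line (total is field 2, used is field 3); the sample values are inlined so the arithmetic can be checked anywhere, and on a live system you would pipe `free -m` in instead.

```shell
# Work out the percentage of RAM in use from free's Mem: line.
# The sample stands in for live output from: free -m
sample='              total       used       free
Mem:           2012       1913         98'
pct=$(echo "$sample" | awk '/^Mem:/ {printf "%.0f", $3/$2*100}')
echo "${pct}% of RAM used"
# → 95% of RAM used
```

Note that on older free versions like this one, "used" includes buffers and caches, so a high figure alone is not necessarily a problem.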
vmstat's output is more complete, as it will also show you I/O and CPU statistics, among other things. Both
free and vmstat are part of the
procps package, at least on Debian and Ubuntu systems. But when it comes to process monitoring, the most used tool is
ps, also part of the procps package.
It can be complemented with
pstree, part of
psmisc, which shows all the processes in a tree-like structure. Some of ps' most used flags include
a (all processes with a tty),
x (complementary to
a; see the manual page for BSD-style options, which are written without a leading dash),
u (user-oriented format) and
f (forest-like output). These are format modifiers only, not options in the classical sense. Here the use of the man page is mandatory, because ps is a tool you will use often.
$ ps aux
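Since %CPU is the third column of `ps aux` output, standard text tools can rank processes by it. The sketch below sorts a saved snapshot (the process names and numbers are hypothetical) so the pipeline can be tested anywhere; on a live system with GNU procps you could instead let ps sort for you with `ps aux --sort=-%cpu | head`.

```shell
# Find the two busiest processes by CPU in a ps-aux-style snapshot.
# Fields here: user, PID, %CPU, %MEM, command (sample data, not live).
sample='root   1    0.0  0.1 init
alice  2301 12.5  3.4 firefox
bob    2410  4.2  1.1 sshd
alice  2512 25.0  2.0 ffmpeg'
top2=$(echo "$sample" | sort -k3 -nr | head -n 2 | awk '{print $5}')
echo "$top2"
# Output:
#   ffmpeg
#   firefox
```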
Other system monitoring tools include
uptime (the name is kind of self explanatory),
who (for a listing of the logged-in users),
lsof (list open files) or
sar, part of the
sysstat package, for listing activity counters.
For more help, see some of these other guides we’ve written:
- How to Monitor RAM Usage on Linux
- How to Check and Monitor CPU utilization on Linux
- How to use ps command in Linux: Beginners guide
As said before, the list of utilities presented here is by no means exhaustive. Our intention was to put together an article that explains major monitoring tools for everyday use. This will not replace reading and working with real-life systems for a complete understanding of the matter.