Fundamentals of process management on Linux

Objective

Learn the fundamentals of process management on Linux

Operating System and Software Versions

Operating System: All Linux distributions

Requirements

Some programs mentioned in this tutorial require root access

Difficulty

EASY

Conventions

# – requires given Linux commands to be executed with root privileges, either directly as the root user or by use of the sudo command
$ – requires given linux commands to be executed as a regular non-privileged user

Introduction

One of the core activities of a system administrator is monitoring and interacting with the processes running on a machine. In this tutorial you will be introduced to some fundamental tools that help you accomplish this vital task.

The ps command

The ps command is one of the fundamental programs used for process monitoring: it gives you a snapshot of the processes running on a machine at the moment you invoke it. Let’s see it in action, first running it without any options:

$ ps

  PID TTY          TIME CMD
24424 pts/0    00:00:00 bash
24468 pts/0    00:00:00 ps

As you can see from the output above, only two processes are shown: bash, with PID (process ID) 24424, and ps itself, with PID 24468. This is because, when invoked without any options, ps selects only the processes that have the same effective user ID as the user who launched it and that are associated with the terminal from which it was invoked.

How can we overcome this limitation? Using the -a option we can make ps show all processes, with the exception of session leaders and processes not associated with a terminal.

A session leader is a process whose PID is the same as the SID (session ID) of the session it belongs to. When a process is created, it becomes part of the same session as its parent; a new session is started only when a process calls setsid(), and the session ID is then set to the PID of that process, which becomes the session leader. Let’s try to run ps with the -a option and check its output:

$ ps -a

  PID TTY          TIME CMD
12466 tty1     00:00:00 gnome-session-b
12480 tty1     00:00:17 gnome-shell
12879 tty1     00:00:00 Xwayland
12954 tty1     00:00:00 gsd-sound
12955 tty1     00:00:00 gsd-wacom
12957 tty1     00:00:00 gsd-xsettings
12961 tty1     00:00:00 gsd-a11y-keyboa
12962 tty1     00:00:00 gsd-a11y-settin
12965 tty1     00:00:00 gsd-clipboard
12966 tty1     00:00:03 gsd-color
12967 tty1     00:00:00 gsd-datetime
12970 tty1     00:00:00 gsd-housekeepin
12971 tty1     00:00:00 gsd-keyboard
12972 tty1     00:00:00 gsd-media-keys
12973 tty1     00:00:00 gsd-mouse
12976 tty1     00:00:00 gsd-orientation

[...]

The output has been truncated, but you can see that it can now include processes that belong to other terminals and users. The columns show the PID of each process, its controlling terminal (TTY), the cumulative CPU time consumed by the process (TIME), and the name of the command that started it (CMD).
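Going back to session leaders: to check whether a given process is one, we can compare its PID with its session ID. ps can print both (for example with ps -o pid,sid,comm); the minimal bash sketch below instead reads the session ID directly from /proc/PID/stat, which is where ps gets it from. It assumes a Linux /proc filesystem and the util-linux setsid command; session_id and is_session_leader are hypothetical helpers introduced only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the session ID of a process, and test whether
# a process is a session leader (its PID equals its session ID).

session_id() {
  local stat rest state ppid pgrp sid
  stat=$(< "/proc/$1/stat") || return 1
  # Field 2 (the command name) is in parentheses and may contain spaces,
  # so drop everything up to the last ") " before splitting the rest.
  rest=${stat##*') '}
  # The remaining fields start with: state, ppid, pgrp, session, ...
  read -r state ppid pgrp sid _ <<< "$rest"
  echo "$sid"
}

is_session_leader() {
  [ "$(session_id "$1")" = "$1" ]
}

# setsid(1) runs a command in a new session, making it the session leader;
# a command started normally inherits the session of its parent instead.
export -f session_id
setsid -w bash -c 'echo "started with setsid: PID $$, SID $(session_id $$)"'
bash -c 'echo "started normally:    PID $$, SID $(session_id $$)"'
```

Only the first line should show matching PID and SID values, since only that bash process created a session of its own.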

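To get an even richer output we can use the BSD-style options aux, written without a leading dash: a lists the processes of all users that have a terminal, u selects a user-oriented output format, and x also includes processes not associated with a terminal, such as daemons. (Typing ps -aux usually works as well, but strictly speaking it asks for the processes of a user named “x”, which is why the dashless form is preferred.)

$ ps aux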

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.2 223932  8708 ?        Ss   Jul20   0:04 /usr/lib/systemd/systemd --switched-root --system --deserialize 25
root         2  0.0  0.0      0     0 ?        S    Jul20   0:00 [kthreadd]
root         4  0.0  0.0      0     0 ?        S<   Jul20   0:00 [kworker/0:0H]
root         6  0.0  0.0      0     0 ?        S<   Jul20   0:00 [mm_percpu_wq]
root         7  0.0  0.0      0     0 ?        S    Jul20   0:00 [ksoftirqd/0]
root         8  0.0  0.0      0     0 ?        S    Jul20   0:07 [rcu_sched]
root         9  0.0  0.0      0     0 ?        S    Jul20   0:00 [rcu_bh]
root        10  0.0  0.0      0     0 ?        S    Jul20   0:04 [rcuos/0]
root        11  0.0  0.0      0     0 ?        S    Jul20   0:00 [rcuob/0]
root        12  0.0  0.0      0     0 ?        S    Jul20   0:00 [migration/0]
root        13  0.0  0.0      0     0 ?        S    Jul20   0:00 [watchdog/0]
root        14  0.0  0.0      0     0 ?        S    Jul20   0:00 [cpuhp/0]
root        15  0.0  0.0      0     0 ?        S    Jul20   0:00 [cpuhp/1]
root        16  0.0  0.0      0     0 ?        S    Jul20   0:00 [watchdog/1]
root        17  0.0  0.0      0     0 ?        S    Jul20   0:00 [migration/1]
root        18  0.0  0.0      0     0 ?        S    Jul20   0:00 [ksoftirqd/1]
root        20  0.0  0.0      0     0 ?        S<   Jul20   0:00 [kworker/1:0H]
root        21  0.0  0.0      0     0 ?        S    Jul20   0:02 [rcuos/1]
root        22  0.0  0.0      0     0 ?        S    Jul20   0:00 [rcuob/1]
root        23  0.0  0.0      0     0 ?        S    Jul20   0:00 [cpuhp/2]
root        24  0.0  0.0      0     0 ?        S    Jul20   0:00 [watchdog/2]
root        25  0.0  0.0      0     0 ?        S    Jul20   0:00 [migration/2]
root        26  0.0  0.0      0     0 ?        S    Jul20   0:00 [ksoftirqd/2]
root        28  0.0  0.0      0     0 ?        S<   Jul20   0:00 [kworker/2:0H]

[...]
egdoc    13128  0.0  0.1  74736  5388 ?        Ss   Jul20   0:00 /usr/lib/systemd/systemd --user
egdoc    13133  0.0  0.0 106184   420 ?        S    Jul20   0:00 (sd-pam)
egdoc    13143  0.0  0.1 218328  3612 ?        Sl   Jul20   0:00 /usr/bin/gnome-keyring-daemon --daemonize --login

[...]

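You can see that quite a lot of new information has been added. Besides USER, which shows the owner of the process, the first new column is %CPU: the CPU utilization of the process, expressed as a percentage and computed by ps as the CPU time used divided by the time the process has been running. A percentage is also used in the next column, %MEM, which is the ratio of the process's resident memory to the physical memory of the machine. VSZ is the virtual memory size of the process, and RSS its resident set size, that is the non-swapped physical memory it is using; both are expressed in KiB. A ? in the TTY column means that the process has no controlling terminal.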
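The kernel also exposes the values behind the VSZ and RSS columns in /proc/PID/status, as the VmSize and VmRSS lines (in kB, that is KiB). The following bash sketch, which assumes a Linux /proc filesystem and uses a hypothetical helper named mem_kib, prints them for a given PID so they can be compared with what ps reports. Kernel threads have no such lines, which matches the 0 values ps shows for them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the virtual memory size (VSZ) and resident set
# size (RSS) of a process, both in KiB, read from /proc/PID/status.
mem_kib() {
  local key value rest vsz="" rss=""
  while read -r key value rest; do
    case $key in
      VmSize:) vsz=$value ;;   # total virtual memory, like the VSZ column
      VmRSS:)  rss=$value ;;   # resident physical memory, like the RSS column
    esac
  done < "/proc/$1/status"
  echo "VSZ=$vsz RSS=$rss"
}

mem_kib "$$"                   # the current shell
# Compare with: ps -o vsz,rss -p "$$"
```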

The STAT column uses a code to express the process state. We are not going to describe all possible states here, but just explain the ones appearing in the output above (for a complete overview, consult the ps manpage).

Let’s examine the first process in the output: it has PID 1, therefore it is the first user-space process started by the kernel. As expected, it is systemd, the init system now adopted by most major Linux distributions. Its STAT code is Ss. The S indicates that the process is in interruptible sleep: it is idle, waiting for an event (such as input) to wake it up. The s, instead, tells us that the process is a session leader.

Another symbol, which does not appear in the first row but does in some of the other rows, is <, which indicates that the process has high priority, and therefore a low (negative) nice value (we will see what a nice value is in the relevant section of this tutorial). An l in the STAT column indicates that the process is multi-threaded, and a + sign that it is in the foreground process group.

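Finally, the START column shows when the command was started: the time of day for processes started today, otherwise the date (as in Jul20 above) or, for processes started in a previous year, just the year. It is followed by TIME and by COMMAND, which in this format shows the full command line, including its arguments.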
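Going back to the STAT column: only its first letter is the process state reported by the kernel, while the additional characters are derived by ps from other process attributes. The bash sketch below rebuilds a simplified version of the code from /proc/PID/stat (see proc(5) for the field numbers). It assumes a Linux /proc filesystem; stat_code is a hypothetical helper, its flag logic is an approximation of what ps does, and it skips some flags ps can show, such as L (pages locked in memory).

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild a simplified STAT code for a process from
# /proc/PID/stat. The first letter comes straight from the kernel; the
# extra flags are derived from other fields, approximating ps.
stat_code() {
  local stat rest code
  local -a f
  stat=$(< "/proc/$1/stat") || return 1
  rest=${stat##*') '}              # skip "pid (command name) "
  read -r -a f <<< "$rest"         # f[0] is field 3 of proc(5), f[n] is field n+3
  code=${f[0]}                     # state: R, S, D, T, Z, I, ...
  (( f[16] < 0 )) && code+="<"     # field 19, nice: negative means high priority
  (( f[16] > 0 )) && code+="N"     # positive nice means low priority
  (( f[3] == $1 )) && code+="s"    # field 6, session ID equal to PID: session leader
  (( f[17] > 1 )) && code+="l"     # field 20, number of threads: multi-threaded
  (( f[5] == f[2] )) && code+="+"  # field 8 (tpgid) equal to field 5 (pgrp): foreground
  echo "$code"
}

stat_code 1        # compare with: ps -o stat= -p 1
stat_code "$$"
```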

Another useful option we can pass to ps is -o (its long form is --format), which lets you choose which columns to show, using either keywords or placeholders. For example, running:

$ ps -ax -o %U%p%n%c

will give us the USER column first (%U), followed by the PID of the process (%p), the NI column (%n), which shows the nice value, and finally the COMMAND column (%c):

USER       PID  NI COMMAND
root         1   0 systemd
root         2   0 kthreadd
root         4 -20 kworker/0:0H
root         6 -20 mm_percpu_wq
root         7   0 ksoftirqd/0
root         8   0 rcu_sched
root         9   0 rcu_bh
root        10   0 rcuos/0
root        11   0 rcuob/0
root        12   - migration/0
root        13   - watchdog/0
root        14   0 cpuhp/0
root        15   0 cpuhp/1
root        16   - watchdog/1
root        17   - migration/1
root        18   0 ksoftirqd/1
root        20 -20 kworker/1:0H
root        21   0 rcuos/1
root        22   0 rcuob/1
root        23   0 cpuhp/2
root        24   - watchdog/2
root        25   - migration/2
root        26   0 ksoftirqd/2
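The AIX-style placeholders used above have equivalent standard keywords (user, pid, ni, comm), which are easier to remember and combine well with other options such as --sort. In the NI column, a dash should mark processes that, like the migration and watchdog kernel threads above, run under a real-time scheduling policy, to which nice values do not apply. A short example, assuming the procps-ng version of ps shipped by most distributions (the awk filter is just one arbitrary way to post-process the output):

```shell
#!/usr/bin/env bash
# The same columns as "ps -ax -o %U%p%n%c", selected with standard keywords.
ps -e -o user,pid,ni,comm

# Only processes whose nice value is not 0, sorted by nice value. The "="
# after each keyword suppresses the header, so awk only sees data rows;
# real-time threads, shown with "-", are kept as well.
ps -e -o ni=,pid=,comm= --sort=ni | awk '$1 != 0'
```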

Using ‘top’ to dynamically interact with processes

While ps gives us a static snapshot of the processes at the time we run it, top provides a dynamic view, refreshed at an interval that can be set both when launching the program (with the -d option) and interactively (the default is 3 seconds).

Top doesn’t just show us a dynamic representation of the running processes: we can interact with them, and with the program itself, by using some keys. For example, pressing B toggles the use of bold characters, d lets us enter a new delay time, and k lets us send a signal to a process, prompting for its PID and for the signal to send, with SIGTERM being the default.
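Besides its interactive use, top can also run non-interactively: -d sets the refresh delay in seconds, -n limits the number of refreshes, and -b (batch mode) writes plain text to standard output instead of drawing the interactive screen, which is handy in scripts and logs. A small sketch assuming procps-ng top; the helper name top_snapshot and its default of 15 lines are arbitrary choices made for illustration.

```shell
#!/usr/bin/env bash
# Start top interactively, refreshing every 5 seconds instead of the default:
#   top -d 5

# Hypothetical helper: take a single batch-mode snapshot and keep only the
# first LINES lines (the summary area plus the first rows of the process
# list, which top sorts by %CPU by default).
top_snapshot() {
  local lines=${1:-15}
  top -b -n 1 | head -n "$lines"
}

top_snapshot 20
```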

Change priority of processes with nice and renice

As we have seen before, each process has a nice value, which influences its scheduling priority, that is, how much CPU time it gets when other processes compete for it. The nice value ranges from -20 to 19, and the lower the value, the higher the priority of the process. This can seem counter-intuitive at first, but see it this way: the nicer a process is to other processes, the more readily it lets them get ahead of it.

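But how can we set the priority of a process? We can use the nice program to accomplish the task. Say you want to run a script with the lowest possible priority, that is with the highest nice value (19): you would prefix the command this way (the value given to -n is actually an adjustment added to the current nice value, which is normally 0):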

$ nice -n 19 ./script.sh
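Run without a command, nice simply prints the current nice value, which makes it easy to check what a command started through nice -n actually gets. A small bash sketch; run_lowest_priority is a hypothetical wrapper name, and the results assume the shell starts with the usual nice value of 0:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run any command with the lowest possible priority.
run_lowest_priority() {
  nice -n 19 "$@"
}

nice                          # nice value of the current shell, usually 0
run_lowest_priority nice      # the wrapped "nice" prints the value it inherited
nice -n 10 nice -n 15 nice    # adjustments add up, but the result is capped at 19
```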

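You can also change the priority of a process that is already running with renice, knowing its PID:

$ renice -n 15 PID

where PID is the process ID of the program. A regular user can renice only their own processes and, by default, can only raise their nice value (that is, lower their priority): lowering the nice value, or changing it for processes owned by other users, requires root privileges.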
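To see renice at work, the following bash sketch starts a stand-in background job, renices it and reads its nice value back from field 19 of /proc/PID/stat, the value ps shows in the NI column. It assumes a Linux /proc filesystem and the util-linux renice; nice_value is a hypothetical helper, and sleep only stands in for a real long-running job.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nice value of a process, which the kernel
# exposes as field 19 of /proc/PID/stat.
nice_value() {
  local stat rest
  local -a f
  stat=$(< "/proc/$1/stat") || return 1
  rest=${stat##*') '}        # skip "pid (command name) "
  read -r -a f <<< "$rest"   # f[0] is field 3, so field 19 is f[16]
  echo "${f[16]}"
}

sleep 60 &                   # stand-in for a long-running job we own
job=$!
echo "nice value before: $(nice_value "$job")"
renice -n 15 -p "$job"       # raising the nice value of our own process
echo "nice value after:  $(nice_value "$job")"
kill "$job"                  # clean up the stand-in job
```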

Send signals to processes with the kill and killall commands

We can use the kill command to send a signal to any process that belongs to us, or to any process at all if we have root privileges. Each signal is identified by a name and a number: we can list the correspondences by running kill with the -l option:

$ kill -l
1) SIGHUP	 2) SIGINT	 3) SIGQUIT	 4) SIGILL	 5) SIGTRAP
6) SIGABRT	 7) SIGBUS	 8) SIGFPE	 9) SIGKILL	10) SIGUSR1
11) SIGSEGV	12) SIGUSR2	13) SIGPIPE	14) SIGALRM	15) SIGTERM
16) SIGSTKFLT	17) SIGCHLD	18) SIGCONT	19) SIGSTOP	20) SIGTSTP
21) SIGTTIN	22) SIGTTOU	23) SIGURG	24) SIGXCPU	25) SIGXFSZ
26) SIGVTALRM	27) SIGPROF	28) SIGWINCH	29) SIGIO	30) SIGPWR
31) SIGSYS	34) SIGRTMIN	35) SIGRTMIN+1	36) SIGRTMIN+2	37) SIGRTMIN+3
38) SIGRTMIN+4	39) SIGRTMIN+5	40) SIGRTMIN+6	41) SIGRTMIN+7	42) SIGRTMIN+8
43) SIGRTMIN+9	44) SIGRTMIN+10	45) SIGRTMIN+11	46) SIGRTMIN+12	47) SIGRTMIN+13
48) SIGRTMIN+14	49) SIGRTMIN+15	50) SIGRTMAX-14	51) SIGRTMAX-13	52) SIGRTMAX-12
53) SIGRTMAX-11	54) SIGRTMAX-10	55) SIGRTMAX-9	56) SIGRTMAX-8	57) SIGRTMAX-7
58) SIGRTMAX-6	59) SIGRTMAX-5	60) SIGRTMAX-4	61) SIGRTMAX-3	62) SIGRTMAX-2
63) SIGRTMAX-1	64) SIGRTMAX

If no signal is specified, kill sends SIGTERM to the target process, which can react in various ways: it may terminate immediately (the default action, if it has not installed a handler for the signal), do some cleanup before exiting, or ignore the signal altogether.

To choose the signal to send, we pass kill a dash followed by the signal number (or name). For example, to send a SIGKILL signal we would run:

$ kill -9 PID

Unlike SIGTERM, SIGKILL cannot be caught, blocked or ignored, so the process has no chance to react: it is terminated immediately.
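The difference between SIGTERM and SIGKILL is easy to observe from a shell script, because bash reports the exit status of a process terminated by a signal as 128 plus the signal number (143 for SIGTERM, 137 for SIGKILL). In the following bash sketch, status_after and handled_term are hypothetical helpers, and sleep simply stands in for a real process:

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a stand-in process, send it SIGNAL, and print
# the exit status the shell reports for it (128 + signal number when the
# signal terminated it).
status_after() {
  local pid status=0
  sleep 30 &
  pid=$!
  kill -s "$1" "$pid"
  wait "$pid" || status=$?
  echo "$status"
}

status_after TERM   # sleep has no SIGTERM handler, so it is terminated
status_after KILL   # SIGKILL always terminates the process

# A process can catch SIGTERM (but not SIGKILL) to clean up first. Here a
# subshell installs a handler with trap and then sends SIGTERM to itself.
handled_term() {
  ( trap 'echo "caught SIGTERM, cleaning up"; exit 0' TERM
    kill -s TERM "$BASHPID"
    sleep 30 )
}

handled_term
```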

Another signal you will often meet is SIGINT, which is sent to the foreground process when you press Ctrl-C in a terminal. Like SIGTERM, it requests the termination of the process, and it can be caught or ignored. SIGSTOP and SIGCONT respectively suspend and resume the execution of a process; SIGSTOP, like SIGKILL, cannot be caught or ignored. For a complete list and description of signals, consult the signal(7) manual page by running:

$ man 7 signal
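Since a stopped process is shown with a T in the STAT column, the effect of SIGSTOP and SIGCONT is easy to observe. A minimal bash sketch, assuming a Linux /proc filesystem; state_of and wait_for_state are hypothetical helpers, and the polling (about 2 seconds at most) is there because a signal sent to another process is handled asynchronously:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the one-letter state of a process (the first
# character of the STAT column in ps).
state_of() {
  local stat
  stat=$(< "/proc/$1/stat") || return 1
  stat=${stat##*') '}     # skip "pid (command name) "
  echo "${stat%% *}"      # the next field is the state
}

# Hypothetical helper: poll until PID reaches STATE, giving up after ~2 s.
wait_for_state() {
  local i
  for ((i = 0; i < 20; i++)); do
    [ "$(state_of "$1")" = "$2" ] && return 0
    sleep 0.1
  done
  return 1
}

sleep 30 &                # stand-in process
pid=$!
kill -s STOP "$pid" && wait_for_state "$pid" T && echo "$pid stopped"
kill -s CONT "$pid" && wait_for_state "$pid" S && echo "$pid resumed"
kill "$pid"               # clean up with SIGTERM
```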

The killall program has the same purpose as kill and, like kill, sends SIGTERM when no other signal is specified (a different one can be chosen with the -s or --signal option). Instead of referencing a process by its PID, however, it selects processes by command name, so it signals every process running that command.
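To make the difference with kill concrete, here is a simplified bash sketch of the idea behind killall: look up every process whose command name matches exactly, and signal each one. It is not how killall is actually implemented; kill_by_name is a hypothetical helper that reads /proc/PID/comm (the name ps shows in its CMD column, truncated by the kernel to 15 characters) and relies on the bash kill builtin, and my-worker is a made-up command name.

```shell
#!/usr/bin/env bash
# Hypothetical helper, a simplified stand-in for killall: send SIGNAL
# (default TERM) to every process whose command name is exactly NAME,
# then print how many processes were signalled.
kill_by_name() {
  local name=$1 sig=${2:-TERM} dir pid comm count=0
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    [ "$pid" = "$BASHPID" ] && continue   # never signal ourselves
    # Processes may exit while we scan, so skip unreadable entries.
    { read -r comm < "$dir/comm"; } 2>/dev/null || continue
    if [ "$comm" = "$name" ]; then
      # kill fails (and is not counted) for processes we may not signal.
      kill -s "$sig" "$pid" 2>/dev/null && count=$((count + 1))
    fi
  done
  echo "$count"
}

# "my-worker" is a made-up command name used only for illustration.
kill_by_name my-worker         # roughly: killall my-worker
kill_by_name my-worker KILL    # roughly: killall --signal KILL my-worker
```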