Suppose we write a script which spawns one or more long running processes; if said script receives a signal such as SIGINT or SIGTERM, we probably want its children to be terminated too (normally, when the parent dies, the children survive). We may also want to perform some cleanup tasks before the script itself exits. To reach our goal, we must first learn about process groups and how to execute a process in the background.
In this tutorial you will learn:
- What is a process group
- The difference between foreground and background processes
- How to execute a program in the background
- How to use the shell wait built-in to wait for a process executed in the background
- How to terminate child processes when the parent receives a signal
Software requirements and conventions used
Category | Requirements, Conventions or Software Version Used |
---|---|
System | Distribution independent |
Software | No specific software needed |
Other | None |
Conventions | # – requires given Linux commands to be executed with root privileges either directly as a root user or by use of the sudo command; $ – requires given Linux commands to be executed as a regular non-privileged user |
A simple example
Let’s create a very simple script and simulate the launch of a long running process:
#!/bin/bash
trap "echo signal received!" SIGINT
echo "The script pid is $$"
sleep 30
The first thing we did in the script was to create a trap to catch SIGINT and print a message when the signal is received. We then made our script print its pid, which we can get by expanding the $$ variable. Next, we executed the sleep command to simulate a long running process (30 seconds).
We save the code inside a file (say it is called test.sh), make it executable, and launch it from a terminal emulator. We obtain the following result:
The script pid is 101248
If we are focused on the terminal emulator and press CTRL+C while the script is running, a SIGINT signal is sent and handled by our trap:
The script pid is 101248
^Csignal received!
Although the trap handled the signal as expected, the script was interrupted anyway. Why did this happen? Furthermore, if we send a SIGINT signal to the script using the kill command, the result we obtain is quite different: the trap is not executed immediately, and the script goes on until the child process exits (after 30 seconds of “sleeping”). Why this difference? Let’s see…
Process groups, foreground and background jobs
Before we answer the questions above, we must better grasp the concept of process group.
A process group is a group of processes which share the same pgid (process group id). When a member of a process group creates a child process, that process becomes, by default, a member of the same process group. Each process group has a leader; we can easily recognize it because its pid and its pgid are the same.
We can visualize the pid and pgid of running processes using the ps command. The output of the command can be customized so that only the fields we are interested in are displayed: in this case PID, PGID and CMD. We do this by using the -o option, providing a comma-separated list of fields as argument:
$ ps -a -o pid,pgid,cmd
If we run the command while our script is running, the relevant part of the output we obtain is the following:
   PID   PGID CMD
298349 298349 /bin/bash ./test.sh
298350 298349 sleep 30
We can clearly see two processes: the pid of the first one is 298349, same as its pgid: this is the process group leader. It was created when we launched the script, as you can see in the CMD column.
This main process launched a child process with the command sleep 30: as expected, the two processes are in the same process group.
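The same check can be done from inside a script. The sketch below is Linux-specific. Instead of parsing ps output, it assumes /proc is mounted and reads the fifth field of /proc/PID/stat, which holds the process group id. The pgid_of helper is a hypothetical name used only for illustration. A non-interactive shell does not use job control, so the background child should report the same pgid as the script:

```shell
#!/bin/bash
# Hypothetical helper: print the pgid of the given pid (Linux only).
# /proc/PID/stat looks like: "PID (COMM) STATE PPID PGRP ..."
pgid_of() {
    local stat rest state ppid pgrp
    stat=$(< "/proc/$1/stat")
    # Strip everything up to the last ") " so spaces in COMM cannot shift fields
    rest=${stat##*") "}
    read -r state ppid pgrp _ <<< "$rest"
    echo "$pgrp"
}

sleep 30 &
child_pid=$!

parent_pgid=$(pgid_of "$$")
child_pgid=$(pgid_of "$child_pid")
echo "script: pid $$, pgid $parent_pgid"
echo "child:  pid $child_pid, pgid $child_pgid"

# Stop the background sleep and reap it, hiding the "Terminated" notice
kill "$child_pid"
wait "$child_pid" 2>/dev/null
```

On a system with procps installed, ps -o pgid= -p PID prints the same value.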
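When we pressed CTRL-C while focusing on the terminal from which the script was launched, the signal was not sent only to the parent process, but to the entire process group. Which process group? The foreground process group of the terminal. All processes that are members of this group are called foreground processes; all the others are called background processes. The Bash manual explains that members of the current terminal process group receive keyboard-generated signals such as SIGINT and are said to be in the foreground. Background processes, whose process group id differs from the terminal's, are immune to keyboard-generated signals. This also answers our first question. The sleep command received the SIGINT as well and was terminated, so once the trap was executed, the script had nothing left to do and exited.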
When we sent the SIGINT signal with the kill command, instead, we targeted only the pid of the parent process. Bash exhibits a specific behavior when a signal is received while it’s waiting for a program to complete: the “trap code” for that signal is not executed until that program has finished. This is why the “signal received” message was displayed only after the sleep command exited.
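The deferred trap can be observed with a small experiment. In the sketch below, a hypothetical helper subshell sends a signal to the script about one second after it starts. At that moment the script is waiting for a 3 seconds foreground sleep. The trap records the value of Bash’s SECONDS variable when it finally runs. SIGTERM is used in place of SIGINT, but the same rule applies to any trapped signal:

```shell
#!/bin/bash
# Store the moment (in whole seconds since the script started) the trap runs
trap 'trap_time=$SECONDS; echo "signal received after ${trap_time}s"' SIGTERM

# Hypothetical helper: send SIGTERM to this script after about 1 second
( sleep 1; kill -TERM "$$" ) &

# Foreground command: Bash waits for it to finish before running the trap
sleep 3
echo "sleep finished after ${SECONDS}s"
```

If the trap ran as soon as the signal arrived, it would report about 1 second. Because it is deferred until sleep exits, it reports at least 3.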
To replicate what happens when we press CTRL-C in the terminal using the kill command to send the signal, we must target the process group. We can send a signal to a process group by using the negation of the pid of the process group leader. Supposing the pid of the process group leader is 298349 (as in the previous example), we would run:
$ kill -2 -298349
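To try this without a terminal, we need a process group that does not include the shell (or script) sending the signal. The sketch below assumes the setsid utility is available; it ships with util-linux and BusyBox. When the calling process is not already a process group leader, setsid runs the command in a new session and process group whose pgid equals the command's own pid. A background command of a non-interactive script is not a group leader, so this condition holds here. The sketch uses SIGTERM because commands started in the background by a non-interactive shell ignore SIGINT:

```shell
#!/bin/bash
# Start a bash that runs two background sleeps, in a new session and process
# group (assumes setsid is installed). Since this background command is not a
# group leader, setsid does not fork, so $! is both its pid and its pgid.
setsid bash -c 'sleep 30 & sleep 30 & wait' &
leader_pid=$!

sleep 1   # give the group time to start its children

# A negative pid targets a whole process group; "--" ends option parsing
kill -TERM -- "-${leader_pid}"

wait "${leader_pid}"
group_status=$?
# 143 = 128 + 15: the group leader was terminated by SIGTERM
echo "group leader exit status: ${group_status}"
```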
Manage signal propagation from inside a script
Now, suppose we launch a long running script from a non-interactive shell, and we want said script to manage signal propagation automatically. When it receives a signal such as SIGINT or SIGTERM, it should terminate its potentially long running child, optionally performing some cleanup tasks before exiting. How can we do this?
Like we did previously, we can handle the reception of a signal in a trap. However, as we saw, if a signal is received while the shell is waiting for a program to complete, the “trap code” is executed only after the child process exits.
This is not what we want: we want the trap code to be processed as soon as the parent process receives the signal. To achieve our goal, we must execute the child process in the background: we can do this by placing the & symbol after the command. In our case we would write:
#!/bin/bash
trap 'echo signal received!' SIGINT
echo "The script pid is $$"
sleep 30 &
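If we left the script this way, the parent process would exit right after launching the sleep 30 command. That would leave us without the chance to perform cleanup tasks after the command ends or is interrupted. We can solve this problem by using the shell wait built-in. According to its help page (help wait), wait waits for each process identified by an ID, which may be a process ID or a job specification, and reports its termination status. If no ID is given, it waits for all currently active child processes.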
After we set a process to be executed in the background, we can retrieve its pid in the $! variable. We can pass it as an argument to wait to make the parent process wait for its child:
#!/bin/bash
trap 'echo signal received!' SIGINT
echo "The script pid is $$"
sleep 30 &
wait $!
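Besides keeping the parent alive, wait also returns the exit status of the process it waited for. In the minimal sketch below, a subshell that exits with status 3 stands in for a real background job:

```shell
#!/bin/bash
# Run a command in the background: here a subshell that exits with status 3
( sleep 1; exit 3 ) &
child_pid=$!          # pid of the most recent background command
echo "started background child ${child_pid}"

wait "${child_pid}"   # blocks until that child terminates
child_status=$?       # wait returns the child's exit status
echo "child exited with status ${child_status}"
```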
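Are we done? No, there is still a problem: the reception of a signal handled in a trap inside the script causes the wait built-in to return immediately, without actually waiting for the termination of the command in the background. This behavior is documented in the Bash manual. When Bash is waiting for an asynchronous command via the wait built-in, the reception of a signal for which a trap has been set causes wait to return immediately with an exit status greater than 128, immediately after which the trap is executed.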
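The early return is easy to observe. In the sketch below, a hypothetical helper subshell signals the script after about a second, and SIGTERM is the trapped signal. wait gives up on the background sleep long before it finishes and returns a status greater than 128, while the child keeps running:

```shell
#!/bin/bash
trap 'trapped=yes; echo "trap executed"' SIGTERM

sleep 5 &
child_pid=$!

# Hypothetical helper: send SIGTERM to this script after about 1 second
( sleep 1; kill -TERM "$$" ) &

wait "${child_pid}"
wait_status=$?
# Interrupted by the trapped signal: greater than 128 (128 + 15 = 143 for SIGTERM)
echo "wait returned ${wait_status}"

# wait returned, but the background child is still alive
if kill -0 "${child_pid}" 2>/dev/null; then
    child_alive=yes
    echo "child ${child_pid} is still running"
fi

# Terminate and reap the child ourselves, as the final script does in its trap
kill "${child_pid}"
wait "${child_pid}" 2>/dev/null
```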
To solve this problem we have to use wait again, perhaps as part of the trap itself. Here is what our script could look like in the end:
#!/bin/bash
cleanup() {
echo "cleaning up..."
# Our cleanup code goes here
}
trap 'echo signal received!; kill "${child_pid}"; wait "${child_pid}"; cleanup' SIGINT SIGTERM
echo "The script pid is $$"
sleep 30 &
child_pid="$!"
wait "${child_pid}"
In the script we created a cleanup function where we could insert our cleanup code, and made our trap also catch the SIGTERM signal. Here is what happens when we run this script and send one of those two signals to it:
- The script is launched and the sleep 30 command is executed in the background;
- The pid of the child process is “stored” in the child_pid variable;
- The script waits for the termination of the child process;
- The script receives a SIGINT or SIGTERM signal;
- The wait command returns immediately, without waiting for the child termination.
At this point the trap is executed. In it:
- A SIGTERM signal (the kill default) is sent to the process whose pid is stored in child_pid;
- We wait to make sure the child is terminated after receiving this signal;
- After wait returns, we execute the cleanup function.
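The steps above can be checked by driving the script from another script and sending it a signal. In this sketch, the script body is written to a temporary file created with mktemp, which is an assumption of the sketch, along with the output file it reads back. It sends SIGTERM rather than SIGINT. A command started in the background by a non-interactive shell has SIGINT ignored, and Bash cannot trap a signal that was ignored on entry:

```shell
#!/bin/bash
# Test run of the script above, driven from another (non-interactive) script
script=$(mktemp)
output=$(mktemp)

cat > "$script" <<'EOF'
cleanup() {
    echo "cleaning up..."
}
trap 'echo signal received!; kill "${child_pid}"; wait "${child_pid}"; cleanup' SIGINT SIGTERM
sleep 30 &
child_pid="$!"
echo "child pid: ${child_pid}"
wait "${child_pid}"
EOF

bash "$script" > "$output" &
script_pid=$!

sleep 1                    # let the script install its trap and start sleep
kill -TERM "$script_pid"   # signal the script only, not its child
wait "$script_pid"

result=$(< "$output")
echo "$result"

# The first line holds the pid of the sleep started by the script
read -r first_line < "$output"
sleep_pid=${first_line#child pid: }
if ! kill -0 "$sleep_pid" 2>/dev/null; then
    sleep_gone=yes
    echo "sleep ${sleep_pid} is gone"
fi

rm -f "$script" "$output"
```

The output should show “signal received!” followed by “cleaning up...”. The sleep child should no longer exist, because the trap both killed and reaped it.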
Propagate the signal to multiple children
In the example above we worked with a script which had only one child process. What if a script has many children, and what if some of them have children of their own?
In the first case, one quick way to get the pids of all the children is to use the jobs -p command: this command displays the pids of all the active jobs in the current shell. We can then use kill to terminate them. Here is an example:
#!/bin/bash
cleanup() {
echo "cleaning up..."
# Our cleanup code goes here
}
trap 'echo signal received!; kill $(jobs -p); wait; cleanup' SIGINT SIGTERM
echo "The script pid is $$"
sleep 30 &
sleep 40 &
wait
The script launches two processes in the background. By using the wait built-in without arguments, we wait for all of them and keep the parent process alive. When a SIGINT or SIGTERM signal is received by the script, we send a SIGTERM to both children, whose pids are returned by the jobs -p command. jobs is itself a shell built-in, so using it does not launch an external program.
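The behavior of jobs -p can be checked directly. In the sketch below, the pids it prints are compared with the ones collected through $! when the jobs were started. All the jobs are then terminated the same way the trap above does:

```shell
#!/bin/bash
sleep 30 &
first_pid=$!
sleep 40 &
second_pid=$!

# jobs -p prints one pid per line, one for each job of the current shell
job_pids=$(jobs -p)
echo "jobs -p returned:"
echo "$job_pids"

# Unquoted on purpose, so each pid becomes a separate argument to kill
kill $job_pids
# Reap all children; hide the "Terminated" notices
wait 2>/dev/null

terminated=0
for pid in "$first_pid" "$second_pid"; do
    if ! kill -0 "$pid" 2>/dev/null; then
        terminated=$((terminated + 1))
    fi
done
echo "terminated jobs: ${terminated}"
```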
If the children have child processes of their own, and we want to terminate them all when the ancestor receives a signal, we can send a signal to the entire process group, as we saw before.
This, however, presents a problem: by sending a termination signal to the process group, we would enter a “signal-sent/signal-trapped” loop. Think about it: in the trap for SIGTERM we send a SIGTERM signal to all members of the process group; this includes the parent script itself!
To solve this problem and still be able to execute a cleanup function after child processes are terminated, we must change the trap for SIGTERM just before we send the signal to the process group, for example:
#!/bin/bash
cleanup() {
echo "cleaning up..."
# Our cleanup code goes here
}
trap 'trap " " SIGTERM; kill 0; wait; cleanup' SIGINT SIGTERM
echo "The script pid is $$"
sleep 30 &
sleep 40 &
wait
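In the trap, before sending SIGTERM to the process group, we replaced the SIGTERM trap with a no-op command (a single space). The parent process therefore effectively ignores the signal, and only the other members of the group are affected by it. Notice also that in the trap, to signal the process group, we used kill with 0 as pid. This is a sort of shortcut: when the pid passed to kill is 0, all the processes in the current process group are signaled. Keep in mind that this reaches every member of the group. If the script is not a process group leader (for example, because it was launched by another non-interactive script), the process that launched it is signaled too.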
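Testing this last script needs some care, because kill 0 signals every member of the script’s process group. That group would include the shell or test harness that launched the script if they shared it. The sketch below therefore starts the script through setsid, assumed to be available (for example from util-linux or BusyBox), which gives it a process group of its own. The temporary files created with mktemp are another assumption of the sketch:

```shell
#!/bin/bash
# Write the final script from the article to a temporary file
script=$(mktemp)
output=$(mktemp)
cat > "$script" <<'EOF'
cleanup() {
    echo "cleaning up..."
}
trap 'trap " " SIGTERM; kill 0; wait; cleanup' SIGINT SIGTERM
sleep 30 &
sleep 40 &
wait
EOF

# setsid puts the script (and the sleeps it starts) in a new process group,
# so its "kill 0" cannot reach this driver script
setsid bash "$script" > "$output" &
script_pid=$!

sleep 1                    # let the script install its trap
kill -TERM "$script_pid"   # signal only the script; its trap signals the group
wait "$script_pid"

result=$(< "$output")
echo "$result"
rm -f "$script" "$output"
```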
Conclusions
In this tutorial we learned about process groups and the difference between foreground and background processes. We learned that CTRL-C sends a SIGINT signal to the entire foreground process group of the controlling terminal, and we learned how to send a signal to a process group using kill. We also learned how to execute a program in the background, and how to use the wait shell built-in to wait for it to exit while keeping the parent shell alive. Finally, we saw how to set up a script so that when it receives a signal it terminates its children before exiting. Did I miss something? Do you have your personal recipes to accomplish the task? Don’t hesitate to let me know!