How to propagate a signal to child processes from a Bash script

Suppose we write a script which spawns one or more long-running processes. If the script receives a signal such as SIGINT or SIGTERM, we probably want its children to be terminated too (normally, when the parent dies, the children survive). We may also want to perform some cleanup tasks before the script itself exits. To reach this goal, we must first learn about process groups and how to execute a process in the background.

In this tutorial you will learn:

  • What a process group is
  • The difference between foreground and background processes
  • How to execute a program in the background
  • How to use the shell wait builtin to wait for a process executed in the background
  • How to terminate child processes when the parent receives a signal

Software requirements and conventions used

Software Requirements and Linux Command Line Conventions
Category      Requirements, Conventions or Software Version Used
System        Distribution independent
Software      No specific software needed
Other         None
Conventions   # – requires given Linux commands to be executed with root privileges, either directly as the root user or by use of the sudo command
              $ – requires given Linux commands to be executed as a regular non-privileged user

A simple example

Let’s create a very simple script and simulate the launch of a long running process:

#!/bin/bash

trap "echo signal received!" SIGINT

echo "The script pid is $$"
sleep 30


The first thing we did in the script was to create a trap to catch SIGINT and print a message when the signal is received. We then made our script print its pid, which we can get by expanding the $$ variable. Next, we executed the sleep command to simulate a long-running process (30 seconds).

We save the code inside a file (say it is called test.sh), make it executable, and launch it from a terminal emulator. We obtain the following result:

The script pid is 101248

If we are focused on the terminal emulator and press CTRL+C while the script is running, a SIGINT signal is sent and handled by our trap:

The script pid is 101248
^Csignal received!

Although the trap handled the signal as expected, the script was interrupted anyway. Why did this happen? Furthermore, if we send a SIGINT signal to the script using the kill command, the result is quite different: the trap is not executed immediately, and the script goes on until the child process exits (after 30 seconds of “sleeping”). Why this difference? Let’s see…

Process groups, foreground and background jobs

Before we answer the questions above, we must better grasp the concept of process group.

A process group is a group of processes which share the same pgid (process group id). When a member of a process group creates a child process, that process becomes a member of the same process group. Each process group has a leader; we can easily recognize it because its pid and pgid are the same.

We can visualize pid and pgid of running processes using the ps command. The output of the command can be customized so that only the fields we are interested in are displayed: in this case CMD, PID and PGID. We do this by using the -o option, providing a comma-separated list of fields as argument:

$ ps -a -o pid,pgid,cmd

If we run the command while our script is running the relevant part of the output we obtain is the following:

   PID    PGID CMD
298349  298349 /bin/bash ./test.sh
298350  298349 sleep 30

We can clearly see two processes. The pid of the first one is 298349, the same as its pgid: this is the process group leader. It was created when we launched the script, as you can see in the CMD column.

This main process launched a child process with the command sleep 30: as expected the two processes are in the same process group.
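
The same check can be made from inside a script. What follows is only a minimal sketch: it assumes a ps that accepts the -o pgid= and -p options (as procps does), and the names child_pid, script_pgid and child_pgid are made up for the illustration.

```shell
#!/bin/bash
# Start a background child, then compare process group ids.
sleep 5 &
child_pid=$!

# "ps -o pgid= -p PID" prints only the pgid column, without a header;
# tr removes the padding spaces.
script_pgid=$(ps -o pgid= -p "$$" | tr -d ' ')
child_pgid=$(ps -o pgid= -p "$child_pid" | tr -d ' ')

# Stop the child so the sketch does not leave a process behind.
kill "$child_pid"
wait "$child_pid"

echo "script: pid=$$ pgid=$script_pgid"
echo "child:  pid=$child_pid pgid=$child_pgid"
```

When this is run as an ordinary script (no job control), the two pgid values should be identical, because the child inherits the process group of the shell that created it.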

When we pressed CTRL-C with the focus on the terminal from which the script was launched, the signal was sent not only to the parent process, but to the entire process group. Which process group? The foreground process group of the terminal. All the processes that are members of this group are called foreground processes; all the others are called background processes. The sleep command, being a member of the foreground group, received SIGINT too and was terminated by it: this is why the script ended early. Here is what the Bash manual has to say on the matter:

DID YOU KNOW?
To facilitate the implementation of the user interface to job control, the operating system maintains the notion of a current terminal process group ID. Members of this process group (processes whose process group ID is equal to the current terminal process group ID) receive keyboard-generated signals such as SIGINT. These processes are said to be in the foreground. Background processes are those whose process group ID differs from the terminal’s; such processes are immune to keyboard-generated signals.

When we sent the SIGINT signal with the kill command, instead, we targeted only the pid of the parent process; Bash exhibits a specific behavior when a signal is received while it’s waiting for a program to complete: the “trap code” for that signal is not executed until that process has finished. This is why the “signal received” message was displayed only after the sleep command exited.
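
The delay can be observed with a small experiment. This is a sketch under a few assumptions: the inner script is started in the background so that its pid is known, and SIGTERM is used instead of SIGINT, because a shell without job control starts background commands with SIGINT ignored, and an ignored signal cannot be trapped. The names inner_script, inner_pid, log and inner_output are invented for the example.

```shell
#!/bin/bash
# Inner script: trap SIGTERM, then run sleep in the foreground.
# SECONDS counts the seconds elapsed since the inner shell started.
inner_script='
trap "echo trap ran at \$SECONDS seconds" TERM
sleep 3
echo "sleep finished"
'

log=$(mktemp)
bash -c "$inner_script" >"$log" &
inner_pid=$!

sleep 1
kill -TERM "$inner_pid"   # signal only the inner shell, not its sleep child
wait "$inner_pid"

inner_output=$(cat "$log")
rm -f "$log"
echo "$inner_output"
```

Although the signal is sent after about one second, the trap message should report roughly three seconds: the inner shell runs the trap only once its foreground sleep has finished.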

To replicate what happens when we press CTRL-C in the terminal using the kill command to send the signal, we must target the process group. We can send a signal to a process group by passing kill the negated pid of the process group leader; so, supposing the pid of the leader is 298349 (as in the previous example), we would run:

$ kill -2 -298349
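
To experiment with group signaling without touching the terminal's foreground group, a script can create a separate process group and signal it. The sketch below rests on some assumptions: set -m (job control) is enabled so that the background job gets its own process group, SIGTERM is sent rather than SIGINT because background commands started by a shell without job control ignore SIGINT, and the names leader_pid, member_count and leader_status are illustrative.

```shell
#!/bin/bash
set -m   # enable job control: each background job gets its own process group

# A small group: an inner shell (the group leader) with two sleep children.
bash -c 'sleep 60 & sleep 60 & wait' &
leader_pid=$!
sleep 1   # give the inner shell time to start its children

# Count the processes whose pgid equals the pid of the leader.
member_count=$(ps -A -o pgid= | tr -d ' ' | grep -cx "$leader_pid")

# A negative pid addresses the whole process group; "--" ends the options.
kill -TERM -- "-$leader_pid"
wait "$leader_pid"
leader_status=$?

echo "group $leader_pid had $member_count members"
echo "leader exit status: $leader_status"
```

The group should contain three processes (the inner shell and its two children), and the leader's exit status should be greater than 128, which indicates that it was terminated by a signal.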

Manage signal propagation from inside a script

Now, suppose we launch a long-running script from a non-interactive shell, and we want the script to manage signal propagation automatically, so that when it receives a signal such as SIGINT or SIGTERM it terminates its potentially long-running child, possibly performing some cleanup tasks before exiting. How can we do this?

As we did previously, we can handle the reception of a signal in a trap; however, as we saw, if a signal is received while the shell is waiting for a program to complete, the “trap code” is executed only after the child process exits.

This is not what we want: we want the trap code to be processed as soon as the parent process receives the signal. To achieve our goal, we must execute the child process in the background: we can do this by placing the & symbol after the command. In our case we would write:

#!/bin/bash

trap 'echo signal received!' SIGINT

echo "The script pid is $$"
sleep 30 &

If we left the script this way, the parent process would exit right after launching the sleep 30 command, leaving us no chance to perform cleanup tasks after it ends or is interrupted. We can solve this problem by using the shell wait builtin. The help page of wait describes it this way:

Waits for each process identified by an ID, which may be a process ID or a job specification, and reports its termination status. If ID is not given, waits for all currently active child processes, and the return status is zero.

After we set a process to be executed in the background, we can retrieve its pid in the $! variable. We can pass it as an argument to wait to make the parent process wait for its child:

#!/bin/bash

trap 'echo signal received!' SIGINT

echo "The script pid is $$"
sleep 30 &

wait $!
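
As a side note, wait also reports how the child ended. The following minimal sketch uses a subshell with a made-up exit status (3) as the child; child_pid and child_status are illustrative names.

```shell
#!/bin/bash
# A background subshell that exits with a known status after one second.
( sleep 1; exit 3 ) &
child_pid=$!

wait "$child_pid"     # blocks until the child terminates
child_status=$?       # wait returns the exit status of the child

echo "child $child_pid exited with status $child_status"
```

Here the status reported by wait is 3, the value the subshell exited with.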

Are we done? No, there is still a problem: receiving a signal handled by a trap inside the script causes the wait builtin to return immediately, without actually waiting for the background command to terminate. This behavior is documented in the Bash manual:

When bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.

This is good, because the signal is handled right away and the trap is executed without having to wait for the child to terminate. It does raise a problem, though: in our trap we want to execute our cleanup tasks only once we are sure the child process has exited.
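
The behavior can be reproduced with a short experiment. This is only a sketch: the helper subshell that signals the script is a hypothetical stand-in for an external kill command, and the names wait_status and child_alive are invented for the example.

```shell
#!/bin/bash
trap 'echo "signal received!"' TERM

sleep 5 &
child_pid=$!

# Hypothetical stand-in for an external "kill <script pid>":
# a helper that signals this script after one second.
( sleep 1; kill -TERM "$$" ) &

wait "$child_pid"
wait_status=$?          # greater than 128: wait was interrupted by the signal

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$child_pid" 2>/dev/null; then
  child_alive=yes
else
  child_alive=no
fi
echo "wait returned $wait_status; child still running: $child_alive"

# Tidy up: restore the default handler and stop the child.
trap - TERM
kill "$child_pid"
wait
```

The wait builtin returns after about one second with a status greater than 128, while the child is still running: any cleanup done at this point would happen too early.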

To solve this problem we have to use wait again, perhaps as part of the trap itself. Here is what our script could look like in the end:

#!/bin/bash

cleanup() {
  echo "cleaning up..."
  # Our cleanup code goes here
}

trap 'echo signal received!; kill "${child_pid}"; wait "${child_pid}"; cleanup' SIGINT SIGTERM

echo "The script pid is $$"
sleep 30 &

child_pid="$!"
wait "${child_pid}"

In the script we created a cleanup function where we could insert our cleanup code, and made our trap also catch the SIGTERM signal. Here is what happens when we run this script and send one of those two signals to it:

  1. The script is launched and the sleep 30 command is executed in the background;
  2. The pid of the child process is “stored” in the child_pid variable;
  3. The script waits for the termination of the child process;
  4. The script receives a SIGINT or SIGTERM signal;
  5. The wait command returns immediately, without waiting for the child to terminate.

At this point the trap is executed. In it:

  1. A SIGTERM signal (the kill default) is sent to the child_pid;
  2. We wait to make sure the child is terminated after receiving this signal.
  3. After wait returns, we execute the cleanup function.
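
The sequence above can be traced with a variant of the script that records each step. This is a sketch, not a replacement for the script shown earlier: the trap body is moved into a hypothetical on_signal function, the events variable exists only to record the order of the steps, and a helper subshell stands in for an external kill command.

```shell
#!/bin/bash
events=""

cleanup() {
  events="${events}cleanup;"
}

on_signal() {
  events="${events}signal;"
  kill "$child_pid"          # SIGTERM, the kill default
  wait "$child_pid"          # returns once the child has terminated
  child_status=$?
  events="${events}child-terminated;"
  cleanup
}

trap on_signal INT TERM

sleep 30 &
child_pid=$!

# Hypothetical stand-in for an external "kill <script pid>".
( sleep 1; kill -TERM "$$" ) &

wait "$child_pid"            # interrupted by the signal; the trap runs next
trap - INT TERM
wait                         # reap the helper subshell

echo "order of events: $events"
echo "child status: $child_status"
```

Keeping the handler in a function avoids a long quoted trap string and makes it easier to extend the cleanup logic later.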

Propagate the signal to multiple children

In the example above we worked with a script which had only one child process. What if a script has many children, and what if some of them have children of their own?

In the first case, one quick way to get the pids of all the children is to use the jobs -p command: this command displays the pids of all the active jobs in the current shell. We can then use kill to terminate them. Here is an example:

#!/bin/bash

cleanup() {
  echo "cleaning up..."
  # Our cleanup code goes here
}

trap 'echo signal received!; kill $(jobs -p); wait; cleanup' SIGINT SIGTERM

echo "The script pid is $$"

sleep 30 &
sleep 40 &

wait

The script launches two processes in the background: by using the wait builtin without arguments, we wait for all of them, and keep the parent process alive. When the script receives SIGINT or SIGTERM, we send a SIGTERM to both of them, using the pids returned by the jobs -p command (jobs is itself a shell builtin, so no new process is created when we use it).
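
One detail worth guarding against: if no background job is active, jobs -p prints nothing and kill is called without arguments, which makes it fail with a usage error. A possible guard is sketched below; kill_jobs and killed_count are hypothetical names, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: terminate every active background job of this shell.
kill_jobs() {
  # Unquoted on purpose: jobs -p prints one pid per line, and word
  # splitting turns them into the positional parameters of the function.
  set -- $(jobs -p)
  killed_count=$#
  if [ "$#" -gt 0 ]; then
    kill "$@"
  fi
}

sleep 30 &
sleep 40 &

kill_jobs
wait
echo "signaled $killed_count background jobs"
```

In a trap, the helper would take the place of the kill $(jobs -p) command.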

If the children have child processes of their own, and we want to terminate them all when the ancestor receives a signal, we can send a signal to the entire process group, as we saw before.

This, however, presents a problem, since by sending a termination signal to the process group, we would enter a “signal-sent/signal-trapped” loop. Think about it: in the trap for SIGTERM we send a SIGTERM signal to all members of the process group; this includes the parent script itself!

To solve this problem and still be able to execute a cleanup function after child processes are terminated, we must change the trap for SIGTERM just before we send the signal to the process group, for example:

#!/bin/bash

cleanup() {
  echo "cleaning up..."
  # Our cleanup code goes here
}

trap 'trap " " SIGTERM; kill 0; wait; cleanup' SIGINT SIGTERM

echo "The script pid is $$"

sleep 30 &
sleep 40 &

wait


In the trap, before sending SIGTERM to the process group, we changed the SIGTERM trap, so that the parent process ignores the signal and only its descendants are affected by it. Notice also that in the trap, to signal the process group, we used kill with 0 as pid. This is a sort of shortcut: when the pid passed to kill is 0, all the processes in the current process group are signaled.
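
Since kill 0 reaches every member of the current process group, it also reaches the caller when the script is started by another script that shares its group. The sketch below is one way to try the technique safely, under stated assumptions: the outer script enables job control with set -m so that the demo runs in a process group of its own, and the inner trap uses an empty string, which makes the shell ignore SIGTERM. The names demo_script, demo_pid, log and demo_output are illustrative.

```shell
#!/bin/bash
set -m   # job control on: the background job below gets its own process group

# Inner script with the same structure as the example above.
demo_script='
cleanup() { echo "cleaning up..."; }
trap "trap \"\" TERM; kill 0; wait; cleanup" TERM
sleep 30 &
sleep 40 &
wait
'

log=$(mktemp)
bash -c "$demo_script" >"$log" &
demo_pid=$!
sleep 1   # let the inner script set its trap and start its children

kill -TERM "$demo_pid"   # signal only the inner script
wait "$demo_pid"         # returns once the inner script has finished

demo_output=$(cat "$log")
rm -f "$log"
echo "$demo_output"
```

The inner script terminates both of its children through kill 0 and then runs cleanup, while the outer script, which lives in a different process group, is not affected.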

Conclusions

In this tutorial we learned about process groups and the difference between foreground and background processes. We learned that CTRL-C sends a SIGINT signal to the entire foreground process group of the controlling terminal, and we learned how to send a signal to a process group using kill. We also learned how to execute a program in the background, and how to use the wait shell builtin to wait for it to exit without losing the parent shell. Finally, we saw how to set up a script so that when it receives a signal it terminates its children before exiting. Did I miss something? Do you have your personal recipes to accomplish the task? Don’t hesitate to let me know!


