Multi-threaded Bash scripting & process management at the command line

The things you can do using Bash script are limitless. Once you begin to develop advanced scripts, you'll soon find that you start to run into operating system limits. For example, does your computer have 2 CPU threads or more (many modern machines have 8-32 threads)? If so, you will likely benefit from multi-threaded Bash scripting and coding. Continue reading and find out why!

In this tutorial you will learn:

  • How to implement multi-threaded Bash one-liners directly from the command line
  • Why multi-threaded coding can often increase the performance of your scripts
  • How background and foreground processes work and how to manipulate job queues

Multi-threaded Bash scripting & process management

Software requirements and conventions used

Software Requirements and Linux Command Line Conventions
Category | Requirements, Conventions or Software Version Used
System | Distribution-independent, Bash version-dependent
Software | Bash command line interface (bash)
Conventions | # – requires given Linux commands to be executed with root privileges, either directly as a root user or by use of the sudo command
            | $ – requires given Linux commands to be executed as a regular non-privileged user

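When you execute a Bash script, it typically runs one command at a time, and therefore uses only a single CPU thread at any given moment, unless you start subshells or other background processes. If your machine has at least two CPU threads, multi-threaded scripting in Bash lets you make full use of its CPU resources. The reason for this is simple: as soon as a secondary 'thread' (read: subshell) is started, the operating system can (and often will) run it on a different CPU thread. Strictly speaking, these Bash 'threads' are separate processes rather than threads in the programming sense, but the effect on CPU usage is the same.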

Assume for a moment that you have a modern machine with 8 or more threads. If we could execute code in eight parallel threads at the same time, each running on a different CPU thread (or spread across all of them), it would finish much faster than a single-threaded process running on a single CPU thread (which may also be shared with other running processes). The gains realized depend on what is being executed: CPU-bound work that can be split into independent parts benefits the most, while work limited by a single disk or network connection may see little improvement.

Excited? Great. Let’s dive into it.

First we need to understand what a subshell is, how it is started, why you would use one, and how it can be used to implement multi-threaded Bash code.

A subshell is another Bash process started from within the current one. Let's do something easy, and start one from within an opened Bash terminal prompt:

$ bash
$ exit
exit
$

What happened here? First we started another Bash shell (bash), which in turn gave us a new command prompt ($). So the second $ in the example above actually belongs to a different Bash shell, with a different PID (the process identifier: a number that uniquely identifies each running process in an operating system). Finally, we left the subshell via exit and returned to the parent shell. Can we somehow prove this is really what happened? Yes:

$ echo $$
220250
$ bash
$ echo $$
222629
$ exit
exit
$ echo $$
220250
$

Bash has a special variable, $$, which contains the PID of the current shell. Can you see how the process identifier changed once we were inside a subshell?
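The shell we started above is a completely new bash program. Bash also creates subshells of its own for ( ... ) and $( ... ), and there $$ deliberately keeps reporting the parent shell's PID. A minimal sketch to see the difference, assuming Bash 4 or later (which provides the BASHPID variable); the variable names are only illustrative:

```shell
#!/usr/bin/env bash
# $$      : PID of the main shell, unchanged inside ( ) and $( ) subshells
# BASHPID : PID of the Bash process actually running the code (Bash 4+)
main_pid=$BASHPID
sub_bashpid=$( echo "$BASHPID" )   # $( ) runs echo in a forked subshell
sub_dollar=$( echo "$$" )          # ...but $$ there still reports the parent

echo "main shell: \$\$=$$ BASHPID=$main_pid"
echo "subshell:   \$\$=$sub_dollar BASHPID=$sub_bashpid"
```

The two $$ values match, while the two BASHPID values differ, which shows that a new process was created even though $$ does not reveal it.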

Great! Now that we know what subshells are, and a little about how they work, let’s dive into some multi-threaded coding examples and learn more!

Simple multi-threading in Bash

Let us start with a simple one-liner multi-threaded example, of which the output may look somewhat confusing at first:

$ for i in $(seq 1 2); do echo $i; done
1
2
$ for i in $(seq 1 2); do echo $i & done
[1] 223561
1
[2] 223562
$ 2

[1]-  Done                    echo $i
[2]+  Done                    echo $i
$

In the first for loop (see our article on Bash loops to learn how to code loops), we simply output the variable $i, which ranges from 1 to 2 thanks to the seq command. Interestingly, seq is started in a subshell here, via the $(...) syntax!

NOTE
You can use the $(...) syntax (command substitution) almost anywhere within a command line: the enclosed commands run in a subshell, and their output is substituted in place. It is a very powerful and versatile way to code subshells directly into other command lines!
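A short sketch of both sides of $(...): the output of the enclosed command is captured, while any variable change made inside a subshell stays inside it. The variable names are illustrative:

```shell
#!/usr/bin/env bash
# Command substitution: seq runs in a subshell, its stdout is captured
# (trailing newlines are stripped) and stored in the variable.
numbers=$(seq 1 3)
echo "Captured: $numbers"

# ( ) also runs its commands in a subshell, so the assignment below only
# changes the subshell's copy of the variable.
counter=1
( counter=2; echo "inside subshell: $counter" )
echo "after subshell: $counter"   # still 1
```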

In the second for loop, we changed only one character. Instead of ;, which separates commands and makes Bash run the next one only after the current one has finished (you may think about it like pressing Enter), we used &. This simple change makes for an almost completely different program, and our code is now multi-threaded! Both echo commands run more or less at the same time, with a small delay because the shell still has to start the second loop iteration (to echo 2) in its own background process.

You can think about & in a similar way to ;, with the difference that & tells the shell to start the command in the background and immediately continue processing the next code, whereas ; waits for the current command to finish before continuing with the next command (or returning to the command prompt).
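In a script, the same difference can be measured. The sketch below uses Bash's built-in SECONDS counter (whole seconds only) and the wait builtin, which blocks until all background jobs started by the current shell have finished. Expect roughly 4 seconds for the sequential run and roughly 2 for the parallel one, give or take a second of rounding:

```shell
#!/usr/bin/env bash
# Sequential: with ; each sleep must finish before the next one starts.
SECONDS=0
sleep 2; sleep 2
sequential=$SECONDS

# Parallel: with & both sleeps start immediately; wait then blocks until
# every background job of this shell has finished.
SECONDS=0
sleep 2 & sleep 2 &
wait
parallel=$SECONDS

echo "sequential: ${sequential}s, parallel: ${parallel}s"
```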

Let’s now examine the output. We see:

[1] 223561
1
[2] 223562
$ 2

At first, followed by:

[1]-  Done                    echo $i
[2]+  Done                    echo $i
$

The empty line in between is where Enter was pressed: the second echo printed its 2 after the prompt had already been displayed, and an interactive Bash shell reports finished background jobs (the Done lines) just before it shows the next prompt. Try this command a few times at the command line, as well as some light variations, and you will get a feel for how this works.

The first output ([1] 223561) shows us that a background process was started with PID 223561, and that it was given job number 1. Then, before the loop had even started the second background job (starting a background job means creating a new process, which takes a moment), the output 1 was shown.

Next, a second subshell/thread was started, as indicated by [2], with PID 223562. This second process then printed 2 after the prompt had already returned (the exact ordering may vary, as it depends on how the operating system schedules the processes).

Finally, in the second block of output, we see the two processes terminating (as indicated by Done), along with the command each of them ran (echo $i). Note that the same job numbers, 1 and 2, are used to identify the background processes.
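Inside a script you usually don't see the [1] 223561 line, but the same PID is available: the special parameter $! holds the PID of the most recently started background job, and wait PID returns that job's exit status. A minimal sketch, where the exit status 3 is just an arbitrary example value:

```shell
#!/usr/bin/env bash
# Start a background subshell that sleeps briefly and exits with status 3.
( sleep 1; exit 3 ) &
pid=$!                     # PID of the job we just started
echo "started background job with PID $pid"

# wait PID blocks until that job ends and returns its exit status.
wait "$pid"
status=$?
echo "job $pid finished with exit status $status"
```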

More multi-threading in Bash

Next, let’s execute three sleep commands, all terminated by & (so they start as background processes), and let us vary their sleep duration lengths, so we can more clearly see how background processing works.

$ sleep 10 & sleep 1 & sleep 5 &
[1] 7129
[2] 7130
[3] 7131
$
[2]-  Done                    sleep 1
$
[3]+  Done                    sleep 5
$
[1]+  Done                    sleep 10

The output in this case should be self-explanatory. The command line returns immediately after our sleep 10 & sleep 1 & sleep 5 & command, and the 3 background processes are shown with their respective PIDs. I hit Enter a few times in between. After 1 second, the sleep 1 command completed, yielding the Done for job [2]. Subsequently, the third and first jobs terminated, according to their respective sleep durations. This example also shows clearly that multiple jobs are effectively running simultaneously in the background.
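On a machine with many CPU threads you will often want to start many background jobs, but not all at once. One common pattern, sketched below under the assumption of Bash 4.3 or later (for wait -n) and GNU coreutils (for nproc), keeps at most one job per CPU thread running. The work function is a hypothetical placeholder for your real task:

```shell
#!/usr/bin/env bash
# Limit simultaneous background jobs to the number of CPU threads
# (nproc, GNU coreutils); fall back to 4 if nproc is unavailable.
max_jobs=$(nproc 2>/dev/null || echo 4)

work() {                      # hypothetical placeholder for real work
  sleep 1
  echo "task $1 done"
}

running=0
for task in $(seq 1 6); do
  if (( running >= max_jobs )); then
    wait -n                   # block until any one background job ends
    running=$((running - 1))
  fi
  work "$task" &              # start the next task in the background
  running=$((running + 1))
done
wait                          # wait for the jobs that are still running
echo "all tasks finished"
```

The task lines appear in whatever order the jobs finish. The running counter is tracked by hand to keep the sketch short; it never exceeds max_jobs.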

You may also have picked up the + sign in the output examples above. This is all about job control. We will look at job control in the next example, but for the moment it's important to understand that + marks the current job: the job that job control commands act on by default when no job number is given. It is typically the job that was most recently placed in the background or stopped.

A - marks the previous job, which would become the new default for job control commands if the current job (the one with the + sign) were to terminate. Job control (or in other words, background thread handling) may sound a bit daunting at first, but it is actually very handy and easy to use once you get used to it. Let's dive in!
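Job specs also work inside scripts. A small sketch, where %+ (also written %%) refers to the current job and %- to the previous one; the two sleep 30 commands are just placeholders:

```shell
#!/usr/bin/env bash
sleep 30 &        # job 1
sleep 30 &        # job 2, the most recent job, so it becomes the current job
jobs              # job 1 is marked with '-', job 2 with '+'

kill %-           # terminate the previous job (job 1)
kill %+           # terminate the current job (job 2)
wait 2>/dev/null  # reap both jobs, hiding the 'Terminated' notices
```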

Job control in Bash

$ sleep 10 & sleep 5 &
[1] 7468
[2] 7469
$ jobs
[1]-  Running                 sleep 10 &
[2]+  Running                 sleep 5 &
$ fg 2
sleep 5
$ fg 1
sleep 10
$

Here we placed two sleeps in the background. Once they were started, we examined the currently running jobs with the jobs command. Next, the second job was placed into the foreground using the fg command followed by the job number. You can think about it like this: the & in the sleep 5 command was turned into a ;. In other words, a background process (not waited upon) became a foreground process.

We then waited for the sleep 5 command to finalize and subsequently placed the sleep 10 command into the foreground. Note that each time we did this we had to wait for the foreground process to finish before we would receive our command line back, which is not the case when using only background processes (as they are literally ‘running in the background’).
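Scripts normally run without job control, so fg reports "no job control" there by default. A rough script counterpart of the example above is wait with a job spec, which blocks until that particular job has finished, much like waiting for a foreground process. A minimal sketch, where the order variable is only there to record what happened:

```shell
#!/usr/bin/env bash
sleep 3 &         # job 1
sleep 1 &         # job 2
order=""

wait %2           # like 'fg 2': block until job 2 (sleep 1) has finished
order="$order 2"
wait %1           # like 'fg 1': block until job 1 (sleep 3) has finished
order="${order# } 1"

echo "jobs finished in order: $order"
```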

Job control in Bash: job interruption

$ sleep 10
^Z
[1]+  Stopped                 sleep 10
$ bg 1
[1]+ sleep 10 &
$ fg 1
sleep 10
$

Here we press CTRL+z to suspend a running sleep 10 (which stops, as indicated by Stopped). We then resume the process in the background with bg, and finally bring it into the foreground with fg and wait for it to finish.
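CTRL+z works by sending the SIGTSTP signal to the foreground job, and bg and fg resume a stopped job by sending it SIGCONT. From a script you can pause and resume a background process in a similar way with SIGSTOP and SIGCONT. A sketch, assuming Linux (it reads the process state from /proc, where T means stopped) and a sleep that accepts fractional seconds, as GNU coreutils sleep does:

```shell
#!/usr/bin/env bash
sleep 30 &
pid=$!

kill -STOP "$pid"            # pause the process (similar to CTRL+z)
sleep 0.5                    # give the kernel a moment to stop it
# Linux-specific: the State: line in /proc/PID/status, T = stopped
state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
echo "state after SIGSTOP: $state"

kill -CONT "$pid"            # resume it (bg and fg also send SIGCONT)
kill "$pid"                  # demo clean-up: send SIGTERM
wait "$pid" 2>/dev/null      # reap the terminated process
```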

Job control in Bash: killing a job

$ sleep 100
^Z
[1]+  Stopped                 sleep 100
$ kill %1
$
[1]+  Terminated              sleep 100

Having started a 100 second sleep, we next suspend the running process with CTRL+z, and then kill the first background job by using the kill command. Note how we use %1 in this case, instead of simply 1. This is because kill treats a plain number as a process ID (so kill 1 would target the process with PID 1, not job 1), whereas fg and bg operate on jobs. Thus, to tell kill that we want to affect the first background job, we use % followed by the job number.
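The same job spec syntax works with kill in a script. A minimal sketch that also shows how to recognize a killed job afterwards: wait reports 128 plus the signal number, so a job terminated by SIGTERM (signal 15) yields 143:

```shell
#!/usr/bin/env bash
sleep 100 &          # job 1
pid=$!
kill %1              # job spec for job 1 (a plain 'kill 1' would mean PID 1)
wait "$pid"          # reap the job and collect its exit status
status=$?
echo "exit status: $status"   # 128 + 15 (SIGTERM) = 143
```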

Job control in Bash: process disown

$ sleep 100
^Z
[1]+  Stopped                 sleep 100
$ bg %1
[1]+ sleep 100 &
$ disown

In this final example, we again suspend a running sleep and resume it in the background. Finally, we execute the disown command, which removes the current job (the one marked +) from the shell's job table; disown -a would do the same for all jobs. The process keeps running, but is no longer 'owned' by the current shell, so the shell will not pass on a hangup signal (SIGHUP) to it when the terminal or SSH connection is closed. Even if you close your current shell and log out, the process will keep running until it terminates naturally, as long as it does not need the terminal you just closed.

This is a very powerful way to suspend a process, place it into the background, disown it, and then log out from the machine you were using, provided you will not need to interact with the process anymore. It is ideal for those long-running processes over SSH which cannot be interrupted. Simply press CTRL+z (which temporarily suspends the process), place it into the background with bg, disown it, and log out! Go home and have a nice relaxed evening knowing your job will keep running!
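The same steps can be scripted. In the sketch below, sleep 60 stands in for a real long-running command, and disown removes the job from the shell's job table while the process itself keeps running. disown -h would instead keep the job listed but stop the shell from sending it SIGHUP, and starting the command with nohup in the first place is another common way to achieve a similar result:

```shell
#!/usr/bin/env bash
sleep 60 &                    # placeholder for a long-running command
pid=$!
disown %1                     # remove job 1 from this shell's job table

jobs_left=$(jobs -p)          # empty: the shell no longer tracks the job
alive=no
if kill -0 "$pid" 2>/dev/null; then   # signal 0 only checks the PID can be signalled
  alive=yes
  echo "PID $pid is still running, but no longer owned by this shell"
fi
kill "$pid"                   # demo clean-up only
```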

Multi-threaded Bash scripting & process management command line examples

Conclusion

In this tutorial we saw how to implement multi-threaded Bash one-liners directly from the command line, and explored why multi-threaded coding often increases the performance of your scripts. We also examined how background and foreground processes work, and we manipulated job queues. Finally, we explored how to disown jobs from the current shell, providing us with additional control over running processes. Enjoy your newfound skills, and leave us a comment below with your job control experiences!


