If you are new to xargs, or do not know what xargs is yet, please read our xargs for beginners with examples first. If you are already somewhat used to xargs, and can write basic xargs command line statements without looking at the manual, then this article will help you become more advanced with xargs on the command line, especially by making it multi-threaded.
In this tutorial you will learn:
- How to use xargs -P (multi-threaded mode) from the command line in Bash
- Advanced usage examples using multi-threaded xargs from the command line in Bash
- A deeper understanding of how to apply xargs multi-threaded to your existing Bash code
Software requirements and conventions used
Category | Requirements, Conventions or Software Version Used |
---|---|
System | Linux Distribution-independent |
Software | Bash command line, Linux based system |
Other | The xargs utility is part of GNU findutils and is installed by default on most Linux distributions |
Conventions | # – requires linux-commands to be executed with root privileges, either directly as the root user or by use of the sudo command; $ – requires linux-commands to be executed as a regular non-privileged user |
Example 1: Calling another Bash shell with xargs compiled input
After learning to use xargs, one will soon find that, although xargs allows one to do many powerful things by itself, its power seems to be limited by its inability to execute multiple commands in sequence.
For example, let’s say we have a directory which has subdirectories named 00 to 10 (11 in total). For each of these subdirectories, we want to traverse into it, check whether a file named file.txt exists, and if so cat (and merge using >>) the contents of this file into a file total_file.txt in the directory where the 00 to 10 directories are. Let’s try and do this with xargs in several steps:
$ mkdir 00 01 02 03 04 05 06 07 08 09 10
$ ls
00 01 02 03 04 05 06 07 08 09 10
$ echo 'a' > 03/file.txt
$ echo 'b' > 07/file.txt
$ echo 'c' > 10/file.txt
Here we first create 11 directories, 00 to 10, and next create 3 sample file.txt files in the subdirectories 03, 07 and 10.
$ find . -maxdepth 2 -type f -name file.txt
./10/file.txt
./07/file.txt
./03/file.txt
We then write a find command to locate all file.txt files, starting at the current directory (.) and descending at most 1 level of subdirectories:
$ find . -maxdepth 2 -type f -name file.txt | xargs -I{} cat {} > ./total_file.txt
$ cat total_file.txt
c
b
a
The -maxdepth 2 option limits the search to entries in the current directory (level 1) and entries inside its immediate subdirectories (level 2), hence the maxdepth of 2.
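The effect of -maxdepth can be checked with a small sketch. It assumes a throwaway directory created with mktemp, and the variable names depth1 and depth2 are illustrative only:

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir 03
echo 'a' > 03/file.txt

# -maxdepth 1 only looks at entries directly in ".", so 03/file.txt is not seen.
depth1=$(find . -maxdepth 1 -type f -name file.txt | wc -l)

# -maxdepth 2 also looks inside the immediate subdirectories.
depth2=$(find . -maxdepth 2 -type f -name file.txt | wc -l)

echo "matches with -maxdepth 1: $depth1"
echo "matches with -maxdepth 2: $depth2"
```

With only one level allowed, find never looks inside 03, which is why the tutorial needs a depth of 2.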
Finally we use xargs (with the recommended and preferred {} replacement string as passed to the xargs -I replace string option) to cat the contents of any such file located by the find command into a file in the current directory named total_file.txt.
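One caveat with this pipeline: by default xargs interprets quotes and backslashes in its input, so a path containing an apostrophe would make it fail. A hedged variant, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do), passes the names null-delimited instead. The directory name below is made up for the demonstration:

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A directory name with an apostrophe and a space (hypothetical example).
mkdir "it's here"
echo 'x' > "it's here/file.txt"

# -print0 ends each path with a NUL byte and -0 makes xargs split on NUL only,
# so quotes, spaces and backslashes in names are passed through untouched.
find . -maxdepth 2 -type f -name file.txt -print0 | xargs -0 -I{} cat {} > total_file.txt

cat total_file.txt
```

For the neatly named 00 to 10 directories used in this article the plain pipeline is sufficient. The null-delimited form matters mainly when file names are not under your control.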
Something nice to note here is that, even though one might think of xargs as executing multiple cat commands in succession, all redirecting to the same file, one can use > (output to a new file, creating the file if it does not exist yet, and overwriting any file with the same name already there) instead of >> (append to a file, creating it if it does not exist yet)! This works because the redirection belongs to the xargs command as a whole: the shell opens total_file.txt once, and every cat that xargs starts writes to that same open file.
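A minimal sketch of this behaviour, assuming a scratch directory created with mktemp (the sort step is added only to make the order of the output predictable):

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir 03 07
echo 'a' > 03/file.txt
echo 'b' > 07/file.txt

# The shell opens total_file.txt once, before xargs starts. Each cat that
# xargs launches inherits that open file, so earlier output is not overwritten.
find . -maxdepth 2 -type f -name file.txt | sort | xargs -I{} cat {} > total_file.txt

line_count=$(wc -l < total_file.txt)
echo "total_file.txt contains $line_count lines"
```

Both lines end up in the file even though > was used, because the truncation happens only once, when the shell sets up the redirection.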
The exercise so far sort of fulfilled our requirements, but not exactly: it does not traverse into the subdirectories. It also did not use the >> redirection as specified, though using that in this case would still have worked.
The challenge with running multiple commands (like the specific cd command required to change directory/traverse into the subdirectory) from within xargs is that 1) they are very hard to code, and 2) it may not be possible to code this at all, since xargs runs one command per input item directly, without a shell to interpret ; or builtins such as cd.
There is, however, a different and easy to understand way to code this, and once you know how to do it, you will likely use it a lot. Let’s dive in.
$ rm total_file.txt
We first cleaned up our previous output.
$ ls -d --color=never [0-9][0-9] | xargs -I{} echo 'cd {}; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi'
cd 00; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi
cd 01; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi
cd 02; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi
cd 03; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi
cd 04; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi
cd 05; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi
cd 06; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi
cd 07; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi
cd 08; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi
cd 09; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi
cd 10; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi
Next, we formulated a command, this time using ls, which lists everything matching the [0-9][0-9] pattern, here our directories (strictly speaking this is a shell glob pattern rather than a regular expression; read our Advanced Bash regex with examples article for more information on regular expressions).
We also used xargs, but this time (in comparison with previous examples) with an echo command which outputs exactly what we would like to do, even if that requires more than one command. Think of this as a mini-script.
We also use cd {} to change into the directories listed by ls -d (the -d option lists the directories themselves rather than their contents; as a side note, the --color=never option prevents any color codes in the ls output from skewing our results), and check whether file.txt is there and readable in the subdirectory by using an if [ -r ... test. If so, we cat file.txt into ../total_file.txt. Note the .. as the cd {} in the command has placed us into the subdirectory!
We run this to see how it works (after all, only the echo is executed; nothing will actually happen). The generated code looks great. Let’s take it one step further now and actually execute it:
$ ls -d --color=never [0-9][0-9] | xargs -I{} echo 'cd {}; if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi' | xargs -I{} bash -c "{}"
$ cat total_file.txt
a
b
c
We now executed the whole script by using a specific command, xargs -I{} bash -c "{}", which executes whatever was generated by the echo preceding it. It is always the same, so you will find yourself writing | xargs -I{} bash -c "{}" with some regularity. Basically this tells the Bash interpreter to execute whatever was passed to it, and it does so for each generated line of code. Because every line runs in its own bash process, the cd in one line does not affect the next. Very powerful!
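Because the generated text is executed as code, and because xargs itself interprets quotes and backslashes in its input lines, it can be worth knowing an alternative. The sketch below is one possible variant rather than the method used in this article: it keeps the script fixed in single quotes and hands the directory name to bash as a positional argument. The scratch directory and the _ placeholder for $0 are assumptions made for the demonstration.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir 00 01 02 03
echo 'a' > 01/file.txt
echo 'b' > 03/file.txt

# The single-quoted script is passed to bash unchanged. "_" fills $0, and the
# directory name that xargs substitutes for {} arrives inside bash as $1.
ls -d [0-9][0-9] | xargs -I{} bash -c 'cd "$1" && if [ -r ./file.txt ]; then cat file.txt >> ../total_file.txt; fi' _ {}

cat total_file.txt
```

The directory name is then treated as data rather than being pasted into the script text, which helps when names contain spaces or shell metacharacters.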
Example 2: Multi-threaded xargs
Here we will have a look at two different xargs commands, one executed without parallel (multi-threaded) execution, the other with. Consider the difference between the following two examples:
$ time for i in $(seq 1 5); do echo $[$RANDOM % 5 + 1]; done | xargs -I{} echo "sleep {}; echo 'Done! {}'" | xargs -I{} bash -c "{}"
Done! 5
Done! 5
Done! 2
Done! 4
Done! 1

real 0m17.016s
user 0m0.017s
sys 0m0.003s
$ time for i in $(seq 1 5); do echo $[$RANDOM % 5 + 1]; done | xargs -I{} echo "sleep {}; echo 'Done! {}'" | xargs -P5 -I{} bash -c "{}"
Done! 1
Done! 3
Done! 3
Done! 3
Done! 5

real 0m5.019s
user 0m0.036s
sys 0m0.015s
The difference between the two command lines is small; we only added -P5 in the second command line. The difference in runtime, however (as measured by the time command prefix), is significant. Let’s find out why (and why the output differs!).
In the first example, we create a for loop which will run 5 times (due to the command substitution $(seq 1 5) generating the numbers 1 to 5), and in it we echo a random number between 1 and 5. Next, much in line with our last example, we send this output into the sleep command, and also output the duration slept as part of the Done! echo. Finally we send this to be run by a subshell Bash command, again in a similar fashion to our last example.
The first command works like this: execute a sleep, output the result, execute the next sleep, and so on. The total runtime is therefore the sum of all the sleeps (5 + 5 + 2 + 4 + 1 = 17 seconds in the run above).
The second command however completely changes this. Here we added -P5, which basically starts 5 parallel threads (strictly speaking, separate processes) all at once!
The way that this command works is: start up to x threads (as defined by the -P option) and process them simultaneously. When a thread completes, grab new input immediately; do not wait for other threads to finish first. The latter part of that description is not applicable here (it would only be if fewer threads were specified by -P than the number of lines of input given, or in other words if fewer parallel threads were available than rows of input).
The result is that the threads which finish first (those with a short random sleep time) come back first and output their ‘Done!’ statement. The total runtime also comes down from about 17 seconds to about 5 seconds in real clock time, which is the length of the longest single sleep. Cool!
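The case where -P is lower than the number of input lines can be sketched as follows. This is an illustration under assumptions rather than part of the original examples: it uses fixed 1 second sleeps instead of random ones, measures time with date +%s, and relies on an xargs that supports -P (GNU and BSD versions do).

```shell
# Four jobs that each sleep 1 second, but only two parallel slots (-P2).
start=$(date +%s)

# Each job receives its sleep time as $1. As soon as one of the two running
# jobs finishes, xargs starts the next one from the remaining input.
output=$(printf '%s\n' 1 1 1 1 | xargs -P2 -I{} bash -c 'sleep "$1"; echo "Done! $1"' _ {})

end=$(date +%s)
elapsed=$((end - start))

echo "$output"
# Two rounds of two jobs: roughly 2 seconds, instead of about 4 when run one by one.
echo "elapsed: about ${elapsed}s"
```

With GNU xargs, -P0 runs as many processes at once as possible, and -P "$(nproc)" (nproc is part of GNU coreutils) is a common choice to match the number of available CPUs for CPU-bound work.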
Conclusion
Using xargs is one of the most advanced, and also one of the most powerful, ways to code in Bash. But it doesn’t stop at just using xargs! In this article we explored multi-threaded parallel execution via the -P option to xargs. We also looked at calling subshells using $(), and finally we introduced a method to pass multi-command statements directly to xargs by using a bash -c subshell call.
Powerful? We think so! Leave us your thoughts.