In our automation scripts we often need to launch and monitor external programs to accomplish our desired tasks. When working with Python, we can use the subprocess module to perform said operations. This module is part of the programming language standard library. In this tutorial we will take a quick look at it, and we will learn the basics of its usage.
In this tutorial you will learn:
- How to use the “run” function to spawn an external process
- How to capture a process standard output and standard error
- How to check the exist status of a process and raise an exception if it fails
- How to execute a process into an intermediary shell
- How to set a timeout for a process
- How to use the Popen class directly to pipe two processes
Software requirements and conventions used
Category | Requirements, Conventions or Software Version Used |
---|---|
System | Distribution independent |
Software | Python3 |
Other | Knowledge of Python and Object Oriented Programming |
Conventions | # – requires given linux-commands to be executed with root privileges either directly as a root user or by use of sudo command$ – requires given linux-commands to be executed as a regular non-privileged user |
The “run” function
The run function has been added to the subprocess module only in relatively recent versions of Python (3.5). Using it is now the recommended way to spawn processes and should cover the most common use cases. Before everything else, let’s see its most simple usage. Suppose we want to run the ls -al
command; in a Python shell we would run:
>>> import subprocess >>> process = subprocess.run(['ls', '-l', '-a'])
The output of the external command is displayed onscreen:
total 132 drwx------. 22 egdoc egdoc 4096 Nov 30 12:18 . drwxr-xr-x. 4 root root 4096 Nov 22 13:11 .. -rw-------. 1 egdoc egdoc 10438 Dec 1 12:54 .bash_history -rw-r--r--. 1 egdoc egdoc 18 Jul 27 15:10 .bash_logout [...]
Here we just used the first, mandatory argument accepted by the function, which can be a sequence which “describes” a command and its arguments (as in the example) or a string, which should be used when running with the shell=True
argument (we will see it later).
Capturing the command stdout and stderr
What if we don’t want the output of the process to be displayed onscreen, but instead captured, so it can be referenced after the process exits? In that case we can set the capture_output
argument of the function to True
:
>>> process = subprocess.run(['ls', '-l', '-a'], capture_output=True)
How can we retrieve the output (stdout and stderr) of the process afterwards? If you observe the examples above, you can see we used the process
variable to reference what is returned by the run
function: a CompletedProcess
object. This object represents the process which was launched by the function and has many useful properties. Among the others, stdout
and stderr
are used to “store” the corresponding descriptors of the command if, as we said, the capture_output
argument is set to True
. In this case, to get the stdout
of the process we would run:
>>> process.stdout
Stdout and stderr are stored as bytes sequences by default. If we want them to be stored as strings, we must set the text
argument of the run
function to True
.
Manage a process failure
The command we ran in the previous examples was executed without errors. When writing a program, however, all cases should be take into account, so what if a spawned process fails? By default nothing “special” would happen. Let’s see an example; we run the ls
command again, trying to list the content of the /root
directory, which normally, on Linux is not readable by normal users:
>>> process = subprocess.run(['ls', '-l', '-a', '/root'])
One thing we can do to check if a launched process failed, is to check its exist status, which is stored in the returncode
property of the CompletedProcess
object:
>>> process.returncode 2
See? In this case the returncode was 2
, confirming that the process encountered a permission problem, and was not completed successfully. We could test the output of a process this way, or more elegantly we could make so that an exception is raised when a fail happens. Enter the check
argument of the run
function: when it is set to True
and a spawned process fails, the CalledProcessError
exception is raised:
>>> process = subprocess.run(['ls', '-l', '-a', '/root'], check=True) ls: cannot open directory '/root': Permission denied Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib64/python3.9/subprocess.py", line 524, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ls', '-l', '-a', '/root']' returned non-zero exit status 2.
Handling exceptions in Python is quite easy, so to manage a process failure we could write something like:
>>> try: ... process = subprocess.run(['ls', '-l', '-a', '/root'], check=True) ... except subprocess.CalledProcessError as e: ... # Just an example, something useful to manage the failure should be done! ... print(f"{e.cmd} failed!") ... ls: cannot open directory '/root': Permission denied ['ls', '-l', '-a', '/root'] failed! >>>
The CalledProcessError
exception, as we said, is raised when a process exits with a non 0
status. The object has properties like returncode
, cmd
, stdout
, stderr
; what they represent is pretty obvious. In the example above, for instance, we just used the cmd
property, to report the sequence that was used to describe the command and its arguments in the message we wrote when the exception occurred.
Execute a process in a shell
The processes launched with the run
function, are executed “directly”, this means that no shell is used to launch them: no environment variables are therefore available to the process and shell expansions are not performed. Let’s see an example which involves the use of the $HOME
variable:
>>> process = subprocess.run(['ls', '-al', '$HOME']) ls: cannot access '$HOME': No such file or directory
As you can see the $HOME
variable was not expanded. Executing processes this way is recommended so to avoid potential security risks. If in certain cases, however, we need to invoke a shell as an intermediate process we need to set the shell
parameter of the run
function to True
. In such cases it is preferable to specify the command to be executed and its arguments as a string:
>>> process = subprocess.run('ls -al $HOME', shell=True) total 136 drwx------. 23 egdoc egdoc 4096 Dec 3 09:35 . drwxr-xr-x. 4 root root 4096 Nov 22 13:11 .. -rw-------. 1 egdoc egdoc 11885 Dec 3 09:35 .bash_history -rw-r--r--. 1 egdoc egdoc 18 Jul 27 15:10 .bash_logout [...]
All the variables existing in the user environment can be used when invoking a shell as an intermediate process: while this can look handy, it can be a source of troubles, especially when dealing with potentially dangerous input, which could lead to shell injections. Running a process with shell=True
is therefore discouraged, and should be used only in safe cases.
Specifying a timeout for a process
We usually don’t want misbehaving processes to run forever on our system once they are launched. If we use the timeout
parameter of the run
function, we can specify an amount of time in seconds the process should take to complete. If it is not completed in that amount of time, the process will be killed with a SIGKILL signal, which, as we know, cannot be caught by a process. Let’s demonstrate it by spawning a long running process and providing a timeout in seconds:
>>> process = subprocess.run(['ping', 'google.com'], timeout=5) PING google.com (216.58.206.46) 56(84) bytes of data. 64 bytes from mil07s07-in-f14.1e100.net (216.58.206.46): icmp_seq=1 ttl=113 time=29.3 ms 64 bytes from lhr35s10-in-f14.1e100.net (216.58.206.46): icmp_seq=2 ttl=113 time=28.3 ms 64 bytes from lhr35s10-in-f14.1e100.net (216.58.206.46): icmp_seq=3 ttl=113 time=28.5 ms 64 bytes from lhr35s10-in-f14.1e100.net (216.58.206.46): icmp_seq=4 ttl=113 time=28.5 ms 64 bytes from lhr35s10-in-f14.1e100.net (216.58.206.46): icmp_seq=5 ttl=113 time=28.1 ms Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib64/python3.9/subprocess.py", line 503, in run stdout, stderr = process.communicate(input, timeout=timeout) File "/usr/lib64/python3.9/subprocess.py", line 1130, in communicate stdout, stderr = self._communicate(input, endtime, timeout) File "/usr/lib64/python3.9/subprocess.py", line 2003, in _communicate self.wait(timeout=self._remaining_time(endtime)) File "/usr/lib64/python3.9/subprocess.py", line 1185, in wait return self._wait(timeout=timeout) File "/usr/lib64/python3.9/subprocess.py", line 1907, in _wait raise TimeoutExpired(self.args, timeout) subprocess.TimeoutExpired: Command '['ping', 'google.com']' timed out after 4.999826977029443 seconds
In the example above we launched the ping
command without specifying a fixed amount of ECHO REQUEST packets, therefore it could potentially run forever. We also specified a timeout of 5
seconds via the timeout
parameter. As we can observe the program initially ran, but the TimeoutExpired
exception was raised when the specified amount of seconds was reached, and the process was killed.
The call, check_output and check_call functions
As we said before, the run
function is the recommended way to run an external process and should cover the majority of cases. Before it was introduced in Python 3.5, the three main high level API functions used to launch a process were call
, check_output
and check_call
; let’s see them briefly.
First of all, the call
function: it is used to run the command described by the args
parameter; it waits for the command to be completed and returns its returncode. It roughly corresponds to the basic usage of the run
function.
The check_call
function behavior is practically the same of that of the run
function when the check
parameter is set to True
: it runs the specified command and waits for it to complete. If its exist status is not 0
, a CalledProcessError
exception is raised.
Finally, the check_output
function: it works similarly to check_call
, but returns the program output: it is not displayed when the function is executed.
Working at a lower level with the Popen class
Until now we explored the high level API functions in the subprocess module, especially run
. All this functions, under the hood interact with the Popen
class. Because of this, in the vast majority of cases we don’t have to work with it directly. When more flexibility is needed, however, creating Popen
objects directly becomes necessary.
Suppose, for example, we want to connect two processes, recreating the behavior of a shell “pipe”. As we know, when we pipe two commands in the shell, the standard output of the one at the left side of the pipe (|
) is used as the standard input of the one at the right of it (check this article about shell redirections if you want to know more on the subject). In the example below the result of piping the two commands is stored in a variable:
$ output="$(dmesg | grep sda)"
To emulate this behavior using the subprocess module, without having to set the shell
parameter to True
as we saw before, we must use the Popen
class directly:
dmesg = subprocess.Popen(['dmesg'], stdout=subprocess.PIPE) grep = subprocess.Popen(['grep', 'sda'], stdin=dmesg.stdout) dmesg.stdout.close() output = grep.comunicate()[0]
To understand the example above we must remember that a process started by using the Popen
class directly does not block the execution of the script, since it is now waited for.
The first thing we did in the code snippet above, was to create the Popen
object representing the dmesg process. We set the stdout
of this process to subprocess.PIPE
: this value indicates that a pipe to the specified stream should be opened.
We than created another instance of the Popen
class for the grep process. In the Popen
constructor we specified the command and its arguments, of course, but, here is the important part, we set the standard output of the dmesg process to be used as the standard input (stdin=dmesg.stdout
), so to recreate the shell
pipe behavior.
After creating the Popen
object for the grep command, we closed the stdout
stream of the dmesg process, using the close()
method: this, as stated in the documentation, is needed to allow the first process to receive a SIGPIPE signal. Let’s try to explain why. Normally, when two processes are connected by a pipe, if the one at the right of the pipe (grep in our example) exits before the one at the left (dmesg), the latter receives a SIGPIPE
signal (broken pipe) and by default, terminates itself.
When replicating the behavior of a pipe between two commands in Python, however, there is a problem: the stdout of the first process is opened both in the parent script and in the standard input of the other process. This way, even if the grep process ends, the pipe will still remain open in the caller process (our script), therefore the first process will never receive the SIGPIPE signal. This is why we need to close the stdout stream of the first process in our
main script after we launch the second one.
The last thing we did was to call the communicate()
method on the grep object. This method can be used to optionally pass input to a process; it waits for the process to terminate and returns a tuple where the first member is the process stdout (which is referenced by the output
variable) and the second the process stderr.
Conclusions
In this tutorial we saw the recommended way to spawn external processes with Python using the subprocess module and the run
function. The use of this function should be enough for the majority of cases; when an higher level of flexibility is needed, however, one must use the Popen
class directly. As always, we suggest to take a look at the
subprocess documentation for a complete overview of the signature of functions and classes available in
the module.