How to launch external processes with Python and the subprocess module

In our automation scripts we often need to launch and monitor external programs to accomplish our desired tasks. When working with Python, we can use the subprocess module to perform said operations. This module is part of the programming language standard library. In this tutorial we will take a quick look at it, and we will learn the basics of its usage.

In this tutorial you will learn:

How to use the “run” function to spawn an external process
How to capture a process standard output and standard error
How to check the exist status of a process and raise an exception if it fails
How to execute a process into an intermediary shell
How to set a timeout for a process
How to use the Popen class directly to pipe two processes

How to launch external processes with Python and the subprocess module

Software requirements and conventions used

Software Requirements and Linux Command Line Conventions
Category	Requirements, Conventions or Software Version Used
System	Distribution independent
Software	Python3
Other	Knowledge of Python and Object Oriented Programming
Conventions	# – requires given linux-commands to be executed with root privileges either directly as a root user or by use of `sudo` command $ – requires given linux-commands to be executed as a regular non-privileged user

The “run” function

The run function has been added to the subprocess module only in relatively recent versions of Python (3.5). Using it is now the recommended way to spawn processes and should cover the most common use cases. Before everything else, let’s see its most simple usage. Suppose we want to run the ls -al command; in a Python shell we would run:

>>> import subprocess
>>> process = subprocess.run(['ls', '-l', '-a'])

The output of the external command is displayed onscreen:

total 132
drwx------. 22 egdoc egdoc  4096 Nov 30 12:18 .
drwxr-xr-x.  4 root  root   4096 Nov 22 13:11 ..
-rw-------.  1 egdoc egdoc 10438 Dec  1 12:54 .bash_history
-rw-r--r--.  1 egdoc egdoc    18 Jul 27 15:10 .bash_logout
[...]

Here we just used the first, mandatory argument accepted by the function, which can be a sequence which “describes” a command and its arguments (as in the example) or a string, which should be used when running with the shell=True argument (we will see it later).

Capturing the command stdout and stderr

What if we don’t want the output of the process to be displayed onscreen, but instead captured, so it can be referenced after the process exits? In that case we can set the capture_output argument of the function to True:

>>> process = subprocess.run(['ls', '-l', '-a'], capture_output=True)

How can we retrieve the output (stdout and stderr) of the process afterwards? If you observe the examples above, you can see we used the process variable to reference what is returned by the run function: a CompletedProcess object. This object represents the process which was launched by the function and has many useful properties. Among the others, stdout and stderr are used to “store” the corresponding descriptors of the command if, as we said, the capture_output argument is set to True. In this case, to get the stdout of the process we would run:

>>> process.stdout

Stdout and stderr are stored as bytes sequences by default. If we want them to be stored as strings, we must set the text argument of the run function to True.

Manage a process failure

The command we ran in the previous examples was executed without errors. When writing a program, however, all cases should be take into account, so what if a spawned process fails? By default nothing “special” would happen. Let’s see an example; we run the ls command again, trying to list the content of the /root directory, which normally, on Linux is not readable by normal users:

>>> process = subprocess.run(['ls', '-l', '-a', '/root'])

One thing we can do to check if a launched process failed, is to check its exist status, which is stored in the returncode property of the CompletedProcess object:

>>> process.returncode
2

See? In this case the returncode was 2, confirming that the process encountered a permission problem, and was not completed successfully. We could test the output of a process this way, or more elegantly we could make so that an exception is raised when a fail happens. Enter the check argument of the run function: when it is set to True and a spawned process fails, the CalledProcessError exception is raised:

>>> process = subprocess.run(['ls', '-l', '-a', '/root'], check=True)
ls: cannot open directory '/root': Permission denied
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.9/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ls', '-l', '-a', '/root']' returned non-zero exit status 2.

Handling exceptions in Python is quite easy, so to manage a process failure we could write something like:

>>> try:
...   process = subprocess.run(['ls', '-l', '-a', '/root'], check=True)
... except subprocess.CalledProcessError as e:
...   # Just an example, something useful to manage the failure should be done!
...   print(f"{e.cmd} failed!")
...
ls: cannot open directory '/root': Permission denied
['ls', '-l', '-a', '/root'] failed!
>>>

The CalledProcessError exception, as we said, is raised when a process exits with a non 0 status. The object has properties like returncode, cmd, stdout, stderr; what they represent is pretty obvious. In the example above, for instance, we just used the cmd property, to report the sequence that was used to describe the command and its arguments in the message we wrote when the exception occurred.

Execute a process in a shell

The processes launched with the run function, are executed “directly”, this means that no shell is used to launch them: no environment variables are therefore available to the process and shell expansions are not performed. Let’s see an example which involves the use of the $HOME variable:

>>> process = subprocess.run(['ls', '-al', '$HOME'])
ls: cannot access '$HOME': No such file or directory

As you can see the $HOME variable was not expanded. Executing processes this way is recommended so to avoid potential security risks. If in certain cases, however, we need to invoke a shell as an intermediate process we need to set the shell parameter of the run function to True. In such cases it is preferable to specify the command to be executed and its arguments as a string:

>>> process = subprocess.run('ls -al $HOME', shell=True)
total 136
drwx------. 23 egdoc egdoc  4096 Dec  3 09:35 .
drwxr-xr-x.  4 root  root   4096 Nov 22 13:11 ..
-rw-------.  1 egdoc egdoc 11885 Dec  3 09:35 .bash_history
-rw-r--r--.  1 egdoc egdoc    18 Jul 27 15:10 .bash_logout
[...]

All the variables existing in the user environment can be used when invoking a shell as an intermediate process: while this can look handy, it can be a source of troubles, especially when dealing with potentially dangerous input, which could lead to shell injections. Running a process with shell=True is therefore discouraged, and should be used only in safe cases.

Specifying a timeout for a process

We usually don’t want misbehaving processes to run forever on our system once they are launched. If we use the timeout parameter of the run function, we can specify an amount of time in seconds the process should take to complete. If it is not completed in that amount of time, the process will be killed with a SIGKILL signal, which, as we know, cannot be caught by a process. Let’s demonstrate it by spawning a long running process and providing a timeout in seconds:

>>> process = subprocess.run(['ping', 'google.com'], timeout=5)
PING google.com (216.58.206.46) 56(84) bytes of data.
64 bytes from mil07s07-in-f14.1e100.net (216.58.206.46): icmp_seq=1 ttl=113 time=29.3 ms
64 bytes from lhr35s10-in-f14.1e100.net (216.58.206.46): icmp_seq=2 ttl=113 time=28.3 ms
64 bytes from lhr35s10-in-f14.1e100.net (216.58.206.46): icmp_seq=3 ttl=113 time=28.5 ms
64 bytes from lhr35s10-in-f14.1e100.net (216.58.206.46): icmp_seq=4 ttl=113 time=28.5 ms
64 bytes from lhr35s10-in-f14.1e100.net (216.58.206.46): icmp_seq=5 ttl=113 time=28.1 ms
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.9/subprocess.py", line 503, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib64/python3.9/subprocess.py", line 1130, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib64/python3.9/subprocess.py", line 2003, in _communicate
    self.wait(timeout=self._remaining_time(endtime))
  File "/usr/lib64/python3.9/subprocess.py", line 1185, in wait
    return self._wait(timeout=timeout)
  File "/usr/lib64/python3.9/subprocess.py", line 1907, in _wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['ping', 'google.com']' timed out after 4.999826977029443 seconds

In the example above we launched the ping command without specifying a fixed amount of ECHO REQUEST packets, therefore it could potentially run forever. We also specified a timeout of 5 seconds via the timeout parameter. As we can observe the program initially ran, but the TimeoutExpired exception was raised when the specified amount of seconds was reached, and the process was killed.

The call, check_output and check_call functions

As we said before, the run function is the recommended way to run an external process and should cover the majority of cases. Before it was introduced in Python 3.5, the three main high level API functions used to launch a process were call, check_output and check_call; let’s see them briefly.

First of all, the call function: it is used to run the command described by the args parameter; it waits for the command to be completed and returns its returncode. It roughly corresponds to the basic usage of the run function.

The check_call function behavior is practically the same of that of the run function when the check parameter is set to True: it runs the specified command and waits for it to complete. If its exist status is not 0, a CalledProcessError exception is raised.

Finally, the check_output function: it works similarly to check_call, but returns the program output: it is not displayed when the function is executed.

Working at a lower level with the Popen class

Until now we explored the high level API functions in the subprocess module, especially run. All this functions, under the hood interact with the Popen class. Because of this, in the vast majority of cases we don’t have to work with it directly. When more flexibility is needed, however, creating Popen objects directly becomes necessary.

Suppose, for example, we want to connect two processes, recreating the behavior of a shell “pipe”. As we know, when we pipe two commands in the shell, the standard output of the one at the left side of the pipe (|) is used as the standard input of the one at the right of it (check this article about shell redirections if you want to know more on the subject). In the example below the result of piping the two commands is stored in a variable:

$ output="$(dmesg | grep sda)"

To emulate this behavior using the subprocess module, without having to set the shell parameter to True as we saw before, we must use the Popen class directly:

dmesg = subprocess.Popen(['dmesg'], stdout=subprocess.PIPE)
grep = subprocess.Popen(['grep', 'sda'], stdin=dmesg.stdout)
dmesg.stdout.close()
output = grep.comunicate()[0]

To understand the example above we must remember that a process started by using the Popen class directly does not block the execution of the script, since it is now waited for.

The first thing we did in the code snippet above, was to create the Popen object representing the dmesg process. We set the stdout of this process to subprocess.PIPE: this value indicates that a pipe to the specified stream should be opened.

We than created another instance of the Popen class for the grep process. In the Popen constructor we specified the command and its arguments, of course, but, here is the important part, we set the standard output of the dmesg process to be used as the standard input (stdin=dmesg.stdout), so to recreate the shell
pipe behavior.

After creating the Popen object for the grep command, we closed the stdout stream of the dmesg process, using the close() method: this, as stated in the documentation, is needed to allow the first process to receive a SIGPIPE signal. Let’s try to explain why. Normally, when two processes are connected by a pipe, if the one at the right of the pipe (grep in our example) exits before the one at the left (dmesg), the latter receives a SIGPIPE
signal (broken pipe) and by default, terminates itself.

When replicating the behavior of a pipe between two commands in Python, however, there is a problem: the stdout of the first process is opened both in the parent script and in the standard input of the other process. This way, even if the grep process ends, the pipe will still remain open in the caller process (our script), therefore the first process will never receive the SIGPIPE signal. This is why we need to close the stdout stream of the first process in our
main script after we launch the second one.

The last thing we did was to call the communicate() method on the grep object. This method can be used to optionally pass input to a process; it waits for the process to terminate and returns a tuple where the first member is the process stdout (which is referenced by the output variable) and the second the process stderr.

Conclusions

In this tutorial we saw the recommended way to spawn external processes with Python using the subprocess module and the run function. The use of this function should be enough for the majority of cases; when an higher level of flexibility is needed, however, one must use the Popen class directly. As always, we suggest to take a look at the
subprocess documentation for a complete overview of the signature of functions and classes available in
the module.