Process List Management and Automatic Process Termination

As making the most of server resources becomes ever more important, managing processes well matters more and more. One aspect of this is automatic process termination: when a process has gone rogue and is consuming too many resources, it can be terminated automatically.

This approach is especially suited to servers that run many temporary or disposable processes. It is also well suited to test servers that run many test trials, where those trials prove unstable or cause the software under test to behave erratically (for example, by using too much memory).

In this tutorial you will learn:

  • How to manage processes in an automated fashion
  • Which resources you may want to monitor, and why
  • Example code showing how automatic process termination can work for memory hogging issues

Software requirements and conventions used

Software Requirements and Linux Command Line Conventions

Category | Requirements, Conventions or Software Version Used
System | Linux distribution-independent
Software | Bash command line, Linux-based system
Other | Any utility which is not included in the Bash shell by default can be installed using sudo apt-get install utility-name (or yum install for RedHat-based systems)
Conventions | # – requires linux-commands to be executed with root privileges, either directly as the root user or by use of the sudo command
Conventions | $ – requires linux-commands to be executed as a regular non-privileged user

Too Much Memory! Or better, Too Little Memory!

There are two main resources you will likely always want to keep an eye on, depending on the software running elsewhere on the server: memory usage and disk space. CPU usage may also come into the picture, but it is somewhat different from the others. The reason is that when you run out of disk space or memory, your server will start misbehaving.

You may get undefined behavior when disk space runs out, and if you run out of memory, the OOM Killer (the kernel's out-of-memory process killer) may kick in and kill off some processes.

On the other hand, with the CPU, even if the software running elsewhere on the server maxes out the CPU, your server will keep running. If it is a real CPU hogging program, it may be prohibitively slow, but in most cases you will still be able to at least type some commands.
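As a rough illustration of keeping an eye on those two resources, here is a minimal sketch. The function names mem_available_pct and disk_used_pct are hypothetical helpers made up for this example; they assume a Linux system that exposes /proc/meminfo with a MemAvailable line and a df that supports the POSIX -P output format.

```shell
#!/usr/bin/env bash
# mem_available_pct: hypothetical helper (name is an assumption).
# Reads a meminfo-style file (default: /proc/meminfo) and prints the
# available memory as a whole percentage of total memory.
mem_available_pct() {
  local file="${1:-/proc/meminfo}"
  awk '/^MemTotal:/     {total = $2}
       /^MemAvailable:/ {avail = $2}
       END {if (total > 0) printf "%d\n", avail * 100 / total}' "$file"
}

# disk_used_pct: hypothetical helper (name is an assumption).
# Prints the used percentage of the filesystem holding the given path.
disk_used_pct() {
  # -P forces one line per filesystem; column 5 is the "Capacity" percentage
  df -P "${1:-/}" | awk 'NR == 2 {sub(/%/, "", $5); print $5}'
}

echo "Memory available: $(mem_available_pct)%"
echo "Disk used on /:   $(disk_used_pct /)%"
```

Both values could be compared against a limit of your choosing in a monitoring script; which limits make sense depends entirely on the server.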

This article will focus on memory hogging process management: automatic termination of processes which consume too much memory. Let’s first look at how we can monitor per-process memory usage:

$ ps -eo pid,pmem --sort -rss | head -n10
    PID %MEM
 406677 19.5
 373013  2.1
 406515  2.0
 406421  1.9
   2254  1.8
 406654  1.8
 406554  1.7
 406643  0.9
  16622  0.7


Here we requested ps to produce a list of the processes using the most memory. We indicated that we want to see all processes (-e), and for each process we want to see the process ID (-o pid) and the percentage of memory that it consumes (-o pmem), or with the options combined: -eo pid,pmem.

Next we requested the list to be presorted for us (--sort) by resident set size (rss), the amount of physical memory a process occupies. The minus prefix (-rss) requests descending order, so the largest consumers come first. We then keep the first 10 lines of output by using head -n10, which gives us the header line plus the top 9 processes. If we want to see which processes are using the memory, we can add ,comm to the pid,pmem list, or simply use ps -ef | grep PID, where PID is the number listed in the first column of the ps output, to see the full details for a process.
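The ,comm variant mentioned above can be wrapped in a small function. This is only a sketch: top_mem is a hypothetical name chosen for illustration, and it assumes the procps version of ps (which provides --no-headers).

```shell
#!/usr/bin/env bash
# top_mem: hypothetical helper (name is an assumption).
# Prints the N processes with the largest resident set size,
# showing PID, memory percentage and command name.
top_mem() {
  local count="${1:-10}"
  # --no-headers drops the column titles, so head counts only processes
  ps -eo pid,pmem,comm --sort -rss --no-headers | head -n "$count"
}

top_mem 5
```

Because the header is suppressed, top_mem 5 prints at most five processes rather than a header plus four.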

Now let’s automate this in such a way that processes which use 10% or more of memory are automatically terminated.

WARNING: Do not run this on any computer without fully understanding what it will do, and how it works. Information here is provided as-is, without warranties of any kind. You may terminate processes that you did not want to terminate, or that should not be terminated.

# ps -eo pmem,pid --sort -rss | grep '^[ \t]*[1-9][0-9]\.' | awk '{print $2}' | xargs -I{} kill -9 {}

Firstly, we are going to execute this as root, to ensure we have enough privileges to kill any relevant process. Note that we swapped the pmem (percent memory) and pid (process ID) columns around. This makes it a bit easier to use a regular expression with grep. Our grep regular expression works like this: first, starting at the beginning of the line (^), look for a space or a tab ([ \t]), zero or more (*) times. Note that grep does not actually treat \t inside a bracket expression as a tab (there it matches a literal backslash or the letter t); because ps pads its columns with spaces, the pattern still works as intended.

Next, look for one digit from 1 to 9, exactly once (a single match is the default, so no repetition symbol such as * is needed). This captures the first digit of any number from 10 (starts with 1) to 99 (starts with 9). Next we look for another digit, 0 to 9, so in total we are grepping for the numbers 10 to 99. We follow this with a literal dot (\.; do not use a bare . here, as a dot without a backslash prefix means any character rather than a literal dot) to make sure the two digits are the complete integer part before the decimal point.
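The regular expression is tied to two-digit percentages. As an alternative sketch (this is not the approach used in this article), the same selection can be done with a numeric comparison in awk, which makes the threshold adjustable and also catches a value of 100.0. The function name over_threshold is a hypothetical helper, and the sample values are made up.

```shell
#!/usr/bin/env bash
# over_threshold: hypothetical helper (name is an assumption).
# Reads "pmem pid" lines on stdin and prints the PIDs whose memory
# percentage is greater than or equal to the given limit (default 10).
over_threshold() {
  local limit="${1:-10}"
  # "$1 + 0" forces a numeric comparison; a "%MEM PID" header line
  # evaluates to 0 and is therefore skipped for any positive limit.
  awk -v limit="$limit" '$1 + 0 >= limit + 0 {print $2}'
}

# Made-up sample lines in the shape of `ps -eo pmem,pid` output.
printf '%s\n' '%MEM     PID' '19.5  406677' ' 2.1  373013' '100.0     42' '10.0       7' \
  | over_threshold 10
```

With the sample input above, the PIDs 406677, 42 and 7 are printed; in real use the input would come from ps -eo pmem,pid.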

We then take only the second column output ({print $2}, with $2 being the second column, $1 the first etc.) by using awk. Finally, we pass this to xargs and write out kill -9 in a clean and easy to understand format. We could have written this using a shorthand syntax, but this is nice, clean and clear. The -I indicates what we will use as our replace-string (replacing any occurrence of the same within the command with whatever input xargs has received from the pipe), in this case {}. I also recommend {} in general as a safe swap/replace string.
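Before running the real command, it can help to rehearse the pipeline on harmless input. The sketch below feeds made-up sample lines (not real ps output) through the same grep and awk stages, and places echo in front of kill so that the commands are only printed, never executed.

```shell
#!/usr/bin/env bash
# Made-up sample lines in the shape of `ps -eo pmem,pid` output.
sample='19.5  406677
 2.1  373013
10.0     111
 9.9     222
99.9     333'

# Same grep and awk stages as in the article; `echo` turns the final
# stage into a dry run that prints each kill command instead of running it.
dry_run=$(printf '%s\n' "$sample" \
  | grep '^[ \t]*[1-9][0-9]\.' \
  | awk '{print $2}' \
  | xargs -I{} echo kill -9 {})

printf '%s\n' "$dry_run"
```

Only the lines with 19.5, 10.0 and 99.9 pass the grep filter, so the dry run prints kill -9 406677, kill -9 111 and kill -9 333. The same echo trick works with live ps output.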

If you would like to learn more about xargs, please see our Xargs for Beginners with Examples and Multi Threaded Xargs with Examples articles.

The result of running the command is that any process which uses 10% or more of memory will be immediately terminated with a forceful kill -9 (SIGKILL). If you would like to automate this, you could put the command inside a while true; do ..... done loop, simply replacing the ..... with the command above, or you could add it to your crontab or to other pre-existing monitoring scripts.



Be careful when using these commands; they are not without risk. Endeavor to understand what you are doing at all times! You may also like to introduce a 1-minute sleep to avoid hammering the server with commands:

# while true; do ps -eo pmem,pid --sort -rss | grep '^[ \t]*[1-9][0-9]\.' | awk '{print $2}' | xargs -I{} kill -9 {}; sleep 60; done

This way we monitor all processes on an ongoing basis and terminate any that start to go rogue, use too much memory, and so on.
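Since kill -9 gives a process no chance to clean up, a gentler variation is sometimes preferable. The following is a sketch only, not part of the command above: terminate_gracefully is a hypothetical helper that first sends SIGTERM, waits for a grace period (5 seconds by default, an arbitrary choice), and falls back to SIGKILL only if the process is still present.

```shell
#!/usr/bin/env bash
# terminate_gracefully: hypothetical helper (name is an assumption).
# Usage: terminate_gracefully PID [GRACE_SECONDS]
terminate_gracefully() {
  local pid="$1" grace="${2:-5}" waited=0

  # Ask the process to exit; stop here if it is already gone
  # or we are not permitted to signal it.
  kill -TERM "$pid" 2>/dev/null || return 0

  # kill -0 sends no signal; it only checks that the PID can be signalled.
  while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
    sleep 1
    waited=$((waited + 1))
  done

  # Still present after the grace period: force termination.
  if kill -0 "$pid" 2>/dev/null; then
    kill -KILL "$pid" 2>/dev/null
  fi
  return 0
}
```

In the monitoring loop, the xargs stage could then be replaced by something like while read -r pid; do terminate_gracefully "$pid"; done, at the cost of the loop taking longer when a process ignores SIGTERM.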

Conclusion

In this article, we looked at managing processes in an automated fashion by using custom formatted ps output, as well as the xargs and kill commands. We also explored which resources to monitor, and why. Finally, we demonstrated in code how automatic process termination can work for memory hogging issues. Enjoy!


