Nowadays all major Linux distributions have adopted systemd as their init system and service manager. Creating a systemd service is just a matter of writing a “.service” unit in the appropriate directory and managing it with the systemctl utility. When starting a service, or launching a process in general, we want to make sure it runs with the smallest set of privileges it needs to accomplish its task. Systemd provides a series of options we can use to fine-tune the behavior of a service, granting or denying privileges in a granular way, and ensuring a certain level of isolation from the rest of the system.
In this article we will see how to increase the security of a systemd service, and how to get an estimate of its exposure level using the systemd-analyze utility.
In this tutorial you will learn:
- How to increase the security of a systemd service
- How to get the estimated service exposure level using systemd-analyze
Requirements, Conventions or Software Version Used
# – requires given linux-commands to be executed with root privileges
$ – requires given linux-commands to be executed as a regular non-privileged user
A test case: writing a backup service
In a previous tutorial we talked about Restic, an efficient deduplicating backup program written in Go. For the sake of this article, let’s imagine we want to write a “restic” service, to be scheduled via a systemd timer. We begin by writing the “Unit” section:
[Unit]
Description=restic backup
Wants=network-online.target
After=network-online.target
First, we provided a service description with the Description option. Since we want to be able to use remote repositories for our backups, we used the Wants and After options to, respectively, declare a (weak) dependency of the service on network-online.target, and establish that it must be started only after said target is reached and network interfaces have been configured.
Now, let’s populate the “Service” section of the unit. This is where we define our service behavior:
[Service]
Type=oneshot
User=restic
ExecStart=/usr/local/bin/restic_backup.sh
We used the “Type” option to define our service as “oneshot”. This influences how systemd treats the service: it will consider it “up” only after its main process exits.
For obvious security reasons, we want to avoid running the service as root; therefore we created the unprivileged “restic” user and, with the User option, specified that the process should be launched with its privileges. Finally, with the ExecStart option, we defined the command/executable which should be invoked when the service is started; in this case, it is the /usr/local/bin/restic_backup.sh script, which contains the main backup logic.
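Since the service is meant to be triggered by a timer, a matching timer unit could look like the sketch below. The file name (restic.timer), the description and the daily schedule are assumptions made for illustration; a timer activates the service with the same name by default.

```ini
# restic.timer (hypothetical file, saved next to restic.service)
[Unit]
Description=Schedule the restic backup

[Timer]
# Assumed schedule: once a day at midnight
OnCalendar=daily
# If the machine was off at the scheduled time, run at the next start
Persistent=true

[Install]
WantedBy=timers.target
```

With this arrangement, the timer is the unit to enable and start, while the service itself has no “Install” section.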
In the “Service” section of the unit, we can use several other options to further tune the privileges of our service. Let’s see some of them.
Running the process with capabilities
Our service will run with the privileges of the “restic” user. This is a good security measure; however, we must ensure restic is able to read the entire filesystem. To reach our goal, we can make the process run with the appropriate capability:
AmbientCapabilities=CAP_DAC_READ_SEARCH
CapabilityBoundingSet=CAP_DAC_READ_SEARCH
The AmbientCapabilities option takes the space-separated list of capabilities we want to include in the ambient capability set of the process as its value. In this case we just used the “CAP_DAC_READ_SEARCH” capability, which allows bypassing read permission checks on files and directories. As a security measure, we also used the CapabilityBoundingSet option to limit the set of capabilities the process is allowed to obtain.
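The bounding set can be expressed either as an allow list or, with a leading “~”, as a deny list. The fragment below is only a sketch of the two forms; the capabilities named in the commented deny-list line are arbitrary examples, not a recommendation for this service.

```ini
[Service]
# Allow-list form: only CAP_DAC_READ_SEARCH stays in the bounding set
CapabilityBoundingSet=CAP_DAC_READ_SEARCH

# Deny-list form, shown commented out as an alternative: a leading "~"
# keeps every capability except the listed ones
#CapabilityBoundingSet=~CAP_SYS_ADMIN CAP_SYS_PTRACE
```

For a service that needs a single capability, the allow-list form is usually the tighter choice, since anything not listed is excluded.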
Hardening the service
The two capabilities-related options we used above are just a small subset of the ones we can use to “isolate” the service. Most of them accept a boolean value. Let’s see some examples.
PrivateTmp
The “PrivateTmp” option protects temporary files created by the service, so that other processes cannot access them. When the option is active, systemd creates isolated /tmp and /var/tmp directories and mounts them in a private namespace.
ProtectKernelModules, ProtectKernelLogs and ProtectKernelTunables
These options protect the kernel state. ProtectKernelModules, when active, denies the service the ability to load and unload kernel modules, while ProtectKernelLogs denies access to the kernel log buffer. Kernel behavior can be tuned by writing appropriate values to files exposed in the /proc and /sys pseudo-filesystems; the ProtectKernelTunables option, when active, makes those files read-only for the service.
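As a minimal sketch, the three options could be enabled together in the “Service” section. They are shown here in a drop-in file; the drop-in path in the comment is a hypothetical name chosen for illustration, and the same lines can equally go straight into the unit.

```ini
# /etc/systemd/system/restic.service.d/10-kernel.conf (hypothetical drop-in)
[Service]
# No loading or unloading of kernel modules
ProtectKernelModules=yes
# No access to the kernel log buffer
ProtectKernelLogs=yes
# Kernel tunables under /proc and /sys become read-only
ProtectKernelTunables=yes
```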
NoNewPrivileges
This option can be used to make sure the service and its child processes cannot gain new privileges by executing other programs via the execve system call. When this option is active, the SETUID and SETGID bits, as well as file capabilities, are ignored when a binary is executed, so they cannot be used to elevate privileges.
RestrictSUIDSGID
This option, when true, prevents the processes invoked by the service from setting the SETUID or SETGID bits on files and directories.
RestrictAddressFamilies
This option accepts the space-separated list of address family names the process can access (e.g. AF_UNIX, AF_INET, AF_INET6); “none” is also a valid value: it denies access to all of them.
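For a service that has to reach a remote repository, an explicit list is likely more appropriate than denying everything. The fragment below is a sketch under that assumption; the chosen families are an example and should be adapted to what the service actually uses.

```ini
[Service]
# Allow only local (UNIX) sockets and IPv4/IPv6; other families are denied
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6

# Alternatives, shown commented out:
# deny every address family
#RestrictAddressFamilies=none
# deny-list form: a leading "~" denies only the listed families
#RestrictAddressFamilies=~AF_PACKET AF_NETLINK
```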
PrivateDevices
When this option is active, it denies the process access to raw devices such as /dev/sda or /dev/mem.
ProtectClock
When set to true, it denies writes to the system clock and to the hardware clock.
ProtectHostname
This option protects the system hostname, ensuring the process cannot change it.
RemoveIPC
When active, it causes the automatic removal of the IPC (Inter Process Communication) objects owned by the user and group the service runs as, once the unit is stopped.
PrivateMounts
When this option is active, the process runs in its own private mount namespace: the mount points it creates or removes are not visible on the host.
Obtaining an estimated security level of the service
Here is what our service looks like in the end:
[Unit]
Description=restic backup
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=restic
ExecStart=/usr/local/bin/restic_backup.sh
AmbientCapabilities=CAP_DAC_READ_SEARCH
CapabilityBoundingSet=CAP_DAC_READ_SEARCH
PrivateTmp=yes
ProtectKernelModules=yes
ProtectKernelLogs=yes
ProtectKernelTunables=yes
NoNewPrivileges=yes
RestrictSUIDSGID=yes
RestrictAddressFamilies=yes
PrivateDevices=yes
ProtectClock=yes
ProtectHostname=yes
RemoveIPC=yes
PrivateMounts=yes
Once we place the unit in one of the directories recognized by systemd (/etc/systemd/system, for example), to get its estimated security level, we just need to launch “systemd-analyze” with the “security” command, passing the unit name as argument. Supposing we saved the unit as “restic.service”, we would run:
$ systemd-analyze security restic.service
The command returns a list of the available security options, marking those present and those absent in the unit, and an overall exposure level: the lower this is, the better. Each unused option increases the exposure value by the amount reported in the “EXPOSURE” column. Here is the output we obtain by running the command against our service unit:
  NAME DESCRIPTION EXPOSURE
✓ RemoveIPC= Service user cannot leave SysV IPC objects around
✗ RootDirectory=/RootImage= Service runs within the host's root directory 0.1
✓ User=/DynamicUser= Service runs under a static non-root user identity
✓ CapabilityBoundingSet=~CAP_SYS_TIME Service processes cannot change the system clock
✓ NoNewPrivileges= Service processes cannot acquire new privileges
✗ AmbientCapabilities= Service process receives ambient capabilities 0.1
✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER) Service may override UNIX file/IPC permission checks 0.2
✗ ProtectControlGroups= Service may modify the control group file system 0.2
✓ CapabilityBoundingSet=~CAP_BPF Service may load BPF programs
✗ SystemCallArchitectures= Service may execute system calls with all ABIs 0.2
✗ MemoryDenyWriteExecute= Service may create writable executable memory mappings 0.1
✗ RestrictNamespaces=~user Service may create user namespaces 0.3
✗ RestrictNamespaces=~pid Service may create process namespaces 0.1
✗ RestrictNamespaces=~net Service may create network namespaces 0.1
✗ RestrictNamespaces=~uts Service may create hostname namespaces 0.1
✗ RestrictNamespaces=~mnt Service may create file system namespaces 0.1
✗ RestrictNamespaces=~cgroup Service may create cgroup namespaces 0.1
✗ RestrictNamespaces=~ipc Service may create IPC namespaces 0.1
✗ LockPersonality= Service may change ABI personality 0.1
✗ RestrictRealtime= Service may acquire realtime scheduling 0.1
✓ SupplementaryGroups= Service has no supplementary groups
✓ CapabilityBoundingSet=~CAP_SYS_RAWIO Service has no raw I/O access
✓ CapabilityBoundingSet=~CAP_SYS_PTRACE Service has no ptrace() debugging abilities
✓ CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE) Service has no privileges to change resource use parameters
✓ CapabilityBoundingSet=~CAP_NET_ADMIN Service has no network configuration privileges
✓ CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW) Service has no elevated networking privileges
✓ CapabilityBoundingSet=~CAP_AUDIT_* Service has no audit subsystem access
✓ CapabilityBoundingSet=~CAP_SYS_ADMIN Service has no administrator privileges
✓ PrivateTmp= Service has no access to other software's temporary files
✓ CapabilityBoundingSet=~CAP_SYSLOG Service has no access to kernel logging
✓ PrivateDevices= Service has no access to hardware devices
✗ ProtectSystem= Service has full access to the OS file hierarchy 0.2
✗ ProtectProc= Service has full access to process tree (/proc hidepid=) 0.2
✗ ProcSubset= Service has full access to non-process /proc files (/proc subset=) 0.1
✗ ProtectHome= Service has full access to home directories 0.2
✗ PrivateNetwork= Service has access to the host's network 0.5
✗ PrivateUsers= Service has access to other users 0.2
✗ DeviceAllow= Service has a device ACL with some special devices: char-rtc:r 0.1
✓ KeyringMode= Service doesn't share key material with other services
✓ Delegate= Service does not maintain its own delegated control group subtree
✗ SystemCallFilter=~@clock Service does not filter system calls 0.2
✗ SystemCallFilter=~@cpu-emulation Service does not filter system calls 0.1
✗ SystemCallFilter=~@debug Service does not filter system calls 0.2
✗ SystemCallFilter=~@module Service does not filter system calls 0.2
✗ SystemCallFilter=~@mount Service does not filter system calls 0.2
✗ SystemCallFilter=~@obsolete Service does not filter system calls 0.1
✗ SystemCallFilter=~@privileged Service does not filter system calls 0.2
✗ SystemCallFilter=~@raw-io Service does not filter system calls 0.2
✗ SystemCallFilter=~@reboot Service does not filter system calls 0.2
✗ SystemCallFilter=~@resources Service does not filter system calls 0.2
✗ SystemCallFilter=~@swap Service does not filter system calls 0.2
✗ IPAddressDeny= Service does not define an IP address allow list 0.2
✓ NotifyAccess= Service child processes cannot alter service state
✓ ProtectClock= Service cannot write to the hardware clock or system clock
✓ CapabilityBoundingSet=~CAP_SYS_PACCT Service cannot use acct()
✓ CapabilityBoundingSet=~CAP_KILL Service cannot send UNIX signals to arbitrary processes
✓ ProtectKernelLogs= Service cannot read from or write to the kernel log ring buffer
✓ CapabilityBoundingSet=~CAP_WAKE_ALARM Service cannot program timers that wake up the system
✓ CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE Service cannot mark files immutable
✓ CapabilityBoundingSet=~CAP_IPC_LOCK Service cannot lock memory into RAM
✓ ProtectKernelModules= Service cannot load or read kernel modules
✓ CapabilityBoundingSet=~CAP_SYS_MODULE Service cannot load kernel modules
✓ CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG Service cannot issue vhangup()
✓ CapabilityBoundingSet=~CAP_SYS_BOOT Service cannot issue reboot()
✓ CapabilityBoundingSet=~CAP_SYS_CHROOT Service cannot issue chroot()
✓ PrivateMounts= Service cannot install system mounts
✓ CapabilityBoundingSet=~CAP_BLOCK_SUSPEND Service cannot establish wake locks
✓ CapabilityBoundingSet=~CAP_LEASE Service cannot create file leases
✓ CapabilityBoundingSet=~CAP_MKNOD Service cannot create device nodes
✓ ProtectHostname= Service cannot change system host/domainname
✓ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP) Service cannot change file ownership/access mode/capabilities
✓ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP) Service cannot change UID/GID identities/capabilities
✓ ProtectKernelTunables= Service cannot alter kernel tunables (/proc/sys, …)
✓ RestrictAddressFamilies=~AF_PACKET Service cannot allocate packet sockets
✓ RestrictAddressFamilies=~AF_NETLINK Service cannot allocate netlink sockets
✓ RestrictAddressFamilies=~AF_UNIX Service cannot allocate local sockets
✓ RestrictAddressFamilies=~… Service cannot allocate exotic sockets
✓ RestrictAddressFamilies=~AF_(INET|INET6) Service cannot allocate Internet sockets
✓ CapabilityBoundingSet=~CAP_MAC_* Service cannot adjust SMACK MAC
✓ RestrictSUIDSGID= SUID/SGID file creation by service is restricted
✗ UMask= Files created by service are world-readable by default 0.1

→ Overall exposure level for restic.service: 4.6 OK 🙂
Our score is “4.6”: not bad, but we could still improve it!
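Several of the rows marked with ✗ in the report correspond to options that could plausibly be enabled for a backup job. The fragment below is only a sketch: the values are assumptions, they have not been tested against a real restic run, and each one should be verified by actually running the backup before being kept.

```ini
[Service]
# Make the control group hierarchy read-only for the service
ProtectControlGroups=yes
# Forbid changing the ABI personality
LockPersonality=yes
# Forbid acquiring realtime scheduling
RestrictRealtime=yes
# Forbid creating new namespaces of any type
RestrictNamespaces=yes
# Allow only system calls for the native architecture
SystemCallArchitectures=native
# New files are readable by their owner only
UMask=0077
# Mount /usr, the boot loader directories and /etc read-only (assumes the
# backup script does not need to write there)
ProtectSystem=full
```

After changing the unit, running “systemd-analyze security” again shows which rows have switched to ✓ and how the overall exposure level has changed.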
In this tutorial we learned how to increase the security level of a service by using some systemd options. We also learned how to use the systemd-analyze utility to estimate the service exposure level. For obvious reasons, we didn’t discuss all the available hardening options here; to get the complete list and learn more about this topic, please take a look at the online docs or check the “systemd.exec” manpage.