How to increase the security of systemd services

Nowadays all major Linux distributions have adopted Systemd as their init system and service manager. Creating a systemd service is just a matter of writing a “.service” unit in the appropriate directory and managing it with the systemctl utility. When starting a service, or launching a process in general, we want to make sure it runs with the smallest set of privileges it needs to accomplish its task. Systemd provides a series of options we can use to fine-tune the behavior of a service, granting or denying privileges in a granular way, and ensuring a certain level of isolation from the rest of the system.

In this article we will see how to increase the security of a systemd service, and how to get an estimate of its exposure level using the systemd-analyze utility.

In this tutorial you will learn:

  • How to increase the security of a systemd service
  • How to get the estimated service exposure level using systemd-analyze
Software Requirements and Linux Command Line Conventions

Category      Requirements, Conventions or Software Version Used
System        Distribution agnostic
Software      Systemd
Other         None
Conventions   # – requires given linux-commands to be executed with root privileges, either directly as the root user or by use of the sudo command
              $ – requires given linux-commands to be executed as a regular non-privileged user

A test case: writing a backup service

In a previous tutorial we talked about Restic, an efficient deduplicating backup program written in Go. For the sake of this article, let’s imagine we want to write a “restic” service and schedule the backup via a systemd timer. We begin by writing the “Unit” section:

[Unit] 
Description=restic backup
Wants=network-online.target
After=network-online.target

First, we provided a description of the service with the Description option. Since we want to be able to use remote repositories for our backups, we also used the Wants and After options: the former declares a (weak) dependency of the service on network-online.target, the latter establishes that the service must be started only after said target is reached, that is, once the network interfaces have been configured.

Now, let’s populate the “Service” section of the unit. This is where we define our service behavior:

[Service]
Type=oneshot
User=restic
ExecStart=/usr/local/bin/restic_backup.sh



We used the “Type” option to define our service as “oneshot”. This influences how systemd treats the service: it will consider it “up” only after its main process exits.

For obvious security reasons, we want to avoid running the service as root, therefore we created the “restic” unprivileged user, and with the User option, we specified the process should be launched with its privileges. Finally, with the ExecStart option, we defined the command/executable which should be invoked when the service is started; in this case, it is the /usr/local/bin/restic_backup.sh script, which contains the main backup logic.
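Since the service is meant to be triggered by a timer, here is a minimal sketch of what a matching timer unit could look like. The file name (restic.timer) and the daily schedule are assumptions made for illustration and are not part of the original setup; by default a timer activates the service unit that shares its name.

```ini
# /etc/systemd/system/restic.timer (hypothetical file name and schedule)
[Unit]
Description=Run the restic backup on a schedule

[Timer]
# Assumed schedule: once a day, at midnight.
OnCalendar=daily
# If the machine was off when the timer should have fired, run at next boot.
Persistent=true

[Install]
WantedBy=timers.target
```

With a unit like this in place, the timer (not the service) is what would be enabled, for example with "systemctl enable --now restic.timer".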

In the “Service” section of the unit, we can use several other options to further tune the privileges of our service. Let’s see some of them.

Running the process with capabilities

Our service will run with the privileges of the “restic” user. This is a good security measure; however, we must ensure restic is able to read the entire filesystem. To achieve this, we can make the process run with the appropriate capability:

AmbientCapabilities=CAP_DAC_READ_SEARCH
CapabilityBoundingSet=CAP_DAC_READ_SEARCH



The AmbientCapabilities option takes as its value a space-separated list of the capabilities we want to include in the ambient capability set of the process. In this case we just used the “CAP_DAC_READ_SEARCH” capability, which allows bypassing read permission checks on files, and read and search permission checks on directories. As a security measure, we also used the CapabilityBoundingSet option to limit the set of capabilities the process is allowed to obtain.
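As a sketch of the list syntax accepted by these two options: multiple capabilities are separated by whitespace, and a leading "~" inverts the list. CAP_NET_BIND_SERVICE appears below only to illustrate a second entry; our backup service does not need it.

```ini
[Service]
# Several capabilities are separated by spaces
# (CAP_NET_BIND_SERVICE is illustrative, not required by the backup service).
AmbientCapabilities=CAP_DAC_READ_SEARCH CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_DAC_READ_SEARCH CAP_NET_BIND_SERVICE

# Inverted form (alternative): keep every capability except the listed ones.
#CapabilityBoundingSet=~CAP_SYS_ADMIN CAP_SYS_MODULE
```

The allow-list form used in our unit is the stricter choice, since every capability not explicitly named is dropped from the bounding set.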

Hardening the service

The two capability-related options we used above are just a small subset of those we can use to “isolate” the service. Most of them accept a boolean value. Let’s see some examples.

PrivateTmp

The “PrivateTmp” option protects temporary files created by the service, so that other processes cannot access them. When the option is active, systemd creates isolated /tmp and /var/tmp directories and mounts them in a private namespace.

ProtectKernelModules, ProtectKernelLogs and ProtectKernelTunables

These options protect the kernel state. ProtectKernelModules, when active, denies the service the ability to load and unload kernel modules, while ProtectKernelLogs denies access to the kernel log buffer. Many kernel parameters can be altered by writing appropriate values to files exposed in the /proc and /sys pseudo-filesystems; the ProtectKernelTunables option, when active, makes those files read-only for the service.

NoNewPrivileges

This option can be used to make sure the service and its child processes cannot gain new privileges by executing other programs via the execve system call. When this option is active, the SETUID and SETGID bits (and file capabilities) of executed binaries are ignored: such binaries still run, but without the elevated privileges they would otherwise grant.

RestrictSUIDSGID

This option, when true, prevents the processes of the service from setting the SETUID or SETGID bits on files and directories.

RestrictAddressFamilies

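This option accepts a space-separated list of the address family names the process is allowed to use (e.g. AF_UNIX, AF_INET, AF_INET6); “none” is also a valid value: it denies access to all of them. Unlike most of the options described here, it does not take a boolean value. Keep in mind that a service which must reach a remote repository over the network needs at least AF_INET and AF_INET6.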
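The forms this option can take can be sketched as follows. The chosen families are only an example; which ones a given service really needs has to be verified case by case.

```ini
[Service]
# Allow-list form: only local (UNIX), IPv4 and IPv6 sockets may be created.
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6

# Deny-list form (alternative): everything is allowed except the listed families.
#RestrictAddressFamilies=~AF_PACKET AF_NETLINK

# Deny every address family (alternative).
#RestrictAddressFamilies=none
```

The allow-list form is usually the safer default, because address families added to the kernel in the future remain blocked unless they are listed explicitly.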

PrivateDevices

When this option is active, it denies the process access to raw devices such as /dev/sda or /dev/mem.

ProtectClock

When set to true, it denies the service the ability to set the hardware clock or the system clock; reading the time is still allowed.

ProtectHostname

This option protects the system hostname, ensuring the process cannot change it.

RemoveIPC

When active, it causes the automatic removal of the System V and POSIX IPC (Inter Process Communication) objects owned by the user and group the service runs as, once the service is stopped.

PrivateMounts

When this option is active, the processes of the service run in their own private mount namespace: the mounts and unmounts they perform are visible only to the service itself and are not propagated to the host.

Obtaining an estimated security level of the service

Here is what our service looks like, in the end:

[Unit] 
Description=restic backup
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=restic
ExecStart=/usr/local/bin/restic_backup.sh
AmbientCapabilities=CAP_DAC_READ_SEARCH
CapabilityBoundingSet=CAP_DAC_READ_SEARCH
PrivateTmp=yes
ProtectKernelModules=yes
ProtectKernelLogs=yes
ProtectKernelTunables=yes
NoNewPrivileges=yes
RestrictSUIDSGID=yes
RestrictAddressFamilies=none
PrivateDevices=yes
ProtectClock=yes
ProtectHostname=yes
RemoveIPC=yes
PrivateMounts=yes

Once we place the unit in one of the directories recognized by systemd (/etc/systemd/system, for example), to get its estimated security level, we just need to launch “systemd-analyze” with the “security” command, passing the unit name as argument. Supposing we saved the unit as “restic.service”, we would run:

$ systemd-analyze security restic.service



The command returns a list of the available security options, marking those present and those absent in the unit, and an overall exposure level: the lower this is, the better. Each unused option raises the exposure value by roughly the amount reported in the “EXPOSURE” column. Here is the output we obtain by running the command against our service unit:

  NAME                                                        DESCRIPTION                                                        EXPOSURE
 RemoveIPC=                                                  Service user cannot leave SysV IPC objects around                          
 RootDirectory=/RootImage=                                   Service runs within the host's root directory                           0.1
 User=/DynamicUser=                                          Service runs under a static non-root user identity                         
 CapabilityBoundingSet=~CAP_SYS_TIME                         Service processes cannot change the system clock                           
 NoNewPrivileges=                                            Service processes cannot acquire new privileges                            
 AmbientCapabilities=                                        Service process receives ambient capabilities                           0.1
 CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)         Service may override UNIX file/IPC permission checks                    0.2
 ProtectControlGroups=                                       Service may modify the control group file system                        0.2
 CapabilityBoundingSet=~CAP_BPF                              Service may load BPF programs                                              
 SystemCallArchitectures=                                    Service may execute system calls with all ABIs                          0.2
 MemoryDenyWriteExecute=                                     Service may create writable executable memory mappings                  0.1
 RestrictNamespaces=~user                                    Service may create user namespaces                                      0.3
 RestrictNamespaces=~pid                                     Service may create process namespaces                                   0.1
 RestrictNamespaces=~net                                     Service may create network namespaces                                   0.1
 RestrictNamespaces=~uts                                     Service may create hostname namespaces                                  0.1
 RestrictNamespaces=~mnt                                     Service may create file system namespaces                               0.1
 RestrictNamespaces=~cgroup                                  Service may create cgroup namespaces                                    0.1
 RestrictNamespaces=~ipc                                     Service may create IPC namespaces                                       0.1
 LockPersonality=                                            Service may change ABI personality                                      0.1
 RestrictRealtime=                                           Service may acquire realtime scheduling                                 0.1
 SupplementaryGroups=                                        Service has no supplementary groups                                        
 CapabilityBoundingSet=~CAP_SYS_RAWIO                        Service has no raw I/O access                                              
 CapabilityBoundingSet=~CAP_SYS_PTRACE                       Service has no ptrace() debugging abilities                                
 CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE)              Service has no privileges to change resource use parameters                
 CapabilityBoundingSet=~CAP_NET_ADMIN                        Service has no network configuration privileges                            
 CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW) Service has no elevated networking privileges                              
 CapabilityBoundingSet=~CAP_AUDIT_*                          Service has no audit subsystem access                                      
 CapabilityBoundingSet=~CAP_SYS_ADMIN                        Service has no administrator privileges                                    
 PrivateTmp=                                                 Service has no access to other software's temporary files                  
 CapabilityBoundingSet=~CAP_SYSLOG                           Service has no access to kernel logging                                    
 PrivateDevices=                                             Service has no access to hardware devices                                  
 ProtectSystem=                                              Service has full access to the OS file hierarchy                        0.2
 ProtectProc=                                                Service has full access to process tree (/proc hidepid=)                0.2
 ProcSubset=                                                 Service has full access to non-process /proc files (/proc subset=)      0.1
 ProtectHome=                                                Service has full access to home directories                             0.2
 PrivateNetwork=                                             Service has access to the host's network                                0.5
 PrivateUsers=                                               Service has access to other users                                       0.2
 DeviceAllow=                                                Service has a device ACL with some special devices: char-rtc:r          0.1
 KeyringMode=                                                Service doesn't share key material with other services                     
 Delegate=                                                   Service does not maintain its own delegated control group subtree          
 SystemCallFilter=~@clock                                    Service does not filter system calls                                    0.2
 SystemCallFilter=~@cpu-emulation                            Service does not filter system calls                                    0.1
 SystemCallFilter=~@debug                                    Service does not filter system calls                                    0.2
 SystemCallFilter=~@module                                   Service does not filter system calls                                    0.2
 SystemCallFilter=~@mount                                    Service does not filter system calls                                    0.2
 SystemCallFilter=~@obsolete                                 Service does not filter system calls                                    0.1
 SystemCallFilter=~@privileged                               Service does not filter system calls                                    0.2
 SystemCallFilter=~@raw-io                                   Service does not filter system calls                                    0.2
 SystemCallFilter=~@reboot                                   Service does not filter system calls                                    0.2
 SystemCallFilter=~@resources                                Service does not filter system calls                                    0.2
 SystemCallFilter=~@swap                                     Service does not filter system calls                                    0.2
 IPAddressDeny=                                              Service does not define an IP address allow list                        0.2
 NotifyAccess=                                               Service child processes cannot alter service state                         
 ProtectClock=                                               Service cannot write to the hardware clock or system clock                 
 CapabilityBoundingSet=~CAP_SYS_PACCT                        Service cannot use acct()                                                  
 CapabilityBoundingSet=~CAP_KILL                             Service cannot send UNIX signals to arbitrary processes                    
 ProtectKernelLogs=                                          Service cannot read from or write to the kernel log ring buffer            
 CapabilityBoundingSet=~CAP_WAKE_ALARM                       Service cannot program timers that wake up the system                      
 CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE                  Service cannot mark files immutable                                        
 CapabilityBoundingSet=~CAP_IPC_LOCK                         Service cannot lock memory into RAM                                        
 ProtectKernelModules=                                       Service cannot load or read kernel modules                                 
 CapabilityBoundingSet=~CAP_SYS_MODULE                       Service cannot load kernel modules                                         
 CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG                   Service cannot issue vhangup()                                             
 CapabilityBoundingSet=~CAP_SYS_BOOT                         Service cannot issue reboot()                                              
 CapabilityBoundingSet=~CAP_SYS_CHROOT                       Service cannot issue chroot()                                              
 PrivateMounts=                                              Service cannot install system mounts                                       
 CapabilityBoundingSet=~CAP_BLOCK_SUSPEND                    Service cannot establish wake locks                                        
 CapabilityBoundingSet=~CAP_LEASE                            Service cannot create file leases                                          
 CapabilityBoundingSet=~CAP_MKNOD                            Service cannot create device nodes                                         
 ProtectHostname=                                            Service cannot change system host/domainname                               
 CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)           Service cannot change file ownership/access mode/capabilities              
 CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)                Service cannot change UID/GID identities/capabilities                      
 ProtectKernelTunables=                                      Service cannot alter kernel tunables (/proc/sys, …)                        
 RestrictAddressFamilies=~AF_PACKET                          Service cannot allocate packet sockets                                     
 RestrictAddressFamilies=~AF_NETLINK                         Service cannot allocate netlink sockets                                    
 RestrictAddressFamilies=~AF_UNIX                            Service cannot allocate local sockets                                      
 RestrictAddressFamilies=~…                                  Service cannot allocate exotic sockets                                     
 RestrictAddressFamilies=~AF_(INET|INET6)                    Service cannot allocate Internet sockets                                   
 CapabilityBoundingSet=~CAP_MAC_*                            Service cannot adjust SMACK MAC                                            
 RestrictSUIDSGID=                                           SUID/SGID file creation by service is restricted                           
 UMask=                                                      Files created by service are world-readable by default                  0.1

→ Overall exposure level for restic.service: 4.6 OK 🙂

Our score is “4.6”: not bad, but we could still improve it!
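As a possible next step, the sketch below enables a few of the options that still carry a non-zero EXPOSURE value in the report above, using a drop-in file. The drop-in file name is hypothetical, and whether each option is compatible with the backup script is an assumption that must be tested; no resulting score is claimed here.

```ini
# /etc/systemd/system/restic.service.d/hardening.conf (hypothetical drop-in name)
[Service]
# Options reported with a non-zero EXPOSURE value for our unit.
ProtectControlGroups=yes
LockPersonality=yes
RestrictRealtime=yes
RestrictNamespaces=yes
SystemCallArchitectures=native
# A backup only needs to read the data: /home, /root and /run/user become read-only.
ProtectHome=read-only
# /usr, the boot loader directories and /etc become read-only for the service.
ProtectSystem=full
# Files created by the service are not readable by other users.
UMask=0077
```

After adding the file we would reload the systemd configuration with "systemctl daemon-reload" and run systemd-analyze again to compare the result. If the restic cache or a local repository lives under one of the paths made read-only, the corresponding option would need to be relaxed, for example by listing that path with ReadWritePaths.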

Conclusions

In this tutorial we learned how to increase the security level of a service by using some Systemd options. We also learned how to use the systemd-analyze utility to estimate the service exposure level. For obvious reasons, we didn’t discuss all the available hardening options here; to get the complete list, and learn more about this topic, please take a look at the online docs or just check the “systemd.exec” manpage.


