How to monitor file integrity on Linux using Osquery

The core idea behind osquery is the “tabular abstraction” of many aspects of the operating system, such as processes and users. The data is exposed as tables that can be queried using SQL syntax, either directly via the osqueryi shell or on a schedule via the osqueryd daemon.

In this tutorial we will see how to install the application, how to run basic queries, and how to use FIM (File Integrity Monitoring) as part of your Linux system administration job.

In this tutorial you will learn:

How to install osquery
How to list the available tables
How to perform queries from the osqueryi shell
How to use the osqueryd daemon to monitor file integrity

Software Requirements and Conventions Used

Basic knowledge of SQL concepts
Root permissions to perform administrative tasks

Software Requirements and Linux Command Line Conventions
Category	Requirements, Conventions or Software Version Used
System	Distribution-independent
Software	Osquery
Other
Conventions	# – requires the given Linux commands to be executed with root privileges, either directly as the root user or by use of the `sudo` command; $ – requires the given Linux commands to be executed as a regular non-privileged user

Installation

We have basically two options for installing osquery: the first consists of downloading the appropriate package for our system from the official website; the second, usually preferred, is to add the osquery repository to our distribution's software sources. Here we will briefly explore both options.

Installing via package

From the official osquery website it is possible to download signed deb and rpm packages, or more generic tarballs. First we select the version we want to install, then we download the package.

It is advisable to select the latest available version (4.1.2 at the time of writing). Once the package is downloaded, we can install it using our distribution's package manager. For example, to install the software on a Fedora system (assuming the package is located in our current working directory), we would run:

$ sudo dnf install ./osquery-4.1.2-1.linux.x86_64.rpm

Using a repository

As an alternative we can add the rpm or deb repository to our distribution. If we are using a rpm-based distribution, we can run the following commands to accomplish the task:

$ curl -L https://pkg.osquery.io/rpm/GPG | sudo tee /etc/pki/rpm-gpg/RPM-GPG-KEY-osquery
$ sudo yum-config-manager --add-repo https://pkg.osquery.io/rpm/osquery-s3-rpm.repo
$ sudo yum-config-manager --enable osquery-s3-rpm-repo
$ sudo yum install osquery

With the Linux commands above, we add the GPG public key used to sign the packages to our system, then we add and enable the repository. Finally, we install the osquery package. Notice that in recent versions of Fedora and CentOS/RHEL, yum is just a symbolic link to dnf, so when we invoke the former the latter is used instead.

If we are running a Debian-based distribution, instead, we can add the deb repository to our software sources by running:

$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 1484120AC4E9F8A1A577AEEE97A80C63C9D8B80B
$ sudo add-apt-repository 'deb [arch=amd64] https://pkg.osquery.io/deb deb main'
$ sudo apt-get update
$ sudo apt-get install osquery

Once the package is installed, we can take a look at the basic usage of the software.

Basic usage

Osquery allows us to monitor various aspects of an operating system by adopting a “tabular abstraction”, using an SQL syntax similar to the one used on sqlite databases. The queries are run on tables which abstract various aspects of the operating system, such as processes and services.

We can run queries directly using the osqueryi interactive shell, or we can schedule them via the osqueryd daemon. Here is an example of a command that lists all the available tables (the complete list, with a description of each table, can also be found online):

$ osqueryi
osquery> .tables
  => acpi_tables
  => apt_sources
  => arp_cache
  => atom_packages
  => augeas
  => authorized_keys
  => block_devices
  => carbon_black_info
  => carves
  => chrome_extensions
  => cpu_time
  => cpuid
  => crontab
  => curl
  => curl_certificate
  => deb_packages
  => device_file
  => device_hash
  => device_partitions
  => disk_encryption
  => dns_resolvers
  => docker_container_labels
  => docker_container_mounts
  => docker_container_networks
  => docker_container_ports
  => docker_container_processes
  => docker_container_stats
  => docker_containers
  => docker_image_labels
  => docker_images
  => docker_info
  => docker_network_labels
  => docker_networks
  => docker_version
  => docker_volume_labels
  => docker_volumes
  => ec2_instance_metadata
  => ec2_instance_tags
  => elf_dynamic
  => elf_info
  => elf_sections
  => elf_segments
  => elf_symbols
  => etc_hosts
  => etc_protocols
  => etc_services
  => file
  => file_events
  => firefox_addons
  => groups
  => hardware_events
  => hash
  => intel_me_info
  => interface_addresses
  => interface_details
  => interface_ipv6
  => iptables
  => kernel_info
  => kernel_integrity
  => kernel_modules
  => known_hosts
  => last
  => listening_ports
  => lldp_neighbors
  => load_average
  => logged_in_users
  => magic
  => md_devices
  => md_drives
  => md_personalities
  => memory_array_mapped_addresses
  => memory_arrays
  => memory_device_mapped_addresses
  => memory_devices
  => memory_error_info
  => memory_info
  => memory_map
  => mounts
  => msr
  => npm_packages
  => oem_strings
  => opera_extensions
  => os_version
  => osquery_events
  => osquery_extensions
  => osquery_flags
  => osquery_info
  => osquery_packs
  => osquery_registry
  => osquery_schedule
  => pci_devices
  => platform_info
  => portage_keywords
  => portage_packages
  => portage_use
  => process_envs
  => process_events
  => process_file_events
  => process_memory_map
  => process_namespaces
  => process_open_files
  => process_open_sockets
  => processes
  => prometheus_metrics
  => python_packages
  => routes
  => rpm_package_files
  => rpm_packages
  => selinux_events
  => shadow
  => shared_memory
  => shell_history
  => smart_drive_info
  => smbios_tables
  => socket_events
  => ssh_configs
  => sudoers
  => suid_bin
  => syslog_events
  => system_controls
  => system_info
  => time
  => ulimit_info
  => uptime
  => usb_devices
  => user_events
  => user_groups
  => user_ssh_keys
  => users
  => yara
  => yara_events
  => yum_sources

Running the osqueryi command, we enter the interactive shell, from which we can issue our queries and instructions. Here is another example of a query, this time listing the pid and name of all running processes. The query is performed on the processes table (the output of the query has been truncated for convenience):

osquery> SELECT pid, name FROM processes;
+-------+------------------------------------+
| pid   | name                               |
+-------+------------------------------------+
| 1     | systemd                            |
| 10    | rcu_sched                          |
| 10333 | kworker/u16:5-events_unbound       |
| 10336 | kworker/2:0-events                 |
| 11    | migration/0                        |
| 11002 | kworker/u16:1-kcryptd/253:0        |
| 11165 | kworker/1:1-events                 |
| 11200 | kworker/1:3-events                 |
| 11227 | bash                               |
| 11368 | osqueryi                           |
| 11381 | kworker/0:0-events                 |
| 11395 | Web Content                        |
| 11437 | kworker/0:2-events                 |
| 11461 | kworker/3:2-events_power_efficient |
| 11508 | kworker/2:2                        |
| 11509 | kworker/0:1-events                 |
| 11510 | kworker/u16:2-kcryptd/253:0        |
| 11530 | bash                               |
[...]                                        |
+-------+------------------------------------+
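For scripting, the same kind of query can also be run non-interactively. The following is a minimal Python sketch, assuming osqueryi is installed and on the PATH, which relies on its --json output flag. The helper names run_query and parse_rows are hypothetical and exist only for this illustration.

```python
import json
import subprocess


def parse_rows(raw_json):
    """Turn the JSON text printed by `osqueryi --json` into a list of dicts."""
    rows = json.loads(raw_json)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of rows")
    return rows


def run_query(sql):
    """Run one query through osqueryi (assumed to be installed) and return its rows."""
    completed = subprocess.run(
        ["osqueryi", "--json", sql],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if osqueryi exits with an error
    )
    return parse_rows(completed.stdout)
```

Calling run_query("SELECT pid, name FROM processes;") should return one dictionary per process. The values in each row may arrive as strings, so convert them if numbers are needed.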

It’s even possible to perform queries on joined tables using the JOIN statement, just as we would do in a relational database. In the example below we perform a query on the processes table, joined with the users table via the uid column:

osquery> SELECT processes.pid, processes.name, users.username FROM processes JOIN users ON processes.uid = users.uid;
+-------+-------------------------------+------------------+
| pid   | name                          | username         |
+-------+-------------------------------+------------------+
| 1     | systemd                       | root             |
| 10    | rcu_sched                     | root             |
| 11    | migration/0                   | root             |
| 11227 | bash                          | egdoc            |
| 11368 | osqueryi                      | egdoc            |
| 13    | cpuhp/0                       | root             |
| 14    | cpuhp/1                       | root             |
| 143   | kintegrityd                   | root             |
| 144   | kblockd                       | root             |
| 145   | blkcg_punt_bio                | root             |
| 146   | tpm_dev_wq                    | root             |
| 147   | ata_sff                       | root             |
[...]
| 9130  | Web Content                   | egdoc            |
| 9298  | Web Content                   | egdoc            |
| 9463  | gvfsd-metadata                | egdoc            |
| 9497  | gvfsd-network                 | egdoc            |
| 9518  | gvfsd-dnssd                   | egdoc            |
+-------+-------------------------------+------------------+
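Since the query syntax is similar to the one used by sqlite, the JOIN can be tried out without osquery at all. The sketch below uses Python's sqlite3 module with two small stand-in tables; the rows, including the uid value 1000, are made up for illustration and are not real osquery output.

```python
import sqlite3

# In-memory SQLite database with made-up rows standing in for osquery's tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processes (pid INTEGER, name TEXT, uid INTEGER);
    CREATE TABLE users (uid INTEGER, username TEXT);
    INSERT INTO processes VALUES (1, 'systemd', 0), (11227, 'bash', 1000);
    INSERT INTO users VALUES (0, 'root'), (1000, 'egdoc');
    """
)

# Same JOIN as in the osqueryi example above, sorted by pid for stable output.
joined = conn.execute(
    "SELECT processes.pid, processes.name, users.username "
    "FROM processes JOIN users ON processes.uid = users.uid "
    "ORDER BY processes.pid;"
).fetchall()

for pid, name, username in joined:
    print(pid, name, username)
```

Each process row is paired with the user row that has the same uid, which is exactly what the ON clause expresses.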

File Integrity Monitoring (FIM)

So far we have used osquery via the interactive shell, osqueryi. To use FIM (File Integrity Monitoring), we want to use the osqueryd daemon instead. Via the configuration file, we provide a list of the files we want to monitor. Events involving the specified files and directories, such as attribute changes, are recorded in the file_events table. The daemon runs a query on this table at a specified interval and reports in its logs when new records are found. Let’s see a configuration example.

Configuration setup

The main configuration file for osquery is /etc/osquery/osquery.conf. The file doesn’t exist by default, so we must create it. The configuration is provided in JSON format. Suppose we want to monitor all the files and directories under /etc; here is how we would configure the application:

{
  "options": {
    "disable_events": "false"
  },
  "schedule": {
    "file_events": {
      "query": "SELECT * FROM file_events;",
      "interval": 300
    }
  },
  "file_paths": {
    "etc": [
      "/etc/%%"
    ]
  }
}

Let’s analyze the configuration above. First of all, in the options section, we set disable_events to "false", in order to enable file events.

After that, we created the schedule section: inside this section we can describe and create various named scheduled queries. In our case we created a query which selects all columns from the file_events table, which is meant to be executed every 300 seconds (5 minutes).

After scheduling the query, we created the file_paths section, where we specified the files to be monitored. In this section, each key represents the name of a set of files to be monitored (a category in the osquery jargon). In this case the “etc” key references a list with only one entry, /etc/%%.

What does the % symbol stand for? When specifying file paths we can use standard (*) or SQL (%) wildcards. A single wildcard selects all files and directories existing at the specified level. A double wildcard selects all files and directories recursively. For example, the /etc/% expression matches all files and directories one level under /etc, while /etc/%% matches all files and directories under /etc recursively.
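To make the difference between the single and double wildcard concrete, here is a small Python sketch that approximates the matching rules described above with regular expressions. It is not how osquery implements path resolution; the function names pattern_to_regex and matches are hypothetical, and the translation rules are an assumption based only on the description in this section.

```python
import re


def pattern_to_regex(pattern):
    """Approximate an osquery-style path pattern with a regular expression.

    Assumption for illustration: '%%' (or '**') spans any number of levels,
    while '%' (or '*') stays within a single level.
    """
    # Split on wildcards, keeping them; double wildcards are tried first.
    parts = re.split(r"(%%|\*\*|%|\*)", pattern)
    regex = ""
    for part in parts:
        if part in ("%%", "**"):
            regex += ".*"      # may cross directory separators
        elif part in ("%", "*"):
            regex += "[^/]*"   # must not cross a directory separator
        else:
            regex += re.escape(part)
    return re.compile("^" + regex + "$")


def matches(pattern, path):
    """Return True if the path is selected by the pattern."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/etc/%", "/etc/fstab"))
print(matches("/etc/%", "/etc/ssh/sshd_config"))
print(matches("/etc/%%", "/etc/ssh/sshd_config"))
```

Under these assumptions, /etc/% selects /etc/fstab but not /etc/ssh/sshd_config, while /etc/%% selects both.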

If we need to, we can also exclude specific files from the paths we provided, using the exclude_paths section of the configuration file. In this section we can only reference categories defined in the file_paths section (“etc” in this case). For each category we provide the list of files to be excluded:

  "exclude_paths": {
    "etc": [
      "/etc/aliases"
    ]
  }

Just as an example, we excluded the /etc/aliases file from the list. Here is what our final configuration looks like:

{
  "options": {
    "disable_events": "false"
  },
  "schedule": {
    "file_events": {
      "query": "SELECT * FROM file_events;",
      "interval": 300
    }
  },
  "file_paths": {
    "etc": [
      "/etc/%%"
    ]
  },
  "exclude_paths": {
    "etc": [
      "/etc/aliases"
    ]
  }
}
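Before starting the daemon, it can be useful to sanity-check the file. The Python sketch below is one possible approach, not an official osquery tool: it parses the configuration with a strict JSON parser (which rejects mistakes such as a stray trailing comma) and verifies that every category used in exclude_paths also exists in file_paths. The function name check_fim_config and the specific checks are assumptions made for this illustration.

```python
import json


def check_fim_config(text):
    """Parse an osquery configuration and return a list of problems found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as error:
        return ["invalid JSON: " + str(error)]

    problems = []
    file_paths = config.get("file_paths", {})
    # Every category used in exclude_paths should also exist in file_paths.
    for category in config.get("exclude_paths", {}):
        if category not in file_paths:
            problems.append("exclude_paths category not in file_paths: " + category)
    # This sketch expects each scheduled query to carry a query and an interval.
    for name, entry in config.get("schedule", {}).items():
        for key in ("query", "interval"):
            if key not in entry:
                problems.append("scheduled query " + name + " lacks " + key)
    return problems
```

Reading /etc/osquery/osquery.conf and passing its content to check_fim_config should return an empty list for the configuration above.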

Starting the daemon

With our configuration in place, we can start the osqueryd daemon:

$ sudo systemctl start osqueryd

To make the daemon start automatically at boot we must run:

$ sudo systemctl enable osqueryd

Once the daemon is running, we can check that our configuration works. Just as an example, we will modify the permissions of the /etc/fstab file, changing them from 644 to 600:

$ sudo chmod 600 /etc/fstab

We can now verify the change to the file has been recorded by reading the /var/log/osquery/osqueryd.results.log file. Here is the last line of the file (beautified):

{
  "name":"file_events",
  "hostIdentifier":"fingolfin",
  "calendarTime":"Mon Dec 30 19:57:31 2019 UTC",
  "unixTime":1577735851,
  "epoch":0,
  "counter":0,
  "logNumericsAsNumbers":false,
  "columns": {
    "action":"ATTRIBUTES_MODIFIED",
    "atime":"1577735683",
    "category":"etc",
    "ctime":"1577735841",
    "gid":"0",
    "hashed":"0",
    "inode":"262147",
    "md5":"",
    "mode":"0600",
    "mtime":"1577371335",
    "sha1":"",
    "sha256":"",
    "size":"742",
    "target_path":"/etc/fstab",
    "time":"1577735841",
    "transaction_id":"0",
    "uid":"0"
  },
  "action":"added"
}

In the log above, we can clearly see that an ATTRIBUTES_MODIFIED action (Line 10) occurred on the target_path “/etc/fstab” (Line 23), which is part of the “etc” category (Line 12). It’s important to note that if we query the file_events table from the osqueryi shell, we will see no rows, since the osqueryd daemon and osqueryi don’t communicate.
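Reading the results log by eye quickly becomes impractical, so a small script can extract the interesting fields. The Python sketch below assumes that each line of the results log holds one JSON object shaped like the one shown above; the function name summarize_file_events is hypothetical, and the sample line is a shortened copy of that log entry.

```python
import json


def summarize_file_events(lines):
    """Yield (action, target_path, category) for each file_events log line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Skip results that belong to other scheduled queries.
        if record.get("name") != "file_events":
            continue
        columns = record.get("columns", {})
        yield (
            columns.get("action"),
            columns.get("target_path"),
            columns.get("category"),
        )


# Shortened copy of the log entry shown above, on a single line.
sample = (
    '{"name":"file_events","columns":{"action":"ATTRIBUTES_MODIFIED",'
    '"category":"etc","target_path":"/etc/fstab"},"action":"added"}'
)
events = list(summarize_file_events([sample]))
print(events)
```

To process the real log, an open file object for /var/log/osquery/osqueryd.results.log can be passed instead of the sample list, since iterating over a file yields its lines.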

Conclusions

In this tutorial we saw the basic concepts involved in the use of the osquery application, which abstracts various operating system concepts as tabular data we can query using SQL syntax. We saw how to install the application, how to perform basic queries using the osqueryi shell, and finally how to set up file monitoring using the osqueryd daemon. We just scratched the surface of what the application can do; as always, the advice is to take a look at the project documentation for more in-depth knowledge.