Linux Apache log analyzer

Every request to your Apache web server generates a line in the log file. As you can imagine, the log files fill up quickly with visitor information and errors, so they become unwieldy and tough to sift through. Analyzing the logs is an important part of administering Apache and ensuring that it runs as expected.

In this tutorial, we will see how to locate the Apache log files on a Linux system and look through them for relevant information about visitors, errors, and Apache performance. In the process, you will learn how to interpret Apache log files. We will also go over a tool that can be installed to read the log files and compile relevant statistics into a form that is easier for a person to go through.

In this tutorial you will learn:

  • Where Apache log files are stored
  • How Apache log files are rotated by the system
  • How to interpret the data in access.log and error.log
  • How to install and use other tools to interpret Apache log files
Software Requirements and Linux Command Line Conventions

Category      Requirements, Conventions or Software Version Used
System        Any Linux distro
Software      Apache web server, GoAccess
Other         Privileged access to your Linux system as root or via the sudo command.
Conventions   # – requires given linux commands to be executed with root privileges either directly as a root user or by use of sudo command
              $ – requires given linux commands to be executed as a regular non-privileged user

Apache log files and their location




Apache produces two different log files:

  • access.log stores information about all the incoming connection requests to Apache. Every time a user visits your website, it will be logged here. Each page a user requests will also be logged as a separate entry.
  • error.log stores information about errors that Apache encounters throughout its operation. Ideally, this file should remain relatively empty.
Apache default Log configuration on Ubuntu Linux server

The location of the log files may depend on which version of Apache you are running and what Linux distribution it’s on. Apache can also be configured to store these files in some other non-default location.

But, by default, you should be able to find the access and error logs in one of these directories:

  • /var/log/apache/
  • /var/log/apache2/
  • /var/log/httpd/ (on Red Hat-based systems, where /etc/httpd/logs/ is a symbolic link to it)
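
To check which of these locations exists on a given machine, a short shell loop is enough. The following is a minimal sketch: the function name find_apache_log_dir is made up for this example, and it only tests the default directories mentioned in this article, so a server configured with a custom log path would not be detected this way.

```shell
#!/bin/sh
# Print the first existing directory from a list of candidates.
# With no arguments, the default locations named in this article are checked.
# (find_apache_log_dir is a hypothetical helper name, not a standard command.)
find_apache_log_dir() {
    if [ "$#" -eq 0 ]; then
        set -- /var/log/apache /var/log/apache2 /var/log/httpd /etc/httpd/logs
    fi
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example: report the directory, or say that none was found.
find_apache_log_dir || echo "No default Apache log directory found"
```

If nothing is found, the ErrorLog and CustomLog directives in your Apache configuration are the place to look next.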

Interpreting Apache logs

The usual format that Apache follows for presenting log entries is:

"%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""

Here’s how to interpret this formatting:

  • %h – The IP address of the client.
  • %l – The remote logname reported by identd on the client, if available. This field is usually empty, and presented as a hyphen.
  • %u – The user ID of the client, if HTTP authentication was used. If not, this field is also presented as a hyphen.
  • %t – Timestamp of the log entry.
  • \"%r\" – The request line from the client. This will show what HTTP method was used (such as GET or POST), what file was requested, and which HTTP protocol version was used.
  • %>s – The status code that was returned to the client. Codes of 4xx (such as 404, page not found) indicate client errors and codes of 5xx (such as 500, internal server error) indicate server errors. Other numbers should indicate success (such as 200, OK) or something else like redirection (such as 301, permanently moved).
  • %O – The number of bytes sent to the client in the response, including headers.
  • \"%{Referer}i\" – The referring link, if applicable. This tells you how the user navigated to your page (either from an internal or external link).
  • \"%{User-Agent}i\" – This contains information about the connecting client’s web browser and operating system.

Now that you know how to interpret a typical entry in the Apache log files, let’s take a look at some entries (adjust the following path to match where your Apache log files are stored):

$ less /var/log/apache2/access.log
OR
$ less /var/log/apache2/error.log

A typical entry in the access log will look something like this:

10.10.220.3 - - [17/Dec/2019:23:05:32 -0500] "GET /products/index.php HTTP/1.1" 200 5015 "http://example.com/products/index.php" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36"
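
To connect this entry back to the format string, we can split it on whitespace with awk and print a few of the fields by name. This is only a sketch under some assumptions: summarize_entry is a name invented for this example, and the field positions hold only for well-formed entries in the format shown above (a malformed request line, or a user name containing spaces, would shift the columns).

```shell
#!/bin/sh
# Print selected fields from access log entries read on standard input.
# Field positions when splitting on whitespace:
#   $1 = client IP (%h)      $6 = HTTP method (with a leading quote)
#   $7 = requested path      $9 = status code (%>s)
#   $10 = bytes sent (%O)
# (summarize_entry is a hypothetical helper name used for illustration.)
summarize_entry() {
    awk '{
        method = $6
        gsub(/"/, "", method)   # strip the opening quote of the request line
        printf "client=%s method=%s path=%s status=%s bytes=%s\n", $1, method, $7, $9, $10
    }'
}

# Feed the example entry from this article through the function.
printf '%s\n' '10.10.220.3 - - [17/Dec/2019:23:05:32 -0500] "GET /products/index.php HTTP/1.1" 200 5015 "http://example.com/products/index.php" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36"' | summarize_entry
```

For the entry above, this prints client=10.10.220.3 method=GET path=/products/index.php status=200 bytes=5015.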

The error log is a bit more straightforward and easy to interpret. Here’s what a typical entry may look like:



[Mon Dec 16 06:29:16.613789 2019] [php7:error] [pid 2095] [client 10.10.244.61:24145] script '/var/www/html/settings.php' not found or unable to stat

The error log is a good way to see which requests are failing for your visitors, and may clue you in to some dead links on your site. More importantly, it can alert you to missing resources or potential server problems. The example above shows a .php page that was requested but missing.
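
The access log can give a complementary view, since every 404 response is recorded there in the status field. The sketch below tallies the paths that returned 404, most frequent first. It assumes the log format described earlier (status in the ninth whitespace-separated field, path in the seventh); count_404_paths is a name invented for this example, and the three sample entries are made up for demonstration.

```shell
#!/bin/sh
# List paths that returned 404, most frequent first.
# Reads the files given as arguments, or standard input if none are given.
# (count_404_paths is a hypothetical helper name used for illustration.)
count_404_paths() {
    awk '$9 == 404 { hits[$7]++ }
         END { for (path in hits) print hits[path], path }' "$@" | sort -rn
}

# Demonstration on three made-up entries; on a real server you would run:
#   count_404_paths /var/log/apache2/access.log
count_404_paths <<'EOF'
10.0.0.1 - - [17/Dec/2019:23:05:32 -0500] "GET /old-page.html HTTP/1.1" 404 492 "-" "curl/7.68.0"
10.0.0.2 - - [17/Dec/2019:23:05:40 -0500] "GET /index.html HTTP/1.1" 200 5015 "-" "curl/7.68.0"
10.0.0.3 - - [17/Dec/2019:23:06:01 -0500] "GET /old-page.html HTTP/1.1" 404 492 "-" "curl/7.68.0"
EOF
```

A path that shows up with a high count is a good candidate for a redirect or a fixed link.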

DID YOU KNOW?
You can customize the formatting of Apache logs, and even configure exactly what information Apache should log. See our tutorial on How to analyze and interpret Apache Webserver Log to learn about Apache log modules and how to customize your Apache logging.

Side note: you may see a lot of compressed log files in your /var/log/apache2 or /var/log/httpd directory:

$ ls /var/log/apache2
Observe all of the old Apache log files which get compressed and rotated for storage purposes

Your Apache log files are automatically compressed and rotated to limit disk usage and to avoid collecting millions of lines in a single text file. This is handled by the logrotate utility, and its behavior can be adjusted if you want, typically through a configuration file under /etc/logrotate.d/.
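
Because older entries end up in compressed files, a search across the whole history needs to read both plain and gzip-compressed logs. One way to do that is zcat with the -f option, which decompresses .gz files and passes uncompressed files through unchanged. The sketch below counts entries across the current log and all rotated copies; the function name count_all_requests is made up for this example, and it assumes the rotated files keep the access.log prefix (names such as access.log.1 and access.log.2.gz are typical, but your logrotate settings may differ).

```shell
#!/bin/sh
# Count entries across the live access log and every rotated copy in a directory.
# zcat -f decompresses .gz files and copies plain-text files through unchanged.
# (count_all_requests is a hypothetical helper name used for illustration.)
count_all_requests() {
    zcat -f "$1"/access.log* | wc -l
}

# Example (adjust the directory to match your system; reading the logs
# may require root privileges):
#   count_all_requests /var/log/apache2
```

The same zcat -f pipeline can feed grep or awk instead of wc when you need to search rather than count.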

Interpret Apache log files with a tool

You saw in the previous section how to interpret log files, but doing this by hand is impractical at any real scale. It takes too long to read each log entry, and a popular website can generate thousands of lines in the log files every single day. Rather than processing each one manually, we can use third-party tools that analyze the logs and present the most pertinent information from them.

One that we will look at in this article is GoAccess. You can use the appropriate command below to install GoAccess with your system’s package manager.

To install GoAccess on Ubuntu, Debian, and Linux Mint:

$ sudo apt install goaccess

To install GoAccess on Fedora, CentOS, AlmaLinux, and Red Hat:

$ sudo dnf install goaccess

To install GoAccess on Arch Linux and Manjaro:

$ sudo pacman -S goaccess

After installation, we can use GoAccess to generate a report from a log file. For example, to create a report from Apache’s access.log:

$ sudo goaccess /var/log/apache2/access.log --log-format=COMBINED -a -o /var/www/html/report.html

This command will generate report.html and place it inside of your /var/www/html directory, which is the default directory where Apache hosts websites. There is a good chance that your server has been configured to host websites in a different directory, so you will need to adjust this command accordingly. After running the command, you can pull up report.html in your web browser.

Viewing the GoAccess Apache report



Closing Thoughts

In this tutorial, we saw how to analyze and interpret Apache web server logs on a Linux system. We learned how to manually open the access.log and error.log files, and make sense of the data that is shown inside. Then, we saw how to use a tool such as GoAccess that can automatically sift through the files and compile the relevant data in a more human-readable and digestible way. Part of web server administration is to stay on top of your Apache logs and continuously monitor them for anomalies and errors. It is your choice whether you want to use a tool to help with the job, do it manually, or configure a custom solution yourself.


