Apache Kafka is a distributed streaming platform. With its rich API (Application Programming Interface) set, we can connect almost anything to Kafka as a source of data, and on the other end, we can set up a large number of consumers that will receive the stream of records for processing. Kafka is highly scalable, and stores the streams of data in a reliable and fault-tolerant way. From the connectivity perspective, Kafka can serve as a bridge between many heterogeneous systems, which in turn can rely on its capabilities to transfer and persist the data provided.

In this tutorial we will install Apache Kafka on Red Hat Enterprise Linux 8, create the systemd unit files for ease of management, and test the functionality with the shipped command line tools.

In this tutorial you will learn:
  • How to install Apache Kafka
  • How to create systemd services for Kafka and Zookeeper
  • How to test Kafka with command line clients
Consuming messages on Kafka topic from the command line.

Software Requirements and Conventions Used

Category    Requirements, Conventions or Software Version Used
System      Red Hat Enterprise Linux 8
Software    Apache Kafka 2.1.0
Other       Privileged access to your Linux system as root or via the sudo command.
Conventions # - requires given linux commands to be executed with root privileges either directly as a root user or by use of the sudo command
            $ - requires given linux commands to be executed as a regular non-privileged user

How to install Kafka on Red Hat 8 step by step instructions



Apache Kafka is written in Java, so all we need is OpenJDK 8 installed to proceed with the installation. Kafka relies on Apache Zookeeper, a distributed coordination service, which is also written in Java and is shipped with the package we will download. While installing HA (High Availability) services on a single node defeats their purpose, we'll install and run Zookeeper for Kafka's sake.
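If you are unsure whether a Java runtime is already present, a quick check can help before proceeding; the snippet below only prints the install command rather than running it, and assumes the java-1.8.0-openjdk package name as shipped in the RHEL 8 AppStream repository:

```shell
# Check for an existing Java runtime; if none is found, print the
# install command (to be run as root) instead of assuming privileges:
if command -v java >/dev/null 2>&1; then
    java_status=$(java -version 2>&1 | head -n 1)
else
    java_status="Java not found - install it with: dnf install java-1.8.0-openjdk"
fi
echo "$java_status"
```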

  1. To download Kafka from the closest mirror, we need to consult the official download site. We can copy the URL of the .tar.gz file from there. We'll use wget with the copied URL to download the package to the target machine:
    # wget https://www-eu.apache.org/dist/kafka/2.1.0/kafka_2.11-2.1.0.tgz -O /opt/kafka_2.11-2.1.0.tgz
  2. We enter the /opt directory, and extract the archive:
    # cd /opt
    # tar -xvf kafka_2.11-2.1.0.tgz
    And create a symlink called /opt/kafka that points to the newly created /opt/kafka_2.11-2.1.0 directory to make our lives easier:
    # ln -s /opt/kafka_2.11-2.1.0 /opt/kafka
  3. We create a non-privileged user that will run both the Zookeeper and Kafka services.
    # useradd kafka
  4. And set the new user as owner of the whole directory we extracted, recursively:
    # chown -R kafka:kafka /opt/kafka*
  5. We create the unit file /etc/systemd/system/zookeeper.service with the following content:


    [Unit]
    Description=zookeeper
    After=syslog.target network.target
    
    [Service]
    Type=simple
    
    User=kafka
    Group=kafka
    
    ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
    ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
    
    [Install]
    WantedBy=multi-user.target
    Note that we don't need to write the version number three times because of the symlink we created. The same applies to the next unit file for Kafka, /etc/systemd/system/kafka.service, which contains the following configuration:
    [Unit]
    Description=Apache Kafka
    Requires=zookeeper.service
    After=zookeeper.service
    
    [Service]
    Type=simple
    
    User=kafka
    Group=kafka
    
    ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
    ExecStop=/opt/kafka/bin/kafka-server-stop.sh
    
    [Install]
    WantedBy=multi-user.target
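    Optionally, both units can be made a bit more resilient by letting systemd restart the services after a crash. This is a sketch, not part of the upstream Kafka distribution; the directives below would be added to the [Service] section of either unit file:

```
[Service]
# Restart the service if it exits abnormally (crash, signal, timeout),
# waiting 5 seconds between attempts:
Restart=on-abnormal
RestartSec=5
```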
  6. We need to reload systemd so that it reads the new unit files:


    # systemctl daemon-reload
  7. Now we can start our new services (in this order):
    # systemctl start zookeeper
    # systemctl start kafka
    If all goes well, systemd should report the running state in both services' status, similar to the outputs below:
    # systemctl status zookeeper.service
      zookeeper.service - zookeeper
       Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
       Active: active (running) since Thu 2019-01-10 20:44:37 CET; 6s ago
     Main PID: 11628 (java)
        Tasks: 23 (limit: 12544)
       Memory: 57.0M
       CGroup: /system.slice/zookeeper.service
                11628 java -Xmx512M -Xms512M -server [...]
    
    # systemctl status kafka.service
      kafka.service - Apache Kafka
       Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
       Active: active (running) since Thu 2019-01-10 20:45:11 CET; 11s ago
     Main PID: 11949 (java)
        Tasks: 64 (limit: 12544)
       Memory: 322.2M
       CGroup: /system.slice/kafka.service
                11949 java -Xmx1G -Xms1G -server [...]
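    Besides systemctl status, we can also check that the default listeners are up; Zookeeper listens on TCP port 2181 and the Kafka broker on 9092 unless configured otherwise. A small sketch using ss from iproute2, which prints a hint instead of failing when nothing is listening yet:

```shell
# List TCP listeners on the default Zookeeper (2181) and Kafka (9092)
# ports; fall back to a hint when neither port is listening yet:
check_kafka_ports() {
    ss -ltn 2>/dev/null | grep -E ':(2181|9092)[[:space:]]' \
        || echo "ports 2181/9092 not listening yet"
}
check_kafka_ports
```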
  8. Optionally we can enable automatic start on boot for both services:
    # systemctl enable zookeeper.service
    # systemctl enable kafka.service
  9. To test functionality, we'll connect to Kafka with one producer and one consumer client. The messages provided by the producer should appear on the console of the consumer. But before this, we need a medium for these two to exchange messages on. We create a new channel of data, called a topic in Kafka's terms, where the provider will publish and where the consumer will subscribe. We'll call the topic FirstKafkaTopic, and use the kafka user to create it:
    $ /opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic FirstKafkaTopic


  10. We start a consumer client from the command line that will subscribe to the (at this point empty) topic created in the previous step:
    $ /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic FirstKafkaTopic --from-beginning
    We leave this console open, with the client running in it. This console is where we will receive the messages we publish with the producer client.
  11. On another terminal, we start a producer client and publish some messages to the topic we created. We can query Kafka for available topics:
    $ /opt/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
    FirstKafkaTopic
    And connect to the one the consumer is subscribed to, then send a message:
    $ /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic FirstKafkaTopic
    > new message published by producer from console #2
    At the consumer terminal, the message should appear shortly:
    $ /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic FirstKafkaTopic --from-beginning
     new message published by producer from console #2
    If the message appears, our test is successful, and our Kafka installation is working as intended. Many clients could publish and consume one or more topic records the same way, even with the single node setup we created in this tutorial.
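The interactive test above can also be wrapped in a small non-interactive smoke test. The sketch below saves a script that publishes one message and reads exactly one back with --max-messages (a console consumer option); it assumes the broker and topic from this tutorial and should only be run once the services are up:

```shell
# Write a smoke-test script to /tmp; it is only useful on the host
# where the Kafka broker from this tutorial listens on localhost:9092.
cat > /tmp/kafka-smoke-test.sh <<'EOF'
#!/bin/sh
# Publish a single message non-interactively...
echo "smoke test message" | /opt/kafka/bin/kafka-console-producer.sh \
    --broker-list localhost:9092 --topic FirstKafkaTopic
# ...then consume exactly one message from the beginning and exit.
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic FirstKafkaTopic --from-beginning --max-messages 1
EOF
chmod +x /tmp/kafka-smoke-test.sh
echo "smoke test saved to /tmp/kafka-smoke-test.sh"
```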
