How to install and configure Zookeeper on Ubuntu 18.04

Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
All of these kinds of services are used in some form or another by distributed applications.

In this article, we explain the steps needed to install and configure a 3-node Zookeeper cluster with a definite quorum on Ubuntu 18.04.

In this tutorial you will learn:

What is Zookeeper and its Overview.
What is the Architecture of Zookeeper.
How to Configure the Zookeeper hosts and Add Zookeeper User.
How to Install and Configure Oracle JDK.
How to Configure and Set Up Zookeeper.
How to Verify the Zookeeper Cluster and Ensemble.

Zookeeper Architectural Overview.

Software Requirements and Conventions Used

Software Requirements and Linux Command Line Conventions
Category	Requirements, Conventions or Software Version Used
System	Ubuntu 18.04
Software	zookeeper-3.4.12, Oracle JDK 1.8.0_192
Other	Privileged access to your Linux system as root or via the `sudo` command.
Conventions	# – requires the given Linux commands to be executed with root privileges, either directly as the root user or with the `sudo` command; $ – requires the given Linux commands to be executed as a regular non-privileged user

Zookeeper Overview

Zookeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers (we call these registers znodes), much like a file system. Unlike normal file systems, Zookeeper provides its clients with high-throughput, low-latency, highly available, strictly ordered access to the znodes.

The performance aspects of Zookeeper allow it to be used in large distributed systems. The reliability aspects prevent it from becoming the single point of failure in big systems. Its strict ordering allows sophisticated synchronization primitives to be implemented at the client.

The name space provided by Zookeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (“/”). Every znode in Zookeeper’s name space is identified by a path. And every znode has a parent whose path is a prefix of the znode with one less element; the exception to this rule is root (“/”) which has no parent. Also, exactly like standard file systems, a znode cannot be deleted if it has any children.
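The parent rule described above can be sketched in a few lines of shell. This is only an illustration of how znode paths relate to each other, not part of Zookeeper itself; the function name znode_parent and the example paths are assumptions made up for this sketch.

```shell
# Print the parent path of a znode path.
# Fails (returns 1) for the root, which has no parent.
znode_parent() {
  zp_path=$1
  if [ "$zp_path" = "/" ]; then
    return 1                      # root ("/") has no parent
  fi
  zp_parent=${zp_path%/*}         # drop the last path element
  if [ -z "$zp_parent" ]; then
    zp_parent=/                   # direct children of the root have "/" as parent
  fi
  printf '%s\n' "$zp_parent"
}

# Hypothetical example paths:
znode_parent /app/config/db       # prints /app/config
znode_parent /app                 # prints /
```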

Zookeeper was designed to store coordination data: status information, configuration, location information, etc.

Architecture of Zookeeper

For reliable Zookeeper service, you should deploy Zookeeper in a cluster known as an ensemble. As long as a majority of the ensemble are up, the service will be available. Because Zookeeper requires a majority, it is best to use an odd number of machines. For example, with four machines Zookeeper can only handle the failure of a single machine; if two machines fail, the remaining two machines do not constitute a majority. However, with five machines Zookeeper can handle the failure of two machines.
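The majority arithmetic above can be checked with a short shell sketch. The function names quorum_size and tolerated_failures are hypothetical helpers introduced here for illustration only.

```shell
# Smallest number of servers that forms a majority of an ensemble of size $1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority is still up.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare ensembles of three, four and five machines.
for n in 3 4 5; do
  echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For the three-node ensemble built in this article, the quorum is 2 and one failed node is tolerated. A fourth machine raises the quorum to 3 without tolerating any additional failure, which is why odd sizes are preferred.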

Each component of the Zookeeper architecture is explained below.

Client – A client is one of the nodes in our distributed application cluster and accesses information from the server. At regular intervals, every client sends a message to the server to let the server know that the client is alive. Similarly, the server sends an acknowledgment when a client connects. If there is no response from the connected server, the client automatically redirects the message to another server.
Server – A server is one of the nodes in our Zookeeper ensemble and provides all the services to clients. It sends acknowledgments to the client to confirm that the server is alive.
Leader – The server node that performs automatic recovery if any of the connected nodes fails. The leader is elected on service startup.
Follower – A server node that follows the leader's instructions.

Configure the Zookeeper hosts and Add Zookeeper User

Before installing the Zookeeper packages, we will configure the hosts file on all the Ubuntu nodes. After that, we will create the zookeeper user on all three nodes, because the Zookeeper daemon needs to run as the zookeeper user.

Here we have used 3 Ubuntu 18.04 machines.

Zookeeper Node1 – 192.168.1.102 (hostname – node1)
Zookeeper Node2 – 192.168.1.103 (hostname – node2)
Zookeeper Node3 – 192.168.1.105 (hostname – node3)

Edit the /etc/hosts file on all three nodes with gedit or vim and add the following entries:

192.168.1.102 node1
192.168.1.103 node2
192.168.1.105 node3

After adding the above entries to the hosts file, check connectivity between all the nodes with ping.
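Before pinging, it can help to confirm that the hosts file on each node really contains all three mappings. The following is a minimal sketch; has_host_entry and check_ensemble_hosts are hypothetical helper names, and the IP addresses and hostnames are the ones used in this article.

```shell
# Succeed only if hosts file $1 maps IP $2 to hostname $3.
has_host_entry() {
  awk -v ip="$2" -v name="$3" '
    $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }
  ' "$1"
}

# Check all three ensemble members in one hosts file (defaults to /etc/hosts).
check_ensemble_hosts() {
  hosts_file=${1:-/etc/hosts}
  has_host_entry "$hosts_file" 192.168.1.102 node1 &&
  has_host_entry "$hosts_file" 192.168.1.103 node2 &&
  has_host_entry "$hosts_file" 192.168.1.105 node3
}

# Usage on each node:
#   check_ensemble_hosts && echo "hosts file OK"
#   for h in node1 node2 node3; do ping -c 3 "$h"; done
```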

Now, create the new zookeeper user and group using the command:

# adduser zookeeper

Install and Configure Oracle JDK

Download and extract the Java archive under the /opt directory. For more information, see how to install Java on Ubuntu 18.04.

To set JDK 1.8 Update 192 as the default JVM, use the following commands:

# update-alternatives --install /usr/bin/java java /opt/jdk1.8.0_192/bin/java 100
# update-alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_192/bin/javac 100

After installation, run the following commands to verify that Java has been configured successfully:

# update-alternatives --display java
# update-alternatives --display javac

To check the Java version, run the following command:

# java -version

Upon successful installation, you will see the following output:

java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)

Configure and Setup the Zookeeper

Download and extract the Zookeeper package from the official Apache archive on all three Ubuntu machines as shown below:

$ wget https://www-us.apache.org/dist/zookeeper/stable/zookeeper-3.4.12.tar.gz

$ tar -xzvf zookeeper-3.4.12.tar.gz

Edit the .bashrc of the zookeeper user and add the following Zookeeper environment variable:

export ZOO_LOG_DIR=/var/log/zookeeper

Source the .bashrc in the current login session:

$ source ~/.bashrc
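If you script this step across the three nodes, you may want the export line added only once, even when the script is run again. The sketch below shows one way to do that; append_once is a hypothetical helper name, not a standard command.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
  ao_line=$1
  ao_file=$2
  touch "$ao_file"                # make sure the file exists before grepping it
  grep -qxF -- "$ao_line" "$ao_file" || printf '%s\n' "$ao_line" >> "$ao_file"
}

# Usage as the zookeeper user:
#   append_once 'export ZOO_LOG_DIR=/var/log/zookeeper' ~/.bashrc
#   source ~/.bashrc
```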

Now create the directory zookeeper under /var/lib, which will serve as the Zookeeper data directory, and another zookeeper directory under /var/log, where all the Zookeeper logs will be written. Ownership of both directories needs to be changed to the zookeeper user.

$ sudo mkdir -p /var/lib/zookeeper && sudo chown zookeeper:zookeeper /var/lib/zookeeper
$ sudo mkdir -p /var/log/zookeeper && sudo chown zookeeper:zookeeper /var/log/zookeeper

Create the server id for the ensemble. Each Zookeeper server must have a number in its myid file that is unique within the ensemble, with a value between 1 and 255.

In Node1

$ sudo sh -c "echo '1' > /var/lib/zookeeper/myid"

In Node2

$ sudo sh -c "echo '2' > /var/lib/zookeeper/myid"

In Node3

$ sudo sh -c "echo '3' > /var/lib/zookeeper/myid"
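A quick sanity check of the myid file can catch typos before the server is started. This is a sketch only; valid_myid and check_myid_file are hypothetical helper names, and the range is the 1 to 255 range stated above.

```shell
# Succeed only if $1 is a whole number between 1 and 255.
valid_myid() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;      # empty, or contains a non-digit character
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 255 ]
}

# Validate the id stored in a myid file.
check_myid_file() {
  valid_myid "$(cat "$1")"
}

# Usage on each node:
#   check_myid_file /var/lib/zookeeper/myid && echo "myid OK"
```

The check does not verify uniqueness across the ensemble; that still has to be confirmed by comparing the three nodes.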

Now, go to the conf folder under the Zookeeper home directory (the directory created when the archive was extracted).

$ cd /home/zookeeper/zookeeper-3.4.12/conf/

zookeeper@node1:~/zookeeper-3.4.12/conf$ ls -lrth
total 16K
-rw-r--r-- 1 zookeeper zookeeper  922 Jun 29 21:04 zoo_sample.cfg
-rw-r--r-- 1 zookeeper zookeeper  535 Jun 29 21:04 configuration.xsl
-rw-r--r-- 1 zookeeper zookeeper  999 Nov 24 18:29 zoo.cfg
-rw-r--r-- 1 zookeeper zookeeper 2.2K Nov 24 19:07 log4j.properties

By default, a sample configuration file named zoo_sample.cfg is present in the conf directory. Make a copy of it named zoo.cfg as shown below, and edit the new zoo.cfg as described on all three Ubuntu machines.

$ cp zoo_sample.cfg zoo.cfg

$ ls -lrth /home/zookeeper/zookeeper-3.4.12/conf
total 16K
-rw-r--r-- 1 zookeeper zookeeper  922 Jun 29 21:04 zoo_sample.cfg
-rw-r--r-- 1 zookeeper zookeeper  535 Jun 29 21:04 configuration.xsl
-rw-r--r-- 1 zookeeper zookeeper  999 Nov 24 18:29 zoo.cfg
-rw-r--r-- 1 zookeeper zookeeper 2.2K Nov 24 19:07 log4j.properties

$ vim /home/zookeeper/zookeeper-3.4.12/conf/zoo.cfg

dataDir=/var/lib/zookeeper
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888

Zookeeper Configuration Changes.
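Each server line has the form server.N=host:port1:port2. N must match the number in that host's myid file, the first port (2888 here) is used by followers to connect to the leader, and the second port (3888 here) is used for leader election. The sketch below generates the lines from a list of hostnames so that the numbering stays consistent; server_lines is a hypothetical helper name, and it assumes that the hosts are listed in myid order.

```shell
# Print one server.N=host:2888:3888 line per hostname, numbering from 1.
server_lines() {
  sl_id=1
  for sl_host in "$@"; do
    printf 'server.%d=%s:2888:3888\n' "$sl_id" "$sl_host"
    sl_id=$((sl_id + 1))
  done
}

# Prints the three server lines used in this article.
server_lines node1 node2 node3
```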

Now, make the following changes in the log4j.properties file.

$ vim /home/zookeeper/zookeeper-3.4.12/conf/log4j.properties

zookeeper.log.dir=/var/log/zookeeper
zookeeper.tracelog.dir=/var/log/zookeeper
log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE

Zookeeper log4j Configuration Changes.

After zoo.cfg has been configured on all three nodes, start Zookeeper on each node, one by one, using the following command:

$ /home/zookeeper/zookeeper-3.4.12/bin/zkServer.sh start

Zookeeper Service Start on all three Nodes.

The log file, named zookeeper.log, is created in /var/log/zookeeper. Tail the file to check the logs for errors.

$ tail -f /var/log/zookeeper/zookeeper.log

Verify the Zookeeper Cluster and Ensemble

In a Zookeeper ensemble of three servers, one will be in leader mode and the other two will be in follower mode. You can check the status by running the following commands.

$ /home/zookeeper/zookeeper-3.4.12/bin/zkServer.sh status

Zookeeper Service Status Check.

$ echo stat | nc node1 2181

Lists brief details for the server and connected clients.

$ echo mntr | nc node1 2181

Zookeeper list of variables for cluster health monitoring.

$ echo srvr | nc localhost 2181

Lists full details for the Zookeeper server.
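The output of srvr includes a line starting with "Mode:" that reports whether the server is a leader or a follower. The sketch below extracts that value so the role of every node can be seen at a glance; zk_mode is a hypothetical helper name.

```shell
# Read `srvr` (or `stat`) output on stdin and print the value of the Mode line.
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Usage against the running ensemble:
#   for h in node1 node2 node3; do
#     printf '%s: ' "$h"
#     echo srvr | nc "$h" 2181 | zk_mode
#   done
```

In a healthy three-node ensemble, one node should report leader and the other two follower.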

If you need to inspect the znodes, you can connect with the command below on any of the Zookeeper nodes:

$ /home/zookeeper/zookeeper-3.4.12/bin/zkCli.sh -server `hostname -f`:2181

Connect to Zookeeper data node and lists the contents.

Conclusion

Zookeeper has become one of the most preferred choices for creating highly available distributed systems at scale. The Zookeeper project is one of the most successful projects from the Apache Software Foundation and has gained wide adoption by top companies, delivering numerous benefits related to big data.

By providing a solid base on which to implement different big data tools, Apache Zookeeper has allowed companies to function smoothly in the big data world. Its ability to provide multiple benefits at once has made it one of the most preferred applications to be implemented at a large scale.