Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
All of these kinds of services are used in some form or another by distributed applications.
In this article, we explain the steps needed to install and configure a 3-node Zookeeper cluster with a definite quorum on Ubuntu 18.04.
In this tutorial you will learn:
- What is Zookeeper and its Overview.
- What is the Architecture of Zookeeper.
- How to Configure the Zookeeper hosts and Add Zookeeper User.
- How to Install and Configure Oracle JDK.
- How to Configure and Setup the Zookeeper.
- How to Verify the Zookeeper Cluster and Ensemble.
Software Requirements and Conventions Used
Category | Requirements, Conventions or Software Version Used |
---|---|
System | Ubuntu 18.04 |
Software | zookeeper-3.4.12, Oracle JDK 1.8.0_192 |
Other | Privileged access to your Linux system as root or via the sudo command. |
Conventions | # – requires given Linux commands to be executed with root privileges, either directly as the root user or by use of the sudo command. $ – requires given Linux commands to be executed as a regular non-privileged user. |
Zookeeper Overview
Zookeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers (we call these registers znodes), much like a file system. Unlike normal file systems Zookeeper provides its clients with high throughput, low latency, highly available, strictly ordered access to the znodes.
The performance aspects of Zookeeper allow it to be used in large distributed systems. The reliability aspects prevent it from becoming the single point of failure in big systems. Its strict ordering allows sophisticated synchronization primitives to be implemented at the client.
The name space provided by Zookeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (“/”). Every znode in Zookeeper’s name space is identified by a path. And every znode has a parent whose path is a prefix of the znode with one less element; the exception to this rule is root (“/”) which has no parent. Also, exactly like standard file systems, a znode cannot be deleted if it has any children.
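The parent rule can be illustrated with a small shell sketch. The helper name `znode_parent` and the example paths are hypothetical and only mimic the naming rule; this does not talk to a Zookeeper server.

```shell
#!/bin/sh
# Hypothetical helper: print the parent path of a znode path.
# The root "/" has no parent, so nothing is printed and the status is 1.
znode_parent() {
    path=$1
    [ "$path" = "/" ] && return 1
    parent=${path%/*}          # drop the last path element
    echo "${parent:-/}"        # an empty result means the parent is the root
}

znode_parent /app/config/db    # prints /app/config
znode_parent /app              # prints /
```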
Zookeeper was designed to store coordination data: status information, configuration, location information, etc.
Architecture of Zookeeper
For reliable Zookeeper service, you should deploy Zookeeper in a cluster known as an ensemble. As long as a majority of the ensemble are up, the service will be available. Because Zookeeper requires a majority, it is best to use an odd number of machines. For example, with four machines Zookeeper can only handle the failure of a single machine; if two machines fail, the remaining two machines do not constitute a majority. However, with five machines Zookeeper can handle the failure of two machines.
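The majority arithmetic above can be checked with a short shell sketch. The function names are illustrative and not part of Zookeeper; a majority is taken to be floor(n/2) + 1 servers.

```shell
#!/bin/sh
# Majority quorum for an ensemble of n servers: floor(n/2) + 1.
quorum_size() { echo $(( $1 / 2 + 1 )); }

# Number of servers that can fail while a majority is still up.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 3 4 5; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Four servers tolerate only one failure, the same as three, which is why an even-sized ensemble adds cost without adding fault tolerance.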
Each one of the components that is a part of the Zookeeper architecture has been explained below.
- Client – A client is one of the nodes in our distributed application cluster and accesses information from the server. At regular intervals, every client sends a message to the server to let the server know that the client is alive. Similarly, the server sends an acknowledgment when a client connects. If there is no response from the connected server, the client automatically redirects the message to another server.
- Server – A server is one of the nodes in our Zookeeper ensemble and provides all the services to clients. It sends an acknowledgment to the client to indicate that the server is alive.
- Leader – The server node that performs automatic recovery if any of the connected nodes fails. Leaders are elected on service startup.
- Follower – A server node that follows the leader's instructions.
Configure the Zookeeper hosts and Add Zookeeper User
Before installing the necessary Zookeeper packages, we will configure the hosts file on all the Ubuntu nodes. After that, we will create the zookeeper user on all three nodes, because the Zookeeper daemon needs to run as the zookeeper user.
Here we have used 3 Ubuntu 18.04 machines.
Zookeeper Node1 – 192.168.1.102 (hostname – node1)
Zookeeper Node2 – 192.168.1.103 (hostname – node2)
Zookeeper Node3 – 192.168.1.105 (hostname – node3)
Edit the /etc/hosts file on all three nodes with gedit or vim and make the following changes:
192.168.1.102 node1
192.168.1.103 node2
192.168.1.105 node3
After adding the above entries to the hosts file, check connectivity between all the nodes with ping.
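A minimal sketch of that connectivity check, assuming the Linux iputils `ping` (where `-c` sets the packet count and `-W` the reply timeout in seconds). The helper name `check_hosts` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: ping each host once and report whether it answered.
# Returns non-zero if at least one host did not respond.
check_hosts() {
    failed=0
    for host in "$@"; do
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host UNREACHABLE"
            failed=1
        fi
    done
    return "$failed"
}

# Usage on any of the three nodes:
#   check_hosts node1 node2 node3
```

Running it on each node in turn confirms that name resolution through /etc/hosts works in every direction.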
Now, create the new zookeeper user and group using the command:
# adduser zookeeper
Install and Configure Oracle JDK
Download and extract the Java archive under the /opt directory. For more information, head over to how to install java on Ubuntu 18.04.
To set JDK 1.8 Update 192 as the default JVM, we will use the following commands:
# update-alternatives --install /usr/bin/java java /opt/jdk1.8.0_192/bin/java 100
# update-alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_192/bin/javac 100
To verify that Java has been configured successfully after installation, run the following commands:
# update-alternatives --display java
# update-alternatives --display javac
To check the Java version, run the following command:
# java -version
Upon successful installation you will get the below information:
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)
Configure and Setup the Zookeeper
Download and extract the Zookeeper package from the official Apache archive on all three Ubuntu machines as shown below:
$ wget https://www-us.apache.org/dist/zookeeper/stable/zookeeper-3.4.12.tar.gz
$ tar -xzvf zookeeper-3.4.12.tar.gz
Edit the .bashrc file of the zookeeper user to set the following Zookeeper environment variable:
export ZOO_LOG_DIR=/var/log/zookeeper
Source the .bashrc in the current login session:
$ source ~/.bashrc
Now, create the zookeeper directory under /var/lib, which will serve as the Zookeeper data directory, and create another zookeeper directory under /var/log, where all the Zookeeper logs will be captured. The ownership of both directories needs to be changed to zookeeper.
$ sudo mkdir /var/lib/zookeeper ; cd /var/lib ; sudo chown zookeeper:zookeeper zookeeper/
$ sudo mkdir /var/log/zookeeper ; cd /var/log ; sudo chown zookeeper:zookeeper zookeeper/
Create the server id for the ensemble. Each Zookeeper server should have a unique number in its myid file within the ensemble, and the value should be between 1 and 255.
In Node1
$ sudo sh -c "echo '1' > /var/lib/zookeeper/myid"
In Node2
$ sudo sh -c "echo '2' > /var/lib/zookeeper/myid"
In Node3
$ sudo sh -c "echo '3' > /var/lib/zookeeper/myid"
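The three commands above differ only in the id. As a sketch, the id could be derived from the hostname, assuming the hosts follow the nodeN naming used in this article. The helper name `myid_for_host` is hypothetical, and it rejects values outside the 1 to 255 range.

```shell
#!/bin/sh
# Hypothetical helper: derive a server id from a hostname of the form nodeN
# and check that it falls in the 1..255 range required for myid.
myid_for_host() {
    id=${1#node}                       # "node2" -> "2"
    case $id in
        ''|*[!0-9]*) echo "cannot derive id from '$1'" >&2; return 1 ;;
    esac
    if [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
        echo "id $id out of range 1-255" >&2
        return 1
    fi
    echo "$id"
}

myid_for_host node2    # prints 2

# On a real node (assumes the short hostname follows the nodeN pattern):
#   myid_for_host "$(hostname -s)" | sudo tee /var/lib/zookeeper/myid
```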
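Now, go to the conf folder under the Zookeeper home directory (the location of the Zookeeper directory after the archive has been extracted). The paths below use zookeeper-3.4.13; adjust the version in the path to match the archive you extracted.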
$ cd /home/zookeeper/zookeeper-3.4.13/conf/
zookeeper@node1:~/zookeeper-3.4.13/conf$ ls -lrth
total 16K
-rw-r--r-- 1 zookeeper zookeeper 922 Jun 29 21:04 zoo_sample.cfg
-rw-r--r-- 1 zookeeper zookeeper 535 Jun 29 21:04 configuration.xsl
-rw-r--r-- 1 zookeeper zookeeper 999 Nov 24 18:29 zoo.cfg
-rw-r--r-- 1 zookeeper zookeeper 2.2K Nov 24 19:07 log4j.properties
By default, a sample configuration file named zoo_sample.cfg will be present in the conf directory. Make a copy of it named zoo.cfg as shown below, and edit the new zoo.cfg as described on all three Ubuntu machines.
$ cp zoo_sample.cfg zoo.cfg
$ ls -lrth /home/zookeeper/zookeeper-3.4.13/conf
total 16K
-rw-r--r-- 1 zookeeper zookeeper 922 Jun 29 21:04 zoo_sample.cfg
-rw-r--r-- 1 zookeeper zookeeper 535 Jun 29 21:04 configuration.xsl
-rw-r--r-- 1 zookeeper zookeeper 999 Nov 24 18:29 zoo.cfg
-rw-r--r-- 1 zookeeper zookeeper 2.2K Nov 24 19:07 log4j.properties
$ vim /home/zookeeper/zookeeper-3.4.13/conf/zoo.cfg
dataDir=/var/lib/zookeeper
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
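For orientation, here is a sketch of what the complete zoo.cfg might look like after the edit. The tickTime, initLimit, syncLimit and clientPort values are the ones I would expect to be inherited from zoo_sample.cfg, so verify them against your own copy. The script writes to a scratch file rather than the live configuration.

```shell
#!/bin/sh
# Write a sketch of zoo.cfg to a scratch file (not the live config).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# Assumed to be inherited from zoo_sample.cfg; check your own copy.
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
# Changed for this cluster.
dataDir=/var/lib/zookeeper
# server.<myid>=<host>:<peer port>:<leader election port>
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
EOF

# Print the ensemble members as "id host" pairs.
sed -n 's/^server\.\([0-9]*\)=\([^:]*\):.*/\1 \2/p' "$cfg"
```

The number after `server.` must match the value in that host's myid file. Port 2888 is used by followers to connect to the leader, and port 3888 is used for leader election.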
Now, make the following changes in the log4j.properties file.
$ vim /home/zookeeper/zookeeper-3.4.13/conf/log4j.properties
zookeeper.log.dir=/var/log/zookeeper
zookeeper.tracelog.dir=/var/log/zookeeper
log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE
After the configuration has been done in the zoo.cfg file on all three nodes, start Zookeeper on all three nodes one by one, using the following command:
$ /home/zookeeper/zookeeper-3.4.13/bin/zkServer.sh start
The log file, named zookeeper.log, will be created in /var/log/zookeeper. Tail the file to check the logs for any errors.
$ tail -f /var/log/zookeeper/zookeeper.log
Verify the Zookeeper Cluster and Ensemble
In a Zookeeper ensemble of three servers, one will be in leader mode and the other two will be in follower mode. You can check the status by running the following commands.
$ /home/zookeeper/zookeeper-3.4.13/bin/zkServer.sh status
$ echo stat | nc node1 2181
$ echo mntr | nc node1 2181
$ echo srvr | nc localhost 2181
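To see the role of every server at a glance, the `srvr` response can be filtered for its `Mode:` line. The sketch below uses a hypothetical helper named `zk_mode` and demonstrates it on a shortened, made-up response, so it runs without a live ensemble.

```shell
#!/bin/sh
# Hypothetical helper: extract the "Mode:" value from srvr output on stdin.
zk_mode() { sed -n 's/^Mode: //p'; }

# Usage against a live ensemble (assumes nc is installed):
#   for h in node1 node2 node3; do
#       printf '%s: ' "$h"; echo srvr | nc "$h" 2181 | zk_mode
#   done

# Offline demonstration with a shortened, made-up srvr response:
printf 'Zookeeper version: (sample)\nMode: follower\nNode count: 4\n' | zk_mode
```

In a healthy three-node ensemble, exactly one host should report leader and the other two follower.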
If you need to inspect the znodes, you can connect with the below command on any of the Zookeeper nodes:
$ /home/zookeeper/zookeeper-3.4.13/bin/zkCli.sh -server `hostname -f`:2181
Conclusion
Zookeeper has become one of the most preferred choices for creating highly available distributed systems at scale. The Zookeeper project is one of the most successful projects from the Apache foundation and has gained wide adoption by top companies, delivering numerous benefits related to big data.
Providing a solid base on which to implement different big data tools, Apache Zookeeper has allowed companies to function smoothly in the big data world. Its ability to provide multiple benefits at once has made it one of the most preferred applications to be implemented at a large scale.