Configuration of High-Availability Storage Server Using GlusterFS

March 12, 2013
by Lubos Rendek

Introduction

Whether you are administering a small home network or an enterprise network for a large company, data storage is always a concern. It can be a matter of insufficient disk space or of an inefficient backup solution. In both cases GlusterFS can be the right tool to fix your problem, as it allows you to scale your resources horizontally as well as vertically. In this guide we will configure both distributed and replicated/mirrored data storage. As the name suggests, GlusterFS's distributed mode will allow you to evenly redistribute your data across multiple network nodes, while the replicated mode will make sure that all your data is mirrored across all network nodes.

What is GlusterFS

After reading the introduction you should already have a fair idea of what GlusterFS is. You can think of it as an aggregation service for all the empty disk space across your whole network. It connects all nodes with a GlusterFS installation over TCP or RDMA, creating a single storage resource that either combines all available disk space into a single storage volume (distributed mode) or uses the maximum available disk space on all nodes to mirror your data (replicated mode). Therefore, each volume consists of multiple bricks, where a brick in GlusterFS terminology is a storage directory exported by one of the nodes.

Preliminary Assumptions

Although GlusterFS can be installed and used on any Linux distribution, this article will primarily use Ubuntu Linux. However, you should be able to follow this guide on any Linux distribution such as RedHat, Fedora, SuSe, etc. The only part that will differ is the GlusterFS installation process.

Furthermore, this guide will use 3 example hostnames:

  • storage.server1 – GlusterFS storage server
  • storage.server2 – GlusterFS storage server
  • storage.client – GlusterFS storage client

Use a DNS server or the /etc/hosts file to define your hostnames, and adjust your scenario to this guide.
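
If you do not run a DNS server, a minimal /etc/hosts entry on each machine is enough. This is just a sketch; the IP addresses below are made-up examples, so substitute the real addresses of your hosts:

$ cat /etc/hosts
# example addresses only - replace with the real IPs of your hosts
10.1.1.1    storage.server1
10.1.1.2    storage.server2
10.1.1.3    storage.client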

GlusterFS Installation

The GlusterFS server needs to be installed on all hosts you wish to add to your final storage volume. In our case these are storage.server1 and storage.server2. You could use GlusterFS with a single server and a client connection, making it act much like an NFS server. However, the true value of GlusterFS comes from using multiple server hosts that act as one. Use the following command on both servers to install the GlusterFS server:

storage.server1 $ sudo apt-get install glusterfs-server

and

storage.server2 $ sudo apt-get install glusterfs-server

The above commands will install and start glusterfs-server on both systems. Confirm that both servers are running with:

$ sudo service glusterfs-server status

Distributed storage configuration

First we will create a GlusterFS distributed volume. In distributed mode, GlusterFS will evenly distribute any data across all connected bricks. For example, if clients write files file1, file2, file3 and file4 to a GlusterFS mounted directory, then storage.server1 may end up with file1 and file2 while storage.server2 gets file3 and file4. This scenario is illustrated by the diagram below.

[Diagram: GlusterFS distributed storage configuration]

Peer Probe

First, we need to make both GlusterFS servers talk to each other, which means that we are effectively creating a pool of trusted servers.

storage.server1 $ sudo gluster peer probe storage.server2
Probe successful

The above command adds storage.server2 to the trusted server pool. These settings are replicated across all connected servers, so you do not have to run the above command on the other servers. By now both servers will have a peer config file available, similar to the one below:

$ cat /etc/glusterd/peers/951b8732-42f0-42e1-a32f-0e1c4baec4f1 
uuid=951b8732-42f0-42e1-a32f-0e1c4baec4f1
state=3
hostname1=storage.server2
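
You can also confirm the state of the trusted pool with the gluster peer status command, which should produce output along these lines:

storage.server1 $ sudo gluster peer status
Number of Peers: 1

Hostname: storage.server2
Uuid: 951b8732-42f0-42e1-a32f-0e1c4baec4f1
State: Peer in Cluster (Connected)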

Create Storage Volume

Next, we can use both servers to define a new storage volume consisting of two bricks, one for each server.

storage.server1 $ sudo gluster volume create dist-vol storage.server1:/dist-data \
storage.server2:/dist-data
Creation of volume dist-vol has been successful. Please start the volume to access data.

The above command created a new volume called dist-vol consisting of two bricks. If the directory /dist-data does not exist, it will also be created on both servers by the above command. As mentioned before, you can also add only a single brick to a volume, making the GlusterFS server act much like an NFS server. You can check whether your new volume was created with:

$ sudo gluster volume info dist-vol

Volume Name: dist-vol
Type: Distribute
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage.server1:/dist-data
Brick2: storage.server2:/dist-data
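
As a side note, here is a hypothetical sketch of the single-brick case mentioned above; the volume name single-vol and the brick directory /single-data are example names only:

storage.server1 $ sudo gluster volume create single-vol storage.server1:/single-data

Such a volume behaves much like a plain NFS export, and more bricks can be added to it later with the add-brick command.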

Start storage volume

Now, we are ready to start our new volume:

storage.server1 $ sudo gluster volume start dist-vol
Starting volume dist-vol has been successful
storage.server1 $ sudo gluster volume info dist-vol

Volume Name: dist-vol
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage.server1:/dist-data
Brick2: storage.server2:/dist-data

This concludes the configuration of the GlusterFS data servers in distributed mode. The end result is a new distributed volume called dist-vol consisting of two bricks.

Setting up Client

Now that we have created a new GlusterFS volume, we can use the GlusterFS client to mount this volume on any host. Log in to the client host and install the GlusterFS client:

storage.client $ sudo apt-get install glusterfs-client

Next, create a mount point to which you will mount your new dist-vol GlusterFS volume, for example export-dist:

storage.client $ sudo mkdir /export-dist

Now, we can mount the dist-vol GlusterFS volume with the mount command:

storage.client $ sudo mount -t glusterfs storage.server1:dist-vol /export-dist

All should be ready now. Use the mount command to see whether you have mounted the GlusterFS volume correctly:

$ mount | grep glusterfs
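
If you want the volume to be mounted automatically at boot time, you can also add an entry to /etc/fstab on the client. This is a minimal sketch; the _netdev option simply tells the system to wait for the network before mounting:

storage.server1:dist-vol /export-dist glusterfs defaults,_netdev 0 0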

Testing GlusterFS distributed configuration

Everything is ready, so we can start some tests. On the client side, create 4 files in the GlusterFS mounted directory:

storage.client $ touch /export-dist/file{1..4}

GlusterFS will now take all the files and redistribute them evenly among all bricks in the dist-vol volume. Therefore, storage.server1 will contain:

storage.server1 $ ls /dist-data/
file3 file4

and storage.server2 will contain:

storage.server2 $ ls /dist-data
file1 file2

Of course your results may be different.

Replicated storage configuration

[Diagram: GlusterFS replicated storage configuration]

The procedure for creating a replicated GlusterFS volume is similar to that of the distributed volume explained earlier. In fact, the only difference is the way the GlusterFS volume is created. But let's go through it again from the start:

Peer Probe

First, we need to make both GlusterFS servers talk to each other, which means that we are effectively creating a pool of trusted servers.

storage.server1 $ sudo gluster peer probe storage.server2
Probe successful

If this has already been done, you can skip this step.

Create Storage Volume

In this step we need to create a replica volume.

$ sudo gluster volume create repl-vol replica 2 \
storage.server1:/repl-data storage.server2:/repl-data
Creation of volume repl-vol has been successful. Please start the volume to access data.

A basic translation of the above command is that we have created a replicated volume (replica) called repl-vol. The number 2 in the command indicates the replica count, which means that when expanding this volume we always need to add a number of bricks equal to a multiple of the replica count (2, 4, 6, 8, etc.).
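
To illustrate, a hypothetical four-brick variant of the same command would create a distributed-replicated volume, where consecutive pairs of bricks mirror each other; storage.server3, storage.server4, the volume name dist-repl-vol and the /dr-data directories are example names only:

$ sudo gluster volume create dist-repl-vol replica 2 \
storage.server1:/dr-data storage.server2:/dr-data \
storage.server3:/dr-data storage.server4:/dr-data

Here storage.server1 and storage.server2 form one mirrored pair and storage.server3 and storage.server4 form the other, while the data as a whole is distributed across the two pairs.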

Start storage volume

It is time to start our new replicated volume:

$ sudo gluster volume start repl-vol
Starting volume repl-vol has been successful

Check the status:

storage.server1 $ sudo gluster volume info repl-vol

Volume Name: repl-vol
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: storage.server1:/repl-data
Brick2: storage.server2:/repl-data

Setting up client

The client configuration is the same as when setting up the client for the distributed volume mount.

Install client:

storage.client $ sudo apt-get install glusterfs-client

Create a mount point:

storage.client $ sudo mkdir /export-repl

Mount the repl-vol GlusterFS volume with the mount command:

storage.client $ sudo mount -t glusterfs storage.server1:repl-vol /export-repl

All should be ready now. Use the mount command to see whether you have mounted the GlusterFS volume correctly:

$ mount | grep glusterfs

Testing GlusterFS replicated configuration

The point of the replicated GlusterFS volume is that data will be seamlessly mirrored across all nodes. Thus, when creating files in /export-repl/:

$ touch /export-repl/file{1..4}

all files will be available on both servers:

storage.server1 $ ls /repl-data/
file1 file2 file3 file4

and

storage.server2 $ ls /repl-data/
file1 file2 file3 file4

Expanding GlusterFS volumes

In the case that you need to scale up your data storage to include additional bricks, the process is simple:

$ sudo gluster volume add-brick repl-vol storage.server3:/repl-data storage.server4:/repl-data

This will add another two bricks to your repl-vol. Once you add new bricks, you may need to re-balance the entire volume with:

$ sudo gluster volume rebalance repl-vol fix-layout start

and sync / migrate all data with:

$ sudo gluster volume rebalance repl-vol migrate-data start

Furthermore, you can check the re-balance progress with:

$ sudo gluster volume rebalance repl-vol status

Security Settings

In addition to the above configuration, you can make the entire volume more secure by allowing only certain hosts to join the pool of trust. For example, if we want only the host with IP 10.1.1.10 to be allowed to participate in the volume repl-vol, we use the following command:

$ sudo gluster volume set repl-vol auth.allow 10.1.1.10

If we need to allow the entire subnet, simply use an asterisk:

$ sudo gluster volume set repl-vol auth.allow 10.1.1.*
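
Multiple addresses can also be supplied as a comma-separated list; the addresses below are examples only:

$ sudo gluster volume set repl-vol auth.allow 10.1.1.10,10.1.1.20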

Conclusion

GlusterFS is a powerful piece of GPLv3-licensed software. One can even use it as a quick software RAID 1 by combining two separate physical-device bricks on a single host into a replicated GlusterFS volume. Of course, it would be better to use software RAID for that job, but the possibility is there. I found GlusterFS easy to use and configure.
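
For completeness, here is a hypothetical sketch of that RAID 1 trick, with /disk1-data and /disk2-data standing in for directories on two separate physical disks (note that newer GlusterFS releases may warn about, or refuse, replica bricks on the same host unless forced):

storage.server1 $ sudo gluster volume create raid1-vol replica 2 \
storage.server1:/disk1-data storage.server1:/disk2-data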

Appendix

Here I will list a few errors and answers I encountered while playing with GlusterFS:

Incorrect number of bricks

Incorrect number of bricks supplied 1 for type REPLICATE with count 2

If you have created a volume with replica count 2, you need to add bricks in multiples of 2 at a time.

Host storage.server1 not a friend

Host storage.server1 not a friend

First add the GlusterFS server to the pool of trust (peer probe) before you attempt to include it in a volume.


