Examples of how to use Rsync for local and remote data backups and synchronization

Rsync is a very useful tool which allows Linux system administrators to synchronize data locally or with a remote filesystem, via the ssh protocol or by using the rsync daemon. Using rsync is more convenient than simply copying data, because it can detect and transfer only the differences between a source and a destination. The program has options to preserve standard and extended filesystem permissions, compress the data during transfers, and more. We will see the most used ones in this guide.

In this tutorial you will learn:

  • How to use rsync to synchronize data
  • How to use rsync with a remote filesystem via ssh
  • How to use rsync with a remote filesystem via the rsync daemon
  • How to exclude files from the synchronization

Rsync Examples

Software Requirements and Conventions Used

Software Requirements and Linux Command Line Conventions

Category    | Requirements, Conventions or Software Version Used
System      | Distribution-independent
Software    | The rsync application and, optionally, the rsync daemon
Other       | No special requirements are needed to follow this guide.
Conventions | # – requires given Linux commands to be executed with root privileges, either directly as the root user or by use of the sudo command
            | $ – requires given Linux commands to be executed as a regular non-privileged user

Rsync – usage

Let’s start with rsync’s basic usage. Suppose we have a directory on our local filesystem, and we want to synchronize its content with another directory, perhaps on an external USB device, in order to create a backup of our files. For the sake of our example our source directory will be /mnt/data/source, and our destination will be mounted at /run/media/egdoc/destination. Our source contains two files, file1.txt and file2.txt, while the destination is empty. The first time we run rsync all the data is copied:

The destination path is the last thing we provided in the command. If we now list its content, we can see that it now contains the source files:



$ ls /run/media/egdoc/destination/ -l
total 0
-rw-r--r--. 1 egdoc egdoc 0 Oct  6 19:42 file1.txt
-rw-r--r--. 1 egdoc egdoc 0 Oct  6 19:42 file2.txt

On subsequent runs, rsync copies only new and modified files, which saves a lot of time and resources. Let’s verify it: first we modify the content of file1.txt inside the source directory:

$ echo linuxconfig > /mnt/data/source/file1.txt

Then we run rsync again and watch the output:

$ rsync -av /mnt/data/source/ /run/media/egdoc/destination
sending incremental file list
file1.txt

sent 159 bytes  received 35 bytes  388.00 bytes/sec
total size is 12  speedup is 0.06

The only copied file is the one we modified, file1.txt.

Create a mirror copy of the source to destination

By default, rsync only makes sure that all the files inside the source directory (except those explicitly excluded) are copied to the destination. It does not keep the two directories identical, and it doesn’t remove files. Therefore, if we want to create a mirror copy of the source in the destination, we must use the --delete option, which removes files that exist only in the destination.

Suppose we create a new file called file3.txt in the destination directory:

$ touch /run/media/egdoc/destination/file3.txt


The file doesn’t exist in the source directory, so if we run rsync with the --delete option, it is removed:

$ rsync -av --delete /mnt/data/source/ /run/media/egdoc/destination
sending incremental file list
deleting file3.txt
./

sent 95 bytes  received 28 bytes  246.00 bytes/sec
total size is 0  speedup is 0.00

Since this synchronization is potentially destructive, you may want to first launch rsync with the --dry-run option, in order to make the program display the operations that would be performed, without actually modifying the filesystem.
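As a sketch of such a preview, assuming temporary stand-in directories instead of the real source and destination:

```shell
# Stand-in directories (assumption): the destination holds a file
# that does not exist in the source.
SRC=$(mktemp -d)
DST=$(mktemp -d)
touch "$SRC/file1.txt" "$DST/file3.txt"

# Preview only: rsync lists what it would copy and delete,
# but leaves the destination untouched.
rsync -av --delete --dry-run "$SRC/" "$DST"

# The extra file is still there after the dry run.
ls "$DST"
```

Once the listed operations look right, the same command can be run again without --dry-run.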

Synchronizing files remotely

So far, we have seen how to use rsync to synchronize two local directories. The program can also be used to synchronize files remotely, using a remote shell like rsh or ssh, or the rsync daemon. We will explore both methods.

Running rsync through ssh

For the sake of our example we will keep using the same source directory as in the previous examples, but as the destination we will use a directory on a remote machine with IP 192.168.122.32. I previously set up an OpenSSH server with key-based login on that machine, so I won’t need to provide a password to access it.

How can we run rsync via ssh? First of all, for a remote synchronization to work, rsync must be installed on both the local and the remote machine. Rsync tries to contact a remote filesystem using a remote shell program whenever the destination or source path contains a : character. In modern versions of rsync, ssh is used by default; to use another remote shell, or to declare the shell explicitly, we can use the -e option and provide the shell as its argument. Supposing our destination directory on the remote machine is /home/egdoc/destination, we can run:

$ rsync -av -e ssh /mnt/data/source/ egdoc@192.168.122.32:/home/egdoc/destination

Notice that we specified the destination in the form <user>@<machine address>:/path/to/directory.
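The -e option is also the place to pass extra arguments to the remote shell. As a sketch, assuming the remote OpenSSH server listens on a non-default port (2222 here is a made-up value), the call could be wrapped in a small helper. The name sync_over_ssh is hypothetical and not part of rsync:

```shell
# Hypothetical helper: sync a local directory to a remote one over ssh.
# Usage: sync_over_ssh <source-dir> <user@host:/path> [port]
sync_over_ssh() {
    src=$1
    dest=$2
    port=${3:-22}   # fall back to the standard ssh port

    # The quoted string after -e is the full remote shell command line.
    rsync -av -e "ssh -p $port" "$src" "$dest"
}

# Example call (assumed port 2222), commented out because it needs a reachable host:
# sync_over_ssh /mnt/data/source/ egdoc@192.168.122.32:/home/egdoc/destination 2222
```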

Contacting a remote machine via the rsync daemon

The other method we can use to synchronize files with a remote machine is the rsync daemon. This obviously requires the daemon to be installed and running on the destination machine. Rsync contacts the daemon on the remote machine whenever the source or destination path contains a :: (double colon) separator after the host specification, or when an rsync URL of the form rsync:// is specified.



Supposing the rsync daemon is listening on port 873 (the default) on the remote machine, we can contact it by running:

$ rsync -av /mnt/data/source/ 192.168.122.32::module/destination

Alternatively we can use an rsync URL:

$ rsync -av /mnt/data/source/ rsync://192.168.122.32/module/destination

In both examples, module doesn’t represent the name of a directory on the remote machine, but the name of a resource, or module in rsync terminology, configured by the administrator and made accessible via the rsync daemon. A module can point to any path on the filesystem.
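For context, a module is declared on the server in the daemon's configuration file, typically /etc/rsyncd.conf. The fragment below is a sketch only: the module name matches the commands above, but the path /srv/backup and the other values are assumptions chosen for illustration. It is written to a temporary file here so that nothing on the system is touched.

```shell
# Write an example daemon configuration to a scratch file
# (all values are assumptions for illustration).
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
# The name in brackets is what clients use after "::" or in the rsync:// URL.
[module]
# Directory on the server that the module exposes (hypothetical path).
path = /srv/backup
comment = Example backup area
# Allow clients to upload; modules are read-only by default.
read only = false
EOF

cat "$CONF"
```

With a module defined like this, 192.168.122.32::module/destination would resolve to /srv/backup/destination on the server.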

Excluding files from the synchronization

Sometimes we want to exclude some files or directories from the synchronization. There are basically two ways to accomplish this: by specifying an exclusion pattern directly with --exclude (multiple patterns can be specified by repeating the option), or by writing all the patterns into a file (one per line). When using the latter method, we must pass the file path as the argument to the --exclude-from option.

All the files and directories matching the pattern will be excluded from the synchronization. For example, to exclude all files with the “.txt” extension we would run:

$ rsync -av /mnt/data/source/ /run/media/egdoc/destination --exclude='*.txt'


Conclusions

In this article we took a quick look at rsync, a very useful tool we can use to synchronize files and directories on both local and remote filesystems. We saw the program’s most used options and what they let us accomplish, how to specify the source and destination directories, and the methods we can use to contact a remote filesystem. Finally, we saw how to exclude files from the synchronization, specifying the exclusion patterns directly or inside a file. Rsync has a lot of options, too many to mention here. As always, we can find all the information we need in the program manual!


