How to back up data with the rsync command under Linux

As a system administrator or just a backup-conscious home user, sooner or later (usually sooner) you will have to deal with backups. Disasters do happen, ranging from electrical storms to drive failures, and one needs to be prepared. We cannot stress enough the importance of having copies of important data. While backup strategy as a whole is too big a topic for this article, we will focus on using rsync for what are called incremental backups.

Incremental backups are based on the idea that, once you have a copy of the data you need to back up, subsequent backups of the same data should be incremental: you only update the backup copy with the differences since the last run, instead of creating another full copy. We will detail here a setup we have at home for backing up important data, but the examples can be used at larger facilities too. Once you get started, you will figure out what you need to back up, where and when.

If you have a backup server that’s up 24/7, you can create a cron job to back up your data periodically. Our example is home-based: we do have a backup server, but it’s not up all the time, so we will show you how to do it manually. rsync needs to be installed on both systems, and that’s about it; no other setup is required, at least in simple cases. Please remember that you are by no means tied to Linux or other Unix platforms: rsync is also available for Windows. If you are worried about security, rsync can run over SSH (the default remote shell in current versions) and can be regarded as a secure replacement for the rcp (remote copy) command, so it’s all good.

Working with rsync

In our scenario, the machine containing the files to be backed up is a Debian testing machine; we simply ran

 # aptitude install rsync

to install it, and that was all we had to do on that machine. The backup machine is a FreeBSD 8.2-STABLE box, and there we ran

 # cd /usr/ports/net/rsync && make install clean

to install rsync. We did no extra configuration on either of these machines, but again, this is a simple scenario. Read the rsync manual for options you may need for your particular case. Before we get to the real deal, let us explain a bit about the usual rsync syntax and command line options (yes, we know, 90% of you don’t read the manuals). With rsync you can either pull data from a remote machine or push data to it. In both cases the source comes first and the destination second, so generally the syntax will be

[pull] rsync [options] $user@$host:$source $destination
[push] rsync [options] $source $user@$host:$destination

The local path can be any relative or absolute path. The remote path is written exactly like the one you use with SSH or other programs that work over SSH: $user@{$hostname or $ipaddress}:$path. Translating this into a practical example: the drive for storing backups is mounted under /data1 on the BSD machine, and we want to back up all of the user’s movies from the Debian box at 10.1.3.98. The files travel from the Debian box to the FreeBSD box, but since we run the command from FreeBSD’s terminal, this is a pull:

 $ rsync [options] user@10.1.3.98:/home/user/movies/ /data1/ 

Of course, you must substitute ‘user’, paths and IP addresses with whatever is suitable at your site. As we said, you can easily replace the IP address with a hostname, either one in your local network (make sure you edit /etc/hosts) or a remote hostname that’s known to your DNS server. Also make sure that you have write permissions on the destination directory and read permissions on the source data. Using the above command with no options copies nothing: the source is a directory, and without a recursion option rsync simply skips directories (it reports "skipping directory ."). You can use shell wildcards with rsync, like

 $ rsync [options] user@10.1.3.98:/home/user/movies/* /data1/ 

Usually, though, you will want something more like

 $ rsync -avr user@10.1.3.98:/home/user/movies/ /data1/ 

which will copy the data in the movies directory in archive mode (-a), verbosely (-v) and recursively (-r); strictly speaking, -r is redundant here, because -a already implies it. Long story short, if you want to copy a whole directory, don’t forget about -r (or -a). If you only want the entries directly inside a directory, without descending into its subdirectories, use -d (--dirs) instead. If bandwidth is a concern, add the -z flag, but remember that there is always a tradeoff between bandwidth and CPU time: compression stresses both machines, because one compresses and sends while the other receives and decompresses. In a nutshell, that’s all we really did for our case here. We will use the exact same commands plus the --backup flag later, when we want to sync the data again from the backup box (with --backup, destination files that would be overwritten or deleted are renamed with a ~ suffix instead of being lost), and, as stated before, only the differences will be transferred. Nonetheless, we will present other useful and widely used rsync options, since this is only one of the many scenarios in which rsync may serve you, especially since it’s small and fast.

Other options to rsync

rsync has lots of other useful options; what we did was only to give you a common and simple example. The -e flag allows you to specify the remote shell to use with rsync (ssh is already the default in current versions), like

 $ rsync -e ssh [arguments] 

If you don’t want to overwrite files that are newer on the receiving side, -u (--update) will get you there. --progress shows a nice, detailed live report of the transfer. --delete removes files from the destination that no longer exist on the source. If you only want to update files that already exist on the destination, without creating new ones, use --existing. Wanna see the changes? No problem, use -i (--itemize-changes).

Now, these are only a small part of the plethora of options rsync offers; we will let you discover the rest. So, our initial command, with all these new options we’ve learned, would look like this:

 $ rsync -e ssh -avriz --progress --delete user@10.1.3.98:/home/user/movies/ /data1/

We hope you will like this piece of software as much as we do, and if you have any questions, yeah, we repeat this again and again: use the manual, Luke. Remember to use rsync with care, though, since, as you gathered, some of its options (--delete in particular) can be quite destructive. Finally, to help you deal with day-to-day situations, we will present you with a few examples:

More rsync examples

1. Let’s say you want to synchronize just one file. Obviously, you don’t need -r, since that’s directory-specific, so you’ll just do

 $ rsync -v user@host:/etc/adduser.conf /root/

2. Maybe you wanna play with patterns more advanced than your shell can provide, or you simply want to include or exclude some files and directories. Just use --include and --exclude, like so:

 $ rsync -avz --include 'g*' --exclude '*' user@host:/etc/ /root/config/ 

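This command copies only the top-level entries of /etc/ whose names start with a ‘g’ (files as well as directories) and excludes everything else. Keep in mind that the '*' exclude also applies inside the included directories, so a file inside one of them is skipped unless its own name also starts with a ‘g’.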

3. Perhaps you want to limit the maximum size of the files rsync transfers. Use --max-size=SIZE: files larger than SIZE are skipped. SIZE can carry a K, M or G suffix for kibibytes, mebibytes or gibibytes (multiples of 1024; use KB, MB or GB for multiples of 1000).

 $ rsync -avz --max-size='2G' /home/user/movies /backupmedia

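4. Until now, we have talked about how good rsync is for incremental backups. But if you want to transfer whole files all over again, skipping rsync’s delta-transfer algorithm, you are free to do so. Just use -W (--whole-file). Note that this is already the default when both source and destination are local paths, as in the example below: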

 $ rsync -avzW /home/user/movies/hackers2.avi /backupmedia/ 

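5. Did you know rsync can have the remote machine run a command to build the list of files to copy/sync? It can, as long as the remote shell expands the arguments, which it does by default in rsync versions before 3.2.4 (newer versions need the --old-args option for this trick). It works as follows:

 $ rsync -avrz user@host:'`find /home/user/development/ -name "*.c" -print`'\
 /backup/development/ 

The quotes around *.c keep the remote shell from expanding the pattern itself. Note that all the files land directly in /backup/development/, since rsync keeps only the last component of each path.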

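6. Should you want to pass options to ssh, such as a different encryption cipher, use --rsh (the long form of -e):

 $ rsync -avz --rsh="ssh -c aes128-ctr -l user" host:/source /destination 

You might want to use this especially if you are on a very slow machine. Older guides use the arcfour cipher here, but it has been removed from current OpenSSH releases; run ssh -Q cipher to list the ciphers your client supports.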

7. This point deals with preserving various attributes of the files being copied: -p preserves permissions, -X preserves extended attributes (xattrs), -A preserves ACLs (your source file system must support ACLs, of course), -o preserves the owner (superuser only), -H preserves hard links and -g preserves the group. Search for the word “preserve” in the rsync manual if what you want to preserve isn’t listed here. Remember that -a does most of the preserving for you (it includes -p, -o and -g), but it does not include -H, -A or -X; if you want those, or finer-grained control, add the flags yourself. We add -r below so that directories are copied as well:

 $ rsync -rvzpXAoHg /source /destination 

8. rsync is used by mirror owners everywhere to stay current with the project(s) they are mirroring. In the addresses below, the double colon (::) and the rsync:// prefix mean that rsync talks directly to an rsync daemon on the remote host instead of going through SSH. Here are some examples:

 $ rsync -vaz --delete ftp4.de.FreeBSD.org::FreeBSD/ /pub/FreeBSD/
 $ rsync -avz --delete --safe-links rsync.apache.org::apache-dist /path/to/mirror
 $ rsync -auH rsync://rsync.chiark.greenend.org.uk/ftp/users/sgtatham/putty-website-mirror/ . 

9. We want to issue a final word of warning: the trailing ‘/’ on the source path is important. If you do

 $ rsync -avz /source /destination 

you will get a different result than if you had done

 $ rsync -avz /source/ /destination 

We’ll let you discover what the difference is; however, don’t try this discovery on important data!


