Tuesday, May 29, 2007

A Backup Strategy for UNIX

The canonical standard for doing backups on UNIX systems is tar (short for tape archive). tar converts directories of files into a single tar archive file which you would then spit to an attached tape drive. To be up-to-date, you need to do this on a regular basis - perhaps daily. So each day, you'd tar up all your important files and send them off to the tape drive. To recover your files, you'd need to read the whole tar archive file back from the tape and then extract the specific files or directories that you want from the tar archive. This works fine if you're backing up to a tape drive, because you don't care about network bandwidth. If you're backing up over the network to another machine's hard drive, then each daily archive file has to be sent over the network to your backup machine. Even if you don't change any files from one day to the next, the entire set of files gets archived and sent across the network. Bye bye precious bandwidth!
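For instance, a minimal tar-to-tape session might look like the sketch below (the tape device /dev/st0 and the file names are placeholders, not values from any particular setup):

# Archive all of /home to the attached tape drive
tar -cvf /dev/st0 /home

# Later, read the whole archive back and extract one specific file
tar -xvf /dev/st0 home/binu/notes.txt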



Before going through the practical part, have a look at a good reference on Linux backup strategy in the following article at Linux Journal:



http://www.linuxjournal.com/article/1208



The tools of the trade

rsync is a remote synchronization tool. The first time, rsync sends all the data over the network to your backup machine. Just like tar. The benefit comes the next time you back up. Instead of sending all the files again, rsync only transfers files that have been changed. If no files were changed, no files get transferred. And when you want to recover data, you transfer just the specific files that you want back to your machine (using rsync or scp or whatever).
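As a minimal sketch of rsync usage (the host name backuphost and the paths are placeholders):

# First run copies everything; subsequent runs send only changed files
rsync -az /home/binu/ backuphost:/backups/binu/

# Recover a single file from the backup machine
rsync backuphost:/backups/binu/notes.txt /home/binu/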

Note that rsync also works better than an incremental backup strategy using tar. You can use tar to do a full backup weekly and incremental backups daily. Incremental backups save just the files which have changed since yesterday's backup. This will improve bandwidth usage, but makes recovery more complex. You have to extract the latest full backup, then extract any more recent incremental backups over your full backup, and then extract the specific files that you need. On the other hand, the backup produced with rsync is always up-to-date (as of the last time you ran rsync).
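For comparison, a weekly-full-plus-daily-incremental scheme with GNU tar might look like this (a sketch; the snapshot file and archive names are placeholders):

# Sunday: full backup; tar records file state in the snapshot file
tar --listed-incremental=/var/backups/home.snar -cf full.tar /home

# Monday onwards: each run archives only files changed since the previous run
tar --listed-incremental=/var/backups/home.snar -cf incr-monday.tar /home

# Recovery: extract the full backup first, then each incremental in order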

There are lots of backup tools that use rsync as their workhorse and add features on top of it.



Now let me list some tools that come in handy when preparing a full backup strategy for a whole network. Do read a good reference on each tool before using it over the network. The checklist for such a system is as follows:



i) An archiver program like tar (for manual backups)

ii) Secure shell (SSH) access between machines

iii) An rsync server

iv) cron - a background process scheduler for scheduling the backup processes



Now I will present, as an example, the backup strategy which I prepared for the Nila services at IIITMK.



A backup strategy for IIITMK services

rsync copies only the diffs of files that have actually changed, compressed and, if you want, tunneled through SSH for security. Only the actual changed pieces of files are transferred, rather than the whole file. This makes updates faster, especially over slower links like modems. FTP would transfer the entire file, even if only one byte changed.

We have set up rsync on each server whose files are to be backed up. It runs as a service on these servers and is invoked through a command. It is then configured to run on a specific schedule (once every month) by adding a script to the crontab. An example script is shown below:

(Crontab Script for invoking – rsync.sh)

#!/bin/sh
# Locations of the rsync and ssh binaries and the SSH key used for the backup
RSYNC=/usr/bin/rsync
SSH=/usr/bin/ssh
KEY=/home/binu/cron/thishost-rsync-key
# Remote (backed-up) user, host and path, and the local destination
RUSER=root
RHOST=
RPATH=/backup/BACKUP
LPATH=/BACKUP3
# Pull the remote backup directory over SSH; -a preserves permissions
# and recurses, -z compresses during transfer
$RSYNC -az -e "$SSH -i $KEY" $RUSER@$RHOST:$RPATH $LPATH
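The schedule itself lives in the crontab of the backup machine. A monthly entry could look like this (the timing and the script path are assumptions, not taken from the actual setup):

# Run the backup at 02:00 on the 1st of each month
0 2 1 * * /home/binu/cron/rsync.sh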

As soon as the cron script runs, rsync connects to the backed-up server over SSH, where a validation script checks the connection attempt and allows it only if it is a valid rsync connection. It will not allow any other command to be run over this connection between the backed-up server and the backup machine. The script is outlined below:

(Script for validating connections – validate.sh)

#!/bin/sh
# Allow only a genuine "rsync --server" command over this SSH key;
# reject anything containing shell metacharacters
case "$SSH_ORIGINAL_COMMAND" in
*\&*)
echo "Rejected"
;;
*\(*)
echo "Rejected"
;;
*\{*)
echo "Rejected"
;;
*\;*)
echo "Rejected"
;;
*\<*)
echo "Rejected"
;;
*\`*)
echo "Rejected"
;;
rsync\ --server*)
$SSH_ORIGINAL_COMMAND
;;
*)
echo "Rejected"
;;
esac

A shell on the backed-up server can be reached through rsync only if the correct key is presented, thus offering a secure connection.
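This restriction is wired up through the authorized_keys file on the backed-up server. An entry along the following lines (a sketch; the script path and the key material are placeholders) forces every login with this key through validate.sh:

command="/home/binu/cron/validate.sh",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAA... rsync-backup-key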

Backing up to Tapes

After backing up to the backup server, the backups are transferred to tape storage on the SUN Solaris server. The primary aim is to rotate the backups periodically and to save disk space, since we cannot keep dumping to the Solaris disk indefinitely. For this, an LTO tape library is used. The tape library is connected to Webserver 1 and can host a total of 24 tape cartridges. The native/compressed capacity of each tape is 100/200 GB, giving a total storage capacity of 2.4/4.8 TB. You have to load the relevant tape cartridge in the LTO before taking the backup. Make sure the correct tape cartridge is loaded, as the backup overwrites existing data. You can use the same cartridge for taking the backup periodically.

We can now go through the process. After the earlier automated backup, the backup is stored on the backup server with IP: . Next, we have to manually transfer these files to the /data directory of the Solaris server having IP: . This is done using scp, for example (a sketch; the user name and host below are placeholders for the real values):
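# Push the backup tree from the backup server to the Solaris server
scp -r /BACKUP3 user@solaris-server:/data/

After this, the files are transferred to tape using the command: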



# ufsdump 0uf /dev/rmt/0 IP:

On the other side, we also manually back up the contents of the Solaris servers using the ufsdump command. For example, to back up the contents of /data2 (containing the edugrid and compchem web content), the command is as follows:

# ufsdump 0uf /dev/rmt/0 /data2

NOTE: The ufsdump command is issued in the shell of Node-A (the webserver) so as to back the data up to the tape. Also, a recent backup of all portals is kept in the /dns/BACKUP directory; it can be transferred to a tape cartridge if necessary. Keep the backups in the desired directories for a period of time and rotate them, thus saving disk space.
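For recovery, data written with ufsdump is read back with ufsrestore; a minimal sketch (using the same tape device as above):

# List the contents of the tape, then restore files interactively
ufsrestore tf /dev/rmt/0
ufsrestore if /dev/rmt/0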

The Linux Journal article on backup strategy linked above was used for informative purposes. Copyright lies with the author and Linux Journal magazine.