
The Search for the Ideal Backup Tool Part 2 of 2

In this installment, we publish the results of comparing the ZBackup and Attic backup tools.

We put both ZBackup and Attic through two main tests: backup and restore.

The input files were generally in QEMU’s IMG or QCOW2 format, containing either a CentOS install or empty data. Storage was all-SSD RAID 1+0. The CPUs were 2x Haswell Xeon 2.3 GHz with 6 cores each.

Backup Test

Attic

| Backup Number | Input Size (GB) | Num Files | Time (hh:mm:ss) | Size of Folder (GB) | Effective Compression Ratio | Notes |
|---|---|---|---|---|---|---|
| 1 | 50 | 3 | 00:09:54 | 2.1 | 23.81 | |
| 2 | 50 | 3 | 00:00:18 | 2.1 | 23.81 | No new files. No updates |
| 3 | 50 | 3 | 00:01:15 | 2.1 | 23.81 | No new files. But minor update to one of the larger files |
| 4 | 470 | 5 | 00:50:16 | 2.16 | 217.59 | 2 new files |
| 5 | 470 | 5 | 00:41:31 | 2.16 | 217.59 | No new files. But minor update to one of the larger files |

Total data processed = 1,090 GB.
Total time for data = 6,194 seconds

Attic takes 5.68 seconds per GB for largely duplicate data, such as IMG/QCOW2 files containing a CentOS install.
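The per-GB figure follows from the table: convert each hh:mm:ss time to seconds, sum them, and divide by the total input size. Below is a minimal sketch of that arithmetic; the function names are ours, chosen for illustration, and were not part of the original test scripts.

```shell
# Convert an hh:mm:ss duration to whole seconds (awk reads "09" as decimal 9).
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Sum any number of hh:mm:ss durations and print the total in seconds.
total_seconds() {
    printf '%s\n' "$@" | awk -F: '{ t += $1 * 3600 + $2 * 60 + $3 } END { print t }'
}

# Divide seconds by gigabytes and print the result with two decimals.
seconds_per_gb() {
    awk -v s="$1" -v gb="$2" 'BEGIN { printf "%.2f\n", s / gb }'
}

# Attic backup runs 1 to 5 from the table above.
attic_total=$(total_seconds 00:09:54 00:00:18 00:01:15 00:50:16 00:41:31)
echo "$attic_total"                  # prints 6194
seconds_per_gb "$attic_total" 1090   # prints 5.68
```

The total of 1,090 GB is three runs of 50 GB plus two runs of 470 GB.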

ZBackup

| Backup Number | Input Size (GB) | Num Files | Time (hh:mm:ss) | Size of Folder (GB) | Effective Compression Ratio | Notes |
|---|---|---|---|---|---|---|
| 1 | 50 | 3 | 00:45:43 | 1.6 | 31.25 | |
| 2 | 50 | 3 | 00:08:17 | 1.6 | 31.25 | No new files. No updates |
| 3 | 50 | 3 | 00:08:22 | 1.6 | 31.25 | No new files. But minor update to one of the larger files |
| 4 | 470 | 5 | 04:10:13 | 1.6 | 293.75 | 2 new files |
| 5 | 470 | 5 | 04:08:00 | 1.6 | 293.75 | No new files. But minor update to one of the larger files |

Total data processed = 1,090 GB.
Total time for data = 33,635 seconds

ZBackup takes 30.86 seconds per GB for largely duplicate data, such as IMG/QCOW2 files containing a CentOS install.

Restore Test

For restore, every restored file must match the SHA1 fingerprint of the original file exactly. Both ZBackup and Attic passed this test.
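A check of this kind can be sketched with sha1sum as follows. The helper names and the paths in the usage comment are hypothetical; this is not the exact script we ran.

```shell
# Print only the SHA1 hex digest of a file (sha1sum also prints the file name).
sha1_of() {
    sha1sum "$1" | awk '{ print $1 }'
}

# Succeed only when both files have the same SHA1 digest.
same_sha1() {
    [ "$(sha1_of "$1")" = "$(sha1_of "$2")" ]
}

# Usage (hypothetical paths):
# same_sha1 /virtual_machines/images/file1.img /restore/file1.img && echo "match"
```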

Attic

| Restore Number | Restore Size (GB) | Num Files | Time (hh:mm:ss) |
|---|---|---|---|
| 1 | 350 | 1 | 00:39:11 |
| 2 | 25 | 1 | 00:00:20 |
| 3 | 48 | 2 | 00:05:18 |

Total data processed = 423 GB.
Total time for data = 2,689 seconds

Attic takes 6.35 seconds per GB to restore data.

ZBackup

| Restore Number | Restore Size (GB) | Num Files | Time (hh:mm:ss) | Notes |
|---|---|---|---|---|
| 1 | 350 | 1 | 00:24:29 | 2 GB cache |
| 2 | 350 | 1 | 00:26:40 | 40 MB cache |
| 3 | 25 | 1 | 00:01:19 | |
| 4 | 48 | 2 | 00:06:02 | |

Total data processed = 773 GB.
Total time for data = 3,510 seconds

ZBackup takes 4.54 seconds per GB to restore data.

Comparison

| Metric | Attic | ZBackup | Attic vs ZBackup |
|---|---|---|---|
| Backup (seconds/GB) | 5.68 | 30.86 | -443.31% |
| Backup compression ratio | 217 | 293 | 35.02% |
| Restore (seconds/GB) | 6.35 | 4.54 | -28.50% |

The final selection depends on which factor carries more weight. For instance, if storage per GB is cheap for you but you need fast backups, Attic seems best. If you care about size, ZBackup seems best, at the expense of time. I believe ZBackup has selectable compression algorithms, so it might be faster if you choose the faster LZO compressor; however, the author mentions a caveat about LZO. Our quick tests show LZO is definitely faster, but its compression ratio is lower than Attic’s.

Do let us know your thoughts in the comments.

Post Script – The Test script Files
Attic Create Backup Script
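#!/bin/bash
# Usage: pass the run number as the first argument; it becomes the archive name.
run=$1
if [ -z "$run" ]; then
    echo "Error: run number is required." >&2
    exit 1
fi

# Create archive "$run" and print statistics; ". . ." stands for the remaining image files.
attic create --stats "/vm_backup/atticrepo.attic::$run" /virtual_machines/images/file1.img /virtual_machines/images/file2.img . . .
# Report the on-disk size of the repository.
du -h -d 1 /vm_backup/atticrepo.attic
echo "Done"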
ZBackup Create Backup Script
. . . Preamble same as Attic . . .
zbackup backup --non-encrypted --threads 8 --cache-size 1024mb \
    "/vm_backup/zbak/backups/file1.img.$run" < /virtual_machines/images/file1.img

. . . other files . . .

sha1sum was used to calculate SHA1 on restored files.

The Search for the Ideal Backup Tool Part 1 of 2

In the context of virtualization, backing up VM images to storage nodes involves moving very large files. Many VM images are just copies of the same OS with data on top, so data deduplication and compression should offer great savings. In our search, we found various utilities, which we list further down, but we settled on reviewing two popular ones: zbackup and attic. Another popular tool, bup, was considered, but a few things, such as its inability to prune old versions, were major sticking points for us.

The main requirements were data deduplication, compression, ease of scripting, and encryption, all in one tool. In this article, we give some background on using both tools on CentOS 7.1. We don’t plan an extensive evaluation of their other capabilities, as we are looking for these basic features to be done well.

ZBackup

ZBackup describes itself as a globally-deduplicating backup tool inspired by the bup and rsync tools. As you add more files to the archive, it stores duplicate regions only once. It also supports AES encryption.

Installing ZBackup on CentOS 7.1

ZBackup is the easiest to install. It’s available in the EPEL repos, and you can simply run yum install zbackup.

Usage

The archive is called a repository in ZBackup. It’s nothing but a folder created for the tool, where it stores its metadata and all the files added to it for backup.

The first step is to initialize the folder, say zbak, with metadata folders.

zbackup init --non-encrypted /kvmbackup/zbak

If you need encryption, you can enable it with a key file.

Next, add files to it.

zbackup backup --non-encrypted --threads 8 --cache-size 1024mb /kvmbackup/zbak/backups/centos_nfs.img.20150721040100  < /virtual_machines/images/centos_nfs.img

The first add takes time. Subsequent adds of the same source file name with modified contents are generally faster.
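Since each zbackup invocation reads a single file from standard input, backing up several images means one call per image. Here is a minimal sketch of such a loop. It reuses the flags and the name.timestamp naming from the command above; the function names and the REPO variable are our own illustration.

```shell
# Repository path from the example above; override by setting REPO beforehand.
REPO=${REPO:-/kvmbackup/zbak}

# Build the in-repository name: <repo>/backups/<image file name>.<timestamp>
backup_name() {
    printf '%s/backups/%s.%s\n' "$REPO" "$(basename "$1")" "$2"
}

# Back up every image passed as an argument under one shared timestamp.
backup_images() {
    stamp=$(date +%Y%m%d%H%M%S)
    for image in "$@"; do
        zbackup backup --non-encrypted --threads 8 --cache-size 1024mb \
            "$(backup_name "$image" "$stamp")" < "$image" || return 1
    done
}

# Usage:
# backup_images /virtual_machines/images/centos_nfs.img
```

Sharing one timestamp across the loop keeps all images from the same run easy to find and restore together.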

Restore files as follows:

zbackup --non-encrypted restore zbak/backups/centos_nfs.img.20150721040100 > /virtual_machines/images/centos_nfs.img

Attic

Attic describes itself as a deduplicating backup program to provide an efficient and secure way to perform daily backups.

Installing Attic on CentOS 7.1

Install Python 3.4 from EPEL

yum install python34
curl https://attic-backup.org/downloads/releases/0.14/Attic-0.14-linux-x86_64.tar.gz | tar -C /usr/local/share/ -zxf -
ln -s /usr/local/share/Attic-0.14-linux-x86_64/attic /usr/local/bin/attic

Usage

The archive is also called a repository in Attic.

The first step is to initialize the folder, say atticrepo.attic, with metadata folders. In this case, attic can create the folder if it doesn’t exist.

attic init /kvmbackup/atticrepo.attic

Next, add files to it.

attic create --stats /kvmbackup/atticrepo.attic::20150721040100 /virtual_machines/images/centos_nfs.img <otherfiles if necessary>

Restore files as follows:

attic extract atticrepo.attic::20150721040100 virtual_machines/images/centos_nfs.img

One immediate quirk of Attic is that the destination directory can’t be specified as of version 0.14. It extracts to the current directory while preserving the original path.

This makes scripted use of the tool a little inconvenient. The feature seems to be on their to-do list, but we hope it becomes available soon.
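Because extraction always lands in the current directory, one workaround in scripts is to change directory inside a subshell before calling attic. The helper below is a sketch of that idea; run_in_dir is our own name and /restore is a hypothetical destination.

```shell
# Run a command with DEST as its working directory.
# The subshell keeps the caller's own working directory unchanged.
run_in_dir() {
    dest=$1
    shift
    mkdir -p "$dest" && ( cd "$dest" && "$@" )
}

# Usage (hypothetical destination /restore):
# run_in_dir /restore attic extract /kvmbackup/atticrepo.attic::20150721040100 \
#     virtual_machines/images/centos_nfs.img
# The file would then land under /restore/virtual_machines/images/.
```

The repository is given as an absolute path here, because a relative one would be resolved against the new working directory.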

Which One to choose?

This is the subject of our next post. In the next part, we will compare the speeds of both tools on the backup and restore paths.

Other backup utilities we considered

  • bup
  • Duplicity
  • rsync
  • rdiff-backup
  • Bacula
  • ZFS (a filesystem, not a tool)

Most either lacked some of the features we were looking for or were too complex. Do let us know your thoughts in the comments.

Also see comparison of backup tools on Ask Ubuntu for a list of desktop/GUI tools.