Drive Snapshot - Differential Backup

Differential Backups

Drive Snapshot allows the creation of differential backups.

Differential backups can be made only after a full image has been created (a full image is identical to the images written by older versions).
After that, a differential backup contains only the changes made since the last full backup.

For a discussion of why differential backups were chosen, and for some implementation details, see below.

Implementation and usage:

Three different files are used:

the 'Full Backup'; this can be stored far away in a safe place, and is no longer needed for backups, only for viewing or restoring.
a checksum ('hash') file with extension .HSH; this is a directory of the full image.
It's used only when creating differential images; you may delete it if you don't plan to use differential images.
By default, it's placed in the same destination as the full image, but it should preferably be located on a local drive (for performance reasons).
differential backup files (.SNA, .SN1, ...)

Note: Old Images (pre 1.37) are fully supported, and are equivalent to 'full' images.

Complete (full) Backup; generating a checksum file at the same time:

C:>snapshot C: X:\C_full.sna (as usual)

A corresponding .HSH hash file will be created automatically at the location of the image file.

Additional options:

        -O                      disable hash file generation
        -ODirname               create the hash file in a different location
        -ODirname\Filename      create the hash file with a different name

(Re-)Creation of the checksum file from an existing, possibly old, image:

C:>snapshot X:\C_full.sna -HC_checksum.hsh

Differential Backup (the checksum file must exist):

C:>snapshot C: X:\C_diff.sna -hX:\C_full.hsh

Mapping and exploring a differential image (all parts must be accessible online):

C:>snapshot X:\C_diff.sna

with the same options as with a 'normal' image

Restoring a differential image from Windows:

C:>snapshot X:\C_diff.sna d:

with the same options as with a 'normal' image

Restoring a differential image from REAL DOS (the recovery disk)

is done in 2 steps: simply restore both images one after the other (first the full, then the differential)

a:>snapshot restore hd1 primary1 X:\C_full.sna
a:>snapshot restore hd1 primary1 X:\C_diff.sna

Technical discussion

Comparison: differential vs. incremental backups

Both incremental and differential backups serve the same purpose: to make the amount of data saved every day (actually on every backup) smaller.

Both methods start with a full backup; thereafter, incremental backups save only the data changed since the previous backup, while differential backups save all data changed since the last full backup.
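
The difference can be illustrated with a small simulation. This is only a sketch, not Drive Snapshot's implementation; the block numbers in `daily_changes` and the function names are made up for illustration.

```python
# Hypothetical data: the block numbers that changed on each day
# after the full backup was taken.
daily_changes = [
    {1, 2},   # day 1
    {2, 3},   # day 2
    {7},      # day 3
]

def incremental_sets(changes):
    """Each incremental holds only the blocks changed since the previous backup."""
    return [set(day) for day in changes]

def differential_sets(changes):
    """Each differential holds every block changed since the full backup."""
    result, accumulated = [], set()
    for day in changes:
        accumulated |= day
        result.append(set(accumulated))
    return result

incrementals = incremental_sets(daily_changes)
differentials = differential_sets(daily_changes)

for day, (inc, diff) in enumerate(zip(incrementals, differentials), start=1):
    print(f"day {day}: incremental saves {sorted(inc)}, "
          f"differential saves {sorted(diff)}")
```

Each differential is self-contained relative to the full backup, while each incremental only makes sense together with all of its predecessors.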

Size: Obviously, incremental backups create smaller daily backup files.
However, with differential backups it's possible to 'thin out' older backups with a rule like
'delete backups older than 1 month that were created during the week', keeping only Friday's backups.
While the daily data will be smaller for incremental backups, the total sum of backup data might be smaller for differentials.
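
Such a thinning rule could be sketched as follows. This is a hypothetical helper, not a Drive Snapshot feature; the function name `thin_out`, the 30-day cutoff and the sample dates are assumptions. It is safe only for differentials, because each of them depends on the full backup alone.

```python
from datetime import date, timedelta

def thin_out(backup_dates, today, keep_days=30):
    """Return the backup dates to keep: every backup from the last
    `keep_days` days, plus only the Friday backups among the older ones."""
    cutoff = today - timedelta(days=keep_days)
    return [d for d in backup_dates
            if d >= cutoff or d.weekday() == 4]  # weekday() == 4 is Friday

# Hypothetical example: one differential per day for the last 60 days.
today = date(2024, 3, 29)
backups = [today - timedelta(days=n) for n in range(60)]
kept = thin_out(backups, today)
deleted = [d for d in backups if d not in kept]
print(f"keeping {len(kept)} of {len(backups)} differentials")
```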

Reliability: when restoring an incremental backup, each and every increment must be available and readable.
Even a single increment that is broken/unreadable/lost makes the rest of the chain useless - and you won't learn about this until you (unsuccessfully) try to restore.

Sector-based backups (like Drive Snapshot and similar products) differ here from the usual file-based backups. With file-based backups you might be lucky and find the same file again in a later image; in sector-based images every sector may be important, and it isn't necessarily changed again later.
In contrast, restoring a differential requires only 2 working, readable images (the full and the differential); even if one differential is broken, restoring the backup from one day before or after will still be possible.

Restore speed: Restoring an incremental backup requires restoring all images up to the restore date; restoring a differential backup requires restoring only the full backup plus one differential, which will usually be (much) smaller (and might avoid a lot of DVD swapping).
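
The reliability and restore-speed arguments can be made concrete with a toy model. This is a sketch under simplifying assumptions: a disk is a dictionary of block number to content, an unreadable image is represented as `None`, and all names and data are hypothetical.

```python
def apply_image(disk, image):
    """Write every block stored in an image onto the disk."""
    disk.update(image)

def restore_incremental(full, increments, day):
    """Needs the full image and every increment up to the requested day."""
    needed = increments[:day]
    if any(image is None for image in needed):
        raise IOError("an increment in the chain is missing or unreadable")
    disk = dict(full)
    for image in needed:
        apply_image(disk, image)
    return disk

def restore_differential(full, differentials, day):
    """Needs only the full image and the one differential for that day."""
    image = differentials[day - 1]
    if image is None:
        raise IOError("this differential is missing or unreadable")
    disk = dict(full)
    apply_image(disk, image)
    return disk

full = {0: "a0", 1: "b0", 2: "c0"}
# Day 1 changes block 1, day 2 changes block 2, day 3 changes block 0.
# In both schemes the image from day 2 has been lost.
increments = [{1: "b1"}, None, {0: "a3"}]
differentials = [{1: "b1"}, None, {0: "a3", 1: "b1", 2: "c2"}]

print(restore_differential(full, differentials, 3))   # day 3 is still restorable
try:
    restore_incremental(full, increments, 3)
except IOError as error:
    print("incremental restore failed:", error)
```

With the lost day-2 image, the differential scheme can still restore days 1 and 3, while the incremental chain is usable only up to day 1.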

Ease of handling and understanding: we think that 2 files are easier to handle and understand (both by the software and the user) than a potentially unlimited number of increments.
It's also much easier to check whether the image is complete before a restore; checking this when the backup is spread over 50 daily DVDs is something the user is unlikely to do.

Design decision - differential vs. incremental backups

Of course, at this point it's clear we have implemented differential backup, else our argument would be different ;)

Other design decisions: compare vs. tracking changes

For differential/incremental backups, you have to know what changed since the last backup.
This can be done by tracking changes, or by comparing the current data with the data in the image.
While tracking changes is much faster than comparing with the old image, and probably possible (at least one competitor implements incremental backups this way), we think that tracking changes across reboots, and possibly across access by a different operating system, is problematic at best, if it can be made reliable at all.

Therefore Drive Snapshot compares the current data with the full image.
Most of the time, it should run at about the hardware read speed of the drive being backed up; expect a backup rate of ~2-3 GB/min on moderately modern hardware.

Other design decisions: real compare vs. hash file

On the other hand, we would like to have the option to put the full image in a really safe place, possibly on a DVD, or on a different server, possibly behind a (slow) internet connection.

For this reason, Drive Snapshot creates a checksum file of the full image; all data are compared not to the full image but to its checksums.
Since the checksum file is much smaller (around 0.5% of the used data), it can (and should) be stored locally, while the original full image can be safely removed and put into a safe place.
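
The idea can be sketched in a few lines. This is not the actual .HSH file format or checksum algorithm, neither of which is specified here: MD5 is used purely as a stand-in for 'some 128-bit checksum', the 4 KByte block size is the one given in this document, and all function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 KByte blocks

def block_checksums(data):
    """Return one 128-bit (16-byte) checksum per 4 KByte block.
    MD5 is only a stand-in; the real algorithm is an unknown here."""
    return [hashlib.md5(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(current, checksums):
    """Compare the current data against the stored checksums only;
    the full image itself is never read."""
    changed = {}
    for index, digest in enumerate(block_checksums(current)):
        if index >= len(checksums) or digest != checksums[index]:
            changed[index] = current[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]
    return changed

full_image = bytes(8 * BLOCK_SIZE)        # eight blocks of zeros
hash_file = block_checksums(full_image)   # small, kept on a local drive

current = bytearray(full_image)
current[5 * BLOCK_SIZE] = 0xFF            # modify one byte inside block 5
differential = changed_blocks(bytes(current), hash_file)
print("changed blocks:", sorted(differential))
```

In this model the checksums take 16 bytes per 4096 bytes of data, roughly 0.4%, which is in line with the 'around 0.5%' figure above.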

While using checksum files seems slightly unsafe, it actually is safe.
As implemented, the checksums are 128 bits long, and each one is used to compare one 4 KByte block with the original 4 KByte block at the same place on the disk.
Thus the probability of wrongly identifying a single block as identical is 2^-128; a complete backup of 4TB consists of 2^30 such blocks, so its probability of failure is 2^30 * 2^-128 = 2^-98.
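
The arithmetic can be checked with exact fractions. The sketch assumes that the checksum behaves like a uniformly random 128-bit value; summing the per-block probabilities gives an upper bound on the chance of at least one false match.

```python
from fractions import Fraction

BLOCK_SIZE = 4 * 2**10               # 4 KByte
IMAGE_SIZE = 4 * 2**40               # 4 TByte

blocks = IMAGE_SIZE // BLOCK_SIZE    # number of blocks that are compared
p_block = Fraction(1, 2**128)        # false match for one 128-bit checksum
p_image = blocks * p_block           # upper bound for the whole image

print("blocks  = 2^%d" % (blocks.bit_length() - 1))
print("p_image = 2^-%d" % (p_image.denominator.bit_length() - 1))
```

This reproduces the 2^30 blocks and the overall probability of 2^-98 stated above.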

FAQ: is hashing reliable enough ?

Wouldn't it be more reliable to compare data directly instead of only the checksums ?

A checksum of 128 bits will, on average, produce an error once every 2^128 blocks of 4KB each.
For a complete image of 4TB, hashing will falsely identify a block as 'already existing' with a probability of 2^-98.

On the other hand, current hard disks have a surprisingly poor non-recoverable error rate of about 2^-45 (one non-recoverable sector per 10^14 bytes = 100TB) at best (search www.google.com for 'error rate non-recoverable').
Tapes aren't much better, at somewhere between one error per 10^14 and one per 10^17.

So it's much more probable (by a factor of about 2^50 = 10^15) that you can't restore an image because of a failing hard disk than that you get a badly restored image because of failing checksums.
We consider checksums safe.

FAQ: what about disk defragmentation ?

Drive Snapshot tolerates defragmentation.
It simply compares the data at sector X with the old data at the same place, finds them unequal, and saves the new data since they have changed; as a result, the differential might grow significantly larger.
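
A toy example shows the effect. As before, this is only a sketch: MD5 stands in for the unspecified 128-bit checksum, and the block contents and function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096

def checksums(blocks):
    """One stand-in 128-bit checksum per block, indexed by position."""
    return [hashlib.md5(block).digest() for block in blocks]

def count_changed(blocks, stored):
    """Count the positions whose current content no longer matches."""
    return sum(1 for block, digest in zip(blocks, stored)
               if hashlib.md5(block).digest() != digest)

# Four blocks with distinct content, as they were in the full image.
original = [bytes([n]) * BLOCK_SIZE for n in range(4)]
stored = checksums(original)

# A defragmenter swaps blocks 1 and 3: no data is new, only its position.
defragmented = [original[0], original[3], original[2], original[1]]
moved = count_changed(defragmented, stored)
print(f"{moved} blocks would be saved in the differential")
```

Although no content has changed, both affected positions are detected as changed and end up in the differential.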

In real life this problem isn't as bad as it sounds, if the full backup is taken at a time when the machine is mostly set up, and has been defragmented.
After this first defragmentation (which can move a lot of data around), usually only new data will be moved by a defragmentation run, but new data would have been saved anyway.

This could be overcome by using the checksums as a database to locate the data somewhere else on the disk, and store something like 'these data for the current sector X were in the original image at sector Y'.
Unfortunately, this would force both the differential image and all parts of the full image to be accessible online at the same time, which would make restoring from CDs/DVDs impossible.