Drive Snapshot allows the creation of differential backups.
Differential backups can be made only once a full image (identical in
format to images from older versions) has been created.
After that, a differential backup contains
only the changes made since the last full backup.
For a discussion of why differential backups were chosen, and some implementation details, see below.
Three different files are used: the full image, its hash (.hsh) file, and
the differential image.
Note: Old Images (pre 1.37) are fully supported, and are equivalent to 'full' images.
Create the full image as usual:

C:>snapshot C: X:\C_full.sna
A corresponding .HSH hash file will be created automatically at the location
of the image file.
Additional Options
-O
    disable hash file generation
-ODirname
    create the hash file in a different location
-ODirname\Filename
    create the hash file with a different name
To create a hash file for an existing image:

C:>snapshot X:\C_full.sna -HC_checksum.hsh
To create a differential image, specify the hash file of the full image:

C:>snapshot C: X:\C_diff.sna -hX:\C_full.hsh
To restore single files from a differential image, use the same options as
with a 'normal' image:

C:>snapshot X:\C_diff.sna
To map a differential image as a virtual drive, again with the same options
as with a 'normal' image:

C:>snapshot X:\C_diff.sna d:
Restore of a complete differential image is done in two steps: simply restore both images (first the full, then the differential) one after the other:
a:>snapshot restore hd1 primary1 X:\C_full.sna
a:>snapshot restore hd1 primary1 X:\C_diff.sna
Both incremental and differential backups serve the same purpose: to make the amount of data saved every day (actually on every backup) smaller.
Both methods start with a full backup; thereafter an incremental backup saves only the data changed since the last backup, while a differential saves all data changed since the last full backup.
Size: Obviously incremental backups create smaller daily backup files.
However, with differential backups it's possible to 'thin out' older backups, like
'delete backups older than 1 month that were created during the week', keeping
only Friday's backups.
While the daily data will be smaller for incremental backups, the total sum of
backup data might be smaller for differentials.
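As a rough illustration (all numbers here are assumptions made up for the sake of the example, not measurements), the totals can be modelled in a few lines:

```python
# Toy model of total storage for incremental vs. differential backups.
# All sizes are illustrative assumptions, in GB.
FULL_SIZE = 100      # size of the initial full image
DAILY_CHANGE = 2     # data changed per day (assumed disjoint each day)
DAYS = 30

# Incremental: each backup stores only that day's changes.
incremental_total = FULL_SIZE + DAYS * DAILY_CHANGE

# Differential: the backup on day n stores all changes since the full,
# i.e. n * DAILY_CHANGE.
differential_total = FULL_SIZE + sum(n * DAILY_CHANGE
                                     for n in range(1, DAYS + 1))

# Differentials can be thinned: keep only the last one of each week,
# leaving ~4 weekly differentials after a month.
thinned_total = FULL_SIZE + sum(week * 7 * DAILY_CHANGE
                                for week in range(1, 5))

print(f"incremental total:     {incremental_total} GB")
print(f"differential total:    {differential_total} GB")
print(f"differential, thinned: {thinned_total} GB")
```

With these assumed numbers, thinning cuts the differential total from 1030 GB to 240 GB; the actual trade-off depends on how much data really changes per day.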
Reliability: when restoring an incremental backup, each and every
increment must be available and readable.
Even a single increment that is broken/unreadable/lost makes the rest of the chain
useless - and you won't learn about this until you (unsuccessfully) try to restore.
Sector-based backups (like Drive Snapshot and similar products) differ here
from usual file-based backups: with file-based backups you might be lucky and
find the same file again in a later image, but in a sector-based image every
sector may be important - and isn't necessarily changed again later.
In contrast, the restore of a differential requires only 2 working readable
images (the full and the differential); even if the differential is broken,
restore of the backup one day before or after will be possible.
Restore speed: restoring an incremental backup requires restoring all images up to the restore date; restoring a differential backup requires only the full backup plus one differential, which is usually (much) smaller (and might avoid a lot of DVD swapping).
Ease of handling and understanding: we think that 2 files are easier to
handle and understand (both by the software and the user) than a potentially
unlimited number of increments.
It's also much easier to check that the image set is
complete before a restore - checking this when the backup is spread over 50 daily
DVDs is something the user is unlikely to do.
Of course, at this point it's clear we have implemented differential backup, else our argument would be different ;)
For differential/incremental backups, you have to know what has changed since the
last backup.
This can be done by tracking changes, or by comparing the current
data with the data in the image.
While tracking changes is much faster than comparing with the old image, and
probably possible (at least one competitor implements incremental backups this
way), we think that tracking changes across reboots, and possibly across access
by a different operating system, is problematic at best, if reliable at
all.
Therefore Drive Snapshot compares the current data with the full image.
Most
of the time, it should run at about the hardware read speed of the drive being
backed up; expect a backup rate of ~2-3 GB/min on moderately modern hardware.
On the other hand, we would like the option to put the full image in a really safe place - possibly on a DVD, or on a different server behind a (slow) internet connection.
For this reason, Drive Snapshot creates a Checksum File of the full image;
all data are compared not to the full image but instead to its checksum.
Since
the Checksum file is much smaller (around 0.5% of the used data), it can (and
should) be stored locally, while the original full image can be safely
removed/put into a safe place.
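Assuming one 128-bit checksum per 4 KByte block (the layout described in the next paragraph), the size of the checksum file can be checked directly:

```python
# Ratio of hash-file size to backed-up data: one 128-bit checksum
# is stored per 4 KB block of used data (file overhead ignored).
CHECKSUM_BYTES = 128 // 8   # 16 bytes per block
BLOCK_BYTES = 4 * 1024      # 4 KB blocks

ratio = CHECKSUM_BYTES / BLOCK_BYTES
print(f"checksum overhead: {ratio:.2%} of the used data")

# For 4 TB of used data:
used = 4 * 1024**4
print(f"hash file for 4 TB of data: {used * ratio / 1024**3:.0f} GB")
```

That gives roughly 0.4% of the used data, consistent with the "around 0.5%" figure above once file overhead is included.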
While using checksum files seems slightly unsafe, it actually is safe.
As
implemented, the checksums are 128 bits, and each is used to compare one block
of 4 KByte with the original 4 KByte at the same place on the disk.
Thus the probability of wrongly identifying a single block as identical is
2^-128; the probability of failure in a complete backup of 4TB is 2^30 *
2^-128 = 2^-98.
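A minimal sketch of this comparison, using MD5 purely as a stand-in 128-bit hash (Drive Snapshot's actual hash algorithm and file formats are not documented here): hash each 4 KB block, and save into the differential only the blocks whose hash differs from the stored one.

```python
import hashlib

BLOCK = 4096  # compare the disk in 4 KB blocks

def block_hashes(data: bytes) -> list:
    """One 128-bit hash per 4 KB block (MD5 here purely as an example)."""
    return [hashlib.md5(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def differential(current: bytes, full_hashes: list) -> dict:
    """Map block index -> new data, for blocks whose hash changed."""
    changed = {}
    for idx, h in enumerate(block_hashes(current)):
        if idx >= len(full_hashes) or h != full_hashes[idx]:
            changed[idx] = current[idx * BLOCK:(idx + 1) * BLOCK]
    return changed

# Example: a 4-block "disk", then modify block 2.
full = bytes(4 * BLOCK)
hashes = block_hashes(full)            # plays the role of the .hsh file
now = full[:2 * BLOCK] + b"x" * BLOCK + full[3 * BLOCK:]
diff = differential(now, hashes)
print(sorted(diff))                    # only block 2 is saved
```

Note that only the hashes of the full image are needed here, not the full image itself - which is exactly why the full image can be moved off-site.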
Wouldn't it be more reliable to compare the data directly instead of only the checksums?
Using a 128-bit checksum will produce a false match, on average, once every 2^128 blocks of 4 KB each.
For a complete image of 4TB, using hashing will falsely identify a sector as
'already existing' with probability of 2^-98.
On the other hand, current hard disks have a surprisingly high non-recoverable
error rate of 2^-45 (one non-recoverable sector per 10^14 bytes = 100 TB) at
best (search www.google.com for 'error rate non-recoverable').
And tapes aren't
much better, somewhere between 10^14 and 10^17.
So it's much more probable (by a factor of roughly 2^53 = 10^16) that you can't
restore an image due to a failing hard disk than that you get a badly restored
image due to a checksum collision.
We consider checksums safe.
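A back-of-envelope check of these exponents:

```python
import math

# Probability a 128-bit checksum falsely matches one changed 4 KB block:
p_block = 2.0 ** -128
# Number of 4 KB blocks in a 4 TB image:
blocks = (4 * 1024**4) // 4096            # = 2**30
# Union bound: chance of at least one false match in the whole image.
p_image = blocks * p_block                # = 2**-98
# Non-recoverable disk read error rate (~1 sector per 10**14 bytes):
p_disk = 2.0 ** -45

print(f"blocks in 4 TB image : 2**{int(math.log2(blocks))}")
print(f"false match, image   : 2**{int(math.log2(p_image))}")
print(f"disk/hash risk ratio : 2**{int(math.log2(p_disk / p_image))}")
```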
It tolerates defragmentation, but at a price: Snapshot simply compares the data
at sector X with the old data at that place, finds them unequal, and saves the
new data since they have changed - so after a defragmentation run the
differential might grow significantly larger.
In real life this problem isn't as bad as it sounds, provided the full backup is
taken at a time when the machine is mostly set up and has been defragmented.
After this first defragmentation (which can move a lot of data around), usually
only new data will be moved by a defragmentation run, but new data would have
been saved anyway.
This could be overcome by using the checksums as a database to locate the
data somewhere else on the disk, and storing something like 'the data for the
current sector X were in the original image at sector Y'.
Unfortunately, this
would force both the differential image and all parts of the full image to
be accessible online at the same time, which would make restore from CDs/DVDs
impossible.
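Purely as a hypothetical sketch of that rejected design (this is not what Drive Snapshot implements; MD5 again stands in for an unspecified 128-bit hash): the checksum database would map block hashes to their original sectors, so moved data could be referenced instead of stored.

```python
import hashlib

BLOCK = 4096

def build_index(full: bytes) -> dict:
    """Hypothetical: map block hash -> original sector, to locate moved data."""
    return {hashlib.md5(full[i:i + BLOCK]).digest(): i // BLOCK
            for i in range(0, len(full), BLOCK)}

def describe_block(idx: int, current: bytes, index: dict) -> tuple:
    """Return ('same', y) if the block's data exist at sector y in the full
    image, else ('new', data) meaning the data go into the differential."""
    data = current[idx * BLOCK:(idx + 1) * BLOCK]
    h = hashlib.md5(data).digest()
    if h in index:
        return ("same", index[h])
    return ("new", data)

# Defragmentation swaps the data of two blocks:
full = b"A" * BLOCK + b"B" * BLOCK
idx = build_index(full)
moved = b"B" * BLOCK + b"A" * BLOCK
print(describe_block(0, moved, idx))   # found at original sector 1
```

The catch described above is visible here: resolving a ('same', y) reference at restore time requires reading sector y from the full image, so the full image would have to stay accessible.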