As the saying goes, a gramme of prevention is worth a kilo of cure. A backup, or the process of backing up, means making copies of data so that these additional copies can be used to restore the original after a data loss event. Backups are useful primarily for two purposes. The first is to restore a state following a disaster (called disaster recovery). The second is to restore small numbers of files after they have been accidentally deleted or corrupted. Data loss is also very common: 66% of internet users have suffered from serious data loss. So that's a generic view of the theory; what about the practice?
Backup and recovery systems have been around since the beginning of the digital revolution. SAN-type, hard-disk-based systems have reduced or even removed the reliance on tape backup systems, but even these solutions don’t address one simple little problem: how do you know the backup is working? You need to TEST it.
In fact, there is no single, or short, answer to this particular problem. The longer answers do tend to fall into two categories: test everything, and check everything. By a huge margin, testing everything is the most critical thing you can do to ensure your backups are working the way you think they are.
The problem here is that testing is not a simple, “five minutes and you’re done” operation. While you can usually spot-test data integrity by restoring a few files each week in a limited amount of time, that’s not truly testing your solution. Restoration testing from multiple tapes and multiple disk systems to non-production servers is the best way to test file-level backups. While it is impractical to do that weekly, it should be done at least twice per year using different servers/data/tapes each time. This ensures that you will be able to recover from your tape/disk media to your servers properly, and that you’re not suddenly caught without the knowledge or software tools required when an emergency comes.
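As a rough illustration of what a restoration test can check, the sketch below compares every file in a reference copy against the copy restored to a non-production server, using SHA-256 checksums. The function names and the idea of two directory roots are assumptions made for this example, not features of any particular backup product, and the reference copy is assumed not to have changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_restore(source_root: Path, restored_root: Path) -> dict[str, list[str]]:
    """Compare each file under source_root with its restored counterpart.

    Both roots are hypothetical: source_root is a reference copy of the data,
    restored_root is where the test restore was written.
    """
    report = {"missing": [], "mismatched": [], "ok": []}
    for source_file in sorted(p for p in source_root.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_root)
        restored_file = restored_root / relative
        if not restored_file.is_file():
            # The file never came back from the backup media.
            report["missing"].append(relative.as_posix())
        elif sha256_of(source_file) != sha256_of(restored_file):
            # The file came back, but its contents differ.
            report["mismatched"].append(relative.as_posix())
        else:
            report["ok"].append(relative.as_posix())
    return report
```

Anything listed under "missing" or "mismatched" is worth investigating before you trust that media or backup job. A check like this only covers file contents; it says nothing about permissions, ownership, or whether an application can actually start from the restored data.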
Full restoration of a server system should also be done at least twice per year if you are using a full-server recovery tool. This is more difficult than just restoring data, as you can’t do a full-server restore to a machine assigned to some other purpose day-to-day – you’d overwrite the required server with your test restore. This means you’ll need to have either spare physical machines or virtual servers in order to create temporary systems for use in the testing process. Many organisations refuse to budget for these types of tests and testing hardware, and find themselves without a valid testing strategy.
If you store your tapes off-site, test your recall method as well, at least once a year. This is usually another item you’ll need to budget for, but it is vital to perform this test. Geographic location and traffic patterns can have a huge impact on how fast your tapes make it back to your location, and having a fire drill once a year or more will give you an idea of what to expect during the emergency.
Secondly, keep in mind that – supplier statements aside – backup systems should never be ‘set it and forget it’ type solutions. While you definitely shouldn’t need to change settings and check in on it every day, you should be checking in on it at least weekly to ensure that everything is running smoothly. Check backup logs or event logs to see if there are any backup-related errors. Also make sure that you’re following the vendor’s recommendations for replacement of tapes and hardware maintenance. These checks only take a few minutes of your time, but can head off massive headaches when the time comes to use the data on the tapes/disks.
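The weekly log check can be partly scripted. The sketch below is a minimal example that assumes the backup log is a plain text file with one entry per line and that problems are flagged with words such as "error", "warning" or "failed". Real products use their own formats and severity codes, so the keyword list and function names here are hypothetical and would need adapting.

```python
import re
from pathlib import Path

# Assumed keywords; real backup products use their own wording and severity codes.
PROBLEM_PATTERN = re.compile(r"\b(error|warning|failed|failure)\b", re.IGNORECASE)


def find_problem_lines(lines):
    """Return (line_number, text) pairs for lines that mention a problem keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if PROBLEM_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path: Path):
    """Scan a plain-text log file; undecodable bytes are replaced, not fatal."""
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return find_problem_lines(handle)
```

A script like this is a convenience, not a replacement for reading the log: a job that never ran leaves no error line at all, so an empty result should still be checked against the schedule you expected.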
Most backup systems allow you to receive emails when everything is going smoothly or when errors or warnings occur. Set these alerts up and do not ignore them. Ten minutes of troubleshooting when you first see the alert can head off ten hours or much more of problem solving later on. This form of checking up on your systems can be done just by reading your email, and so it’s an easy way to keep tabs on things.
So remember, the best backup system in the world is suspect if you don’t test regularly and completely, and don’t keep an eye on it from time to time. As the saying goes, a gramme of prevention is worth a kilo of cure.