Home / My Disclaimer / Who am I? / Search... / Sign in

The Case of the Failed Restore

by Steve Syfuhs / November 13, 2012 03:51 PM

As applications get more and more complex the backup and restore processes also tend to become more complex.

A lot of times backup can be broken down into simple processes:

  1. Get data from various sources
    • Database
    • Web.config
    • DPAPI
    • Certificate Stores
    • File system
    • etc
  2. Persist data to disk in specific format
  3. Validate data in specific format isn’t corrupt

A lot of times this can be a manual process, but in best case scenarios its all automated by some tool. In my particular case there was a tool that did all of this for me. Woohoo! Of course, there was a catch. The format was custom, so a backup of the database didn’t just call SQL backup; it essentially did a SELECT * FROM {all tables} and serialized that data to disk.

The process wasn’t particularly fancy, but it was designed so that the tool had full control over the data before it was ever restored. There’s nothing particularly wrong with such a design as it solved various problems that creep in when doing restores. The biggest problem it solved was the ability to handle breaking changes to the application’s schema during an upgrade as upgrades consisted of

  1. Backup
  2. Uninstall old version
  3. Install new version
  4. Restore backed up data

Since the restore tool knew about the breaking changes to the schema it was able to do something about it before the data ever went into the database. Better to mangle the data in C# than mangle the data in SQL. My inner DBA twitches a little whenever I say that.

Restoring data is conceptually a simple process:

  1. Deserialize data from specific format on disk
  2. Mangle as necessary to fit new schema
  3. Foreach (record in data) INSERT record

In theory the goal of the restore tool should be to make the application be in the exact same state as it was when it was originally backed up. In most cases this means having the database be exactly the same row for row, column for column. SQL Restore does a wonderful job of this. It doesn’t really do much processing of the backup data -- it simply overwrites the database file. You can’t get any more exact than that.

But alas, this tool didn’t use SQL Backup or SQL Restore and there was a problem – the tool was failing on restoring the database.

Putting on my debugger hat I stared at the various moving parts to see what could have caused it to fail.

The file wasn’t corrupt and the data was well formed. Hmm.

Log files! Aha, lets check the log files. There was an error! ‘There was a violation of primary key constraint (column)…’ Hmm.

Glancing over the Disk Usage by Top Tables report in SQL Management Studio suggested that all or most of the data was getting inserted into the database based on what I new of the data before it was backed up. Hmm.

The error was pretty straightforward – a record was trying to be inserted into a table that had a primary key value that already existed in that table. Checking the backup file showed that no primary keys were actually duplicated. Hmm.

Thinking back to how the tool actually did a restore I went through the basic steps in my head of where a duplicate primary key could be created. Serialization succeeded as it was able to make it to the data mangling bit. The log files showed that the mangling succeeded because it dumped all the values and there were no duplicates. Inserting the data mostly succeeded, but the transaction failed. Hmm.

How did the insert process work? First it truncated all data in all tables, since it was going to replace all the data. Then it disabled all key constraints so it could do a bulk insert table by table. Then it enabled identity insert so the identity values were exactly the same as before the backup. It then looped through all the data and inserted all the records. It then disabled identity insert and enabled the key constraints. Finally it committed the transaction.

It failed before it could enable the constraints so it failed on the actual insert. Actually, we already knew this because of the log file, but its always helpful to see the full picture. Except things weren’t making any sense. The data being inserted was valid. Or was it? The table that had the primary key violation was the audit table. The last record was two minutes ago, but the one before it was from three months ago. The last record ID was 12345, and the one before it was 12344. Checking the data in the backup file showed that there were at least twice as many records so it failed halfway though the restore of that table.

The last audit record was: User [MyTestAccount] successfully logged in.

Ah dammit. That particular account was used by an application on my phone, and it checks in every few minutes.

While the restore was happening the application in question was still accessible, so the application on my phone did exactly what it was supposed to do.

Moral of the story: when doing a restore, make sure nobody can inadvertently modify data before you finish said restore.

SQL Server does this by making it impossible to write to the database while its being restored. If you don’t have the luxury of using SQL Restore be sure to make it impossible to write to the database by either making the application inaccessible or code it into your application to be able to run in a read only fashion.

Comments are closed

// About

Steve is a renaissance kid when it comes to technology. He spends his time in the security stack.