ConfigMgr Disaster Recovery

Some friends of ours recently had problems with their ConfigMgr infrastructure and ended up with a system that booted, but whose ConfigMgr console showed an unending list of errors. Reporting didn’t function, the distribution manager log was full of checksum errors, and there were lots of WMI and DCOM errors. It wasn’t looking too promising.

After an hour or so of investigation it was obvious that fixing it was going to be more difficult than recovering from backup, so we did a quick server rebuild and ran through the following process:

Reinstalling SQL

Step one is to reinstall SQL. This was the source of the main gotcha: it’s pretty important to get the collation right. If you don’t, as soon as you’ve reinstalled and recovered ConfigMgr you’ll find that the colleval (and other) logs fill up with collation mismatch errors. This type of error is a source of unending frustration. I nearly always have the same kind of issue with Package Mapping, as the deployment database that MDT creates has a different collation to the ConfigMgr one.

So, to avoid having to detach databases and reinstall SQL, remember that the default SQL install doesn’t use the correct collation. ConfigMgr requires Latin1_General_CI_AS, whereas the default Latin1_General collation would be Latin1_General_CI_AI. In case this is Klingon to you, the CI and AS bits relate to case and accent sensitivity: CI = Case Insensitive and CS = Case Sensitive, and likewise AI = Accent Insensitive and AS = Accent Sensitive.

To install SQL with Latin1_General_CI_AS you have to tick the Accent Sensitive button and clear the Case Sensitive button in the SQL collation setup routine.
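As a rough illustration of the naming scheme described above, here is a minimal Python sketch that decodes the case and accent suffixes from a collation name and compares two names. The function names are made up for this example and it only looks at the CI/CS and AI/AS tokens, so treat it as a sanity check rather than a full collation parser.

```python
def describe_collation(name: str) -> dict:
    """Decode the case/accent suffixes of a SQL Server collation name.

    Only the CI/CS and AI/AS tokens are interpreted; anything else
    (code page, BIN, etc.) is ignored and leaves the value as None.
    """
    flags = {
        "CI": ("case_sensitive", False),
        "CS": ("case_sensitive", True),
        "AI": ("accent_sensitive", False),
        "AS": ("accent_sensitive", True),
    }
    result = {"case_sensitive": None, "accent_sensitive": None}
    for token in name.upper().split("_"):
        if token in flags:
            key, value = flags[token]
            result[key] = value
    return result


def collations_match(a: str, b: str) -> bool:
    """Compare two collation names, ignoring case and stray whitespace."""
    return a.strip().lower() == b.strip().lower()


if __name__ == "__main__":
    required = "Latin1_General_CI_AS"   # the collation named in this post
    installed = "Latin1_General_CI_AI"  # what a default install would give
    print(describe_collation(required))
    print(describe_collation(installed))
    print("match:", collations_match(required, installed))
```

Run against the two names from this post, it shows that both are case insensitive and that they differ only in accent sensitivity, which is exactly the setting that is easy to miss in the setup routine.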

Reinstalling ConfigMgr

This is pretty straightforward: just reinstall it in the same folder as before.

Recovering ConfigMgr

Again, this is pretty straightforward. I have found that the ConfigMgr Site Repair Wizard (in the Configuration Manager Start Menu folder) can be a little unresponsive while it’s launching. Running it as an admin probably makes a difference, but once it’s launched you’re good to go.

All you now have to do is point the wizard at your ConfigMgr backup (you do have a backup, right?) and it’ll pull the site back together for you.

A Couple of Minor Gotchas

The first problem we had following the restore was that you need to recreate all of the shares you had previously. Obvious, but an easy one to forget, and one which will break your OS builds.
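If you kept a list of the shares the old server exposed, a quick check along these lines can tell you which ones still need recreating. This is only a sketch: the UNC paths are hypothetical placeholders, and the `exists` parameter is there purely so the check can be exercised without a real server.

```python
import os
from typing import Callable, Iterable, List


def missing_shares(
    paths: Iterable[str],
    exists: Callable[[str], bool] = os.path.isdir,
) -> List[str]:
    """Return the paths from `paths` that cannot currently be reached."""
    return [p for p in paths if not exists(p)]


if __name__ == "__main__":
    # Hypothetical share names; substitute the ones your site actually used.
    expected = [
        r"\\CMSERVER\Packages$",
        r"\\CMSERVER\OSDImages$",
        r"\\CMSERVER\Drivers$",
    ]
    for share in missing_shares(expected):
        print("Missing share:", share)
```

Running it from a workstation rather than the server itself also confirms the shares are reachable over the network, which is what the OS builds depend on.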

Also, remember to redistribute your boot media to the PXE service point. Chances are your other package content is still where you left it, so that will be OK, but you’ve reinstalled WDS, so you will need to repopulate it.

Reinstall the MDT. If you’re using MDT, naturally it’ll need to be reinstalled. The console integration will be put back by the site recovery, but the wizards won’t work until you reinstall the app.

Create and delete some dummy collections, advertisements and packages. Any objects you created between the backup and the site loss will be lost now. To avoid any mix-up in the infrastructure, it’s a good idea to create a few collections to take the COLLID autonumber beyond anything you might have previously created. Do the same with packages and adverts. This only takes a couple of minutes and can avoid some headscratching later when machines start installing things they weren’t supposed to.
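To work out how many dummy objects to create, something like the following sketch may help. It assumes that an ID is the three-character site code followed by a hexadecimal counter (for example ABC0001A); that format, the function name and the sample IDs are assumptions for illustration, not taken from the recovery itself. If you don’t know the highest ID that existed before the site was lost, simply overshoot.

```python
def dummy_objects_needed(restored_highest: str, lost_highest: str) -> int:
    """How many throwaway objects to create so the next real ID is new.

    restored_highest: highest ID visible in the console after the restore.
    lost_highest:     highest ID you believe existed before the site was lost.
    Assumes IDs are a 3-character site code plus a hexadecimal counter.
    """
    site_a, site_b = restored_highest[:3].upper(), lost_highest[:3].upper()
    if site_a != site_b:
        raise ValueError("IDs belong to different site codes")
    restored = int(restored_highest[3:], 16)
    lost = int(lost_highest[3:], 16)
    # Each dummy consumes one ID; never return a negative count.
    return max(0, lost - restored)


if __name__ == "__main__":
    # Hypothetical IDs: the backup knows up to ABC00012, clients saw ABC0001A.
    print(dummy_objects_needed("ABC00012", "ABC0001A"))
```

With the sample values the dummies would take the IDs from ABC00013 to ABC0001A, so the next object you create for real gets an ID no client has seen before.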

Expect some inventory resyncs. Any machine which submitted its inventory data in the period between the site backup and the recovery will, in a week’s time (depending on inventory schedules, of course), send in updated inventory. ConfigMgr will not like this: it will see that it has missed some inventory, so it will request a full resync from the client. These will show up as warnings in the Inventory Dataloader. Don’t worry about these, it’s perfectly normal.

