Automation isn’t always a good thing. Based on reporting of Facebook’s outage on March 13, it sounds like they might have out-automated themselves doing a little network configuration.
Initial speculation pointed to the usual suspects: hackers. Facebook soon countered that storyline, saying it had ruled out a DDoS (Distributed Denial of Service) attack. The following day, Facebook said the problem stemmed from a “server configuration change that triggered a cascading series of issues.”
This is the kind of event that leads quickly to the topic of redundancy and backups. Doesn’t Facebook have some kind of backup? Probably, but they might also have automated replication of changes across their systems, which means a single bad change could disrupt multiple systems at once.
Sandy Bird, chief technical officer of the computer security firm Sonrai Security, said, “For a network as large as Facebook, resetting configurations is a delicate operation. Modern computer systems are built with ‘phenomenal amounts’ of redundancy and self-healing embedded in them.”
Even a small change can replicate across the entire network, creating a cascade of errors.
Clearly Facebook is an 800-pound gorilla in the networking world, but this is the kind of network configuration problem that Uplogix can help out with. Uplogix stores multiple configurations locally and backs them up in the Uplogix Control Center, where they are available for rolling back changes that had unintended consequences. Maybe the config change took down your network? No problem: use the out-of-band connection to access the gear and revert to a previous “golden” configuration.
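To make the rollback idea concrete, here is a minimal sketch in Python. It is not Uplogix code or its API: the `ConfigStore` class, its method names, and the sample configuration text are all hypothetical, chosen only to illustrate keeping a few recent configurations plus one known-good “golden” copy to revert to.

```python
from collections import deque


class ConfigStore:
    """Hypothetical local store of recent device configurations."""

    def __init__(self, max_versions=5):
        # Keep only the most recent configurations; older ones drop off.
        self.history = deque(maxlen=max_versions)
        self.golden = None

    def save(self, config, golden=False):
        """Record a configuration, optionally marking it as known-good."""
        self.history.append(config)
        if golden:
            self.golden = config

    def rollback(self):
        """Return the golden configuration and record it as the latest."""
        if self.golden is None:
            raise RuntimeError("no golden configuration saved")
        self.history.append(self.golden)
        return self.golden


# Illustrative usage with made-up configuration text.
store = ConfigStore()
store.save("hostname edge-1\nmtu 1500", golden=True)  # known-good baseline
store.save("hostname edge-1\nmtu 9000")               # change that broke things
active = store.rollback()                             # revert to the baseline
```

The point of the sketch is that the golden copy is kept separately from the rolling history, so a string of bad changes cannot push the known-good configuration out of reach.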
For more information, check out Pain-free config changes.