Yesterday two high-profile outages made the news. United Airlines experienced some kind of router problem and halted flights for over an hour, and the New York Stock Exchange had to close trading for over three hours while it recovered from what was termed a software rollout issue.
While not enough information has been made public yet about either situation, you have to assume that both were avoidable with enough prior testing or additional safeguards in place. How much is enough? It’s all a calculation of risk. For the NYSE, there are few alternatives and not a lot of competition. They might have some SLAs with organizations that trade on their exchange. Still, with a capitalization of over $13 trillion, downtime is crazy expensive.
For United, the issue is just one more in a string of public struggles as the airline consolidates multiple systems following its merger with Continental Airlines.
With so little information available, it’s impossible to say whether Uplogix could have helped with or prevented United’s issues yesterday, but what-if conversations in the Uplogix office focused on some of our core functions, which are built for issues like this:
With a direct connection to devices like routers and switches, Uplogix monitors at a greater frequency than is common for SNMP polling. And since it’s not network traffic, device interrogation over the console port doesn’t tax the device nearly as much, even at the higher frequencies.
Benefit: Know what’s happening faster and be able to triangulate issues much more rapidly.
Run Book Automation
When a problem is detected, whether it’s an immediate failure or a trend heading in the wrong direction over a period of time, Uplogix can take steps on your behalf. Maybe it’s to clear an interface or cycle power. Maybe that router went into ROMmon state. Either way, Uplogix can take your initial run book steps automatically. And unlike people, we never skip steps.
Benefit: Know that initial steps happen. In many cases, issues can be addressed without human intervention. For tougher problems, you aren’t starting at square one. You can start knowing exactly where the problem is and who owns it. We call that the mean-time-to-innocence.
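To make the run book idea concrete, here is a minimal Python sketch of the pattern described above: spot a trend heading the wrong way, then work through recovery steps in a fixed order and keep a record of what was tried. It is an illustration only, not Uplogix's implementation. The names `RunBook` and `trending_up`, the three-sample window, and the stand-in step functions are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


def trending_up(samples: List[float], window: int = 3) -> bool:
    """True when the last `window` samples are strictly increasing.

    The window size is an assumed value chosen for illustration.
    """
    recent = samples[-window:]
    return len(recent) == window and all(a < b for a, b in zip(recent, recent[1:]))


@dataclass
class RunBook:
    """Hypothetical run book: named steps executed strictly in order."""
    steps: List[Tuple[str, Callable[[], bool]]]
    log: List[str] = field(default_factory=list)

    def execute(self) -> bool:
        """Run each step in order and stop at the first one that reports recovery."""
        for name, action in self.steps:
            recovered = action()
            self.log.append(f"{name}: {'recovered' if recovered else 'no change'}")
            if recovered:
                return True
        # Nothing worked: hand off to a person, with the log as a head start.
        return False


# Stand-in data and actions; a real step would talk to the device.
error_counts = [2, 2, 5, 9, 14]
book = RunBook(steps=[
    ("clear interface", lambda: False),
    ("cycle power", lambda: True),
])
resolved = book.execute() if trending_up(error_counts) else False
```

Because every step is logged whether or not it helps, whoever picks up an unresolved problem can see exactly what has already been ruled out.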
You can bet that when planes are grounded and trading suspended, many people are asking WTF. For Uplogix customers, that means WAN Traffic Failover — our ability to route primary traffic over our out-of-band link during an outage.
Benefit: Keep traffic moving and customers happy while you address the issue. WTF can be an inexpensive failover alternative for a fair amount of traffic (LTE modems offer healthy bandwidth). Still, it’s probably not what you’d want for running the world’s largest exchange or a top-five airline.
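The failover decision itself can be sketched in a few lines of Python. This is a generic illustration of switching to a backup link after repeated probe failures and switching back once the primary has stayed healthy for a while. It is not how Uplogix implements WAN Traffic Failover. The class name `FailoverController` and both thresholds are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    """Hypothetical link selector driven by primary-link health probes."""
    fail_threshold: int = 3      # consecutive failed probes before failing over (assumed)
    recover_threshold: int = 5   # consecutive good probes before failing back (assumed)
    active: str = "primary"
    _fails: int = 0
    _oks: int = 0

    def observe(self, primary_ok: bool) -> str:
        """Record one probe result and return the link that should carry traffic."""
        if primary_ok:
            self._oks += 1
            self._fails = 0
        else:
            self._fails += 1
            self._oks = 0

        if self.active == "primary" and self._fails >= self.fail_threshold:
            self.active = "out-of-band"
        elif self.active == "out-of-band" and self._oks >= self.recover_threshold:
            self.active = "primary"
        return self.active


controller = FailoverController(fail_threshold=2, recover_threshold=2)
history = [controller.observe(ok) for ok in (False, False, True, True)]
```

Requiring several consecutive results in each direction keeps a single dropped probe from bouncing traffic between the links.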
So maybe Uplogix couldn’t have prevented the issues that plagued United and the NYSE this week. It’s hard to know. The good news is that Uplogix might be able to solve your issues. Maybe we could even keep you from making the news…