New thinking on incident response

Airbnb matches thousands of travelers with over 2 million rental property listings on its website daily. Reliability is key to the success of a company valued at over $30 billion. A recent interview with a site reliability engineer at Airbnb covered some key metrics for incident response in this high-visibility, high-uptime operation. Beyond [...]

Mean Time to Innocence gains trust

The earliest reference to the term "Mean Time to Innocence" in a Google search was in 2007 on the Professor Messer training site. By 2012, there were about 50 search results. While this is definitely not a scientific study, more than 100 new online mentions of the term have appeared over the last two years as it gains popularity with [...]

Competing in managed services with OOB

Managed service providers (MSPs) struggle with the basic tenets of sales competition, price and differentiation, in a purer form than most other IT enterprises. At the end of the day, they need to be cheaper than their competition or better in some way that keeps their customers happy. There are a number of tools available for MSPs [...]

Calculating downtime cost with lots of zeros

Yesterday, two high-profile outages made the news. United Airlines experienced a router problem of some kind and halted flights for over an hour, and the New York Stock Exchange had to close trading for over three hours while it recovered from what was termed a software rollout issue. While not enough information has been [...]

Config errors better than hackers?

The hour-long Facebook outage on January 27th, which cut off access to critical status updates worldwide, along with Instagram posts and Tinder hook-ups, shows the sensitivity of large organizations to public perception of hacking threats. The Lizard Squad hacking group, which apparently took control of the Malaysia Airlines website [...]