In a recent column, I discussed how the latest AWS outage(s) and the Log4j security holes should prompt enterprises to draw some broader lessons about dealing with complexity and dependencies in modern IT environments. Indeed, the Log4j vulnerability is a sad reminder that far-reaching supply-chain security holes have become an unfortunate tradition of the holiday season (have we forgotten the SolarWinds imbroglio already?).
As I detail there, the AWS outage (or, in Amazon's parlance, Service Event™) demonstrated that even the largest and most sophisticated hyperscale operators aren’t immune from mistakes that cascade into a multitude of other problems. While the AWS outage was spotty, leaving many customers only mildly inconvenienced, the vulnerability in the Log4j software library is broad (affecting millions of software and service users), deep (lodged within countless commercial and open source applications) and likely to be long-lasting (with ramifications extending well into next year).
Together, these events illustrate the fragility of modern IT systems and of the ‘digital transformation’ strategies utterly dependent upon them. Unfortunately, there aren’t easy answers to these problems, because businesses and developers have flocked to cloud services and open source code for good reasons (convenience, the cost model, a steady flow of features and updates) and have been willing to live with the occasional outage or urgent security patch.
The bigger problem is that enterprises have gradually, often unknowingly, deepened their dependence on such software and services to the point where an unexpected major incident can significantly impair revenue, damage customer relations and increase support costs. Like the proverbial frog in a pot of water slowly brought to a boil, many have not noticed the rising temperature, and it is now too late to jump out.
I concluded that column with some steps organizations can take to proactively monitor and mitigate such issues. In sum: don’t trust, always verify, and have multiple contingency plans.