Battle Plan 2018: Illuminate Blind Spots and Unknown Unknowns
December 27, 2017

Josh Gray
Cedexis

There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.

Bonus points if you know who came up with that tongue twister. He was talking about terrorists, but we're here to discuss a different sort of war — the Battle for Bandwidth. These days, application and content delivery requires special tactics, an integrated strategy, and well-sourced intelligence. And the unknown unknowns are the true enemy because they inevitably lead to outages, slowdowns, and mutinous customers.

In early November, a major outage caused by a minor configuration error (a route leak, to be exact) at global backbone provider Level 3 created widespread connection issues on both U.S. coasts. Comcast, Verizon, Cox, and Vonage customers were particularly affected.

One small error can have mighty ripple effects, and the cause isn't always apparent to network admins and enterprise customers. The time it took to return the Down Detector maps from angry red to mellow yellow could have been shortened by looking at Real User Measurements (crowdsourced telemetry), realizing it wasn't a single site or ISP, and following a logic tree to find the culprit.
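As a rough illustration of that logic tree, here is a minimal Python sketch. It assumes RUM samples arrive as simple records of destination site, user ISP, and success or failure. The `Sample` type, the `classify_scope` function, and the 50 percent threshold are all hypothetical names and values chosen for the example, not part of any particular product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical RUM measurement."""
    site: str   # destination the user tried to reach
    isp: str    # user's access network
    ok: bool    # True if the request succeeded


def failure_rates(samples, key):
    """Return the failure rate per distinct value of `key` ('site' or 'isp')."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for s in samples:
        k = getattr(s, key)
        totals[k] += 1
        if not s.ok:
            failures[k] += 1
    return {k: failures[k] / totals[k] for k in totals}


def classify_scope(samples, threshold=0.5):
    """Walk a small logic tree: is the trouble one site, one ISP, or upstream?"""
    bad_sites = {k for k, r in failure_rates(samples, "site").items() if r >= threshold}
    bad_isps = {k for k, r in failure_rates(samples, "isp").items() if r >= threshold}
    if not bad_sites and not bad_isps:
        return "healthy"
    if len(bad_sites) == 1 and not bad_isps:
        return "single site"
    if len(bad_isps) == 1 and not bad_sites:
        return "single ISP"
    if len(bad_sites) > 1 and len(bad_isps) > 1:
        # Many sites failing across many ISPs points at something shared,
        # such as a backbone provider.
        return "upstream"
    return "inconclusive"
```

When failures span several sites and several ISPs at once, the sketch points upstream, which is the kind of pattern a backbone route leak would be expected to produce.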

With Global Server Load Balancing, your delivery network is smart enough to see the barricade around the corner and switch routes on the fly — saving the day (and making the other guys look a bit dazed and confused).

Blind spots can hide more than outages. Your crack team of DevOps commandos can't run successful release missions if they can't check what's really going on in the field. You don't want them dashing around in the dark without a robust tactical plan based on all the parameters you can assess. When you use your various data streams to turn unknown unknowns into known knowns, you can put that knowledge to work.

Continuous deployment isn't for the faint of heart: you'd better have your Kevlar and your night vision goggles. Companies like Salesforce are releasing updates dozens of times a day, but even a handful a week requires a careful strategy. You can use RUM to test an update by initially limiting the roll-out to one data center and checking for 4xx/5xx errors. If you're seeing problems, compare two things to deduce the source of trouble: user experience with the non-updated version of your app in other places, and user experience at the data center where you are testing the updated version.
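That comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each argument is a plain list of HTTP status codes collected from RUM, `canary_baseline` stands for traffic served by the non-updated version in the same data center, and the function names and the 5 percent error budget are hypothetical.

```python
def error_rate(status_codes):
    """Fraction of responses that are 4xx or 5xx."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 400 <= code < 600)
    return errors / len(status_codes)


def diagnose_canary(canary_updated, canary_baseline, other_sites, max_rate=0.05):
    """Guess where trouble comes from during a one-data-center roll-out.

    canary_updated:  status codes from the updated version at the test data center
    canary_baseline: status codes from the non-updated version at that same data center
    other_sites:     status codes from the non-updated version everywhere else
    max_rate:        assumed acceptable error rate (hypothetical default)
    """
    if error_rate(canary_updated) <= max_rate:
        return "proceed"        # the update looks fine so far
    if error_rate(other_sites) > max_rate:
        return "widespread"     # non-updated locations are hurting too
    if error_rate(canary_baseline) > max_rate:
        return "data_center"    # this location is unhealthy regardless of version
    return "release"            # only the updated version is failing
```

Only the last branch blames the release itself; the earlier ones point at a wider network problem or an unhealthy data center, where rolling back the update would not help.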

One of the biggest unknown unknowns in traffic management is what's going on in places you haven't served recently. If a story about Boise causes traffic to spike there, and that's not normally an audience hotspot for your service, chances are you won't have any measurements of your own to go on. Community intelligence turns these dark corners of your empire into known knowns through automated crowdsourcing of quality of experience metrics. When combined with real-time server health checks and third-party data streams, you have a powerful ability to make efficient, economical routing decisions, even for destinations you don't have any history with.
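A minimal Python sketch of such a routing decision follows. It assumes latency samples are keyed by (region, endpoint), that health checks reduce to a simple healthy/unhealthy flag, and that community data is used only when your own samples are too sparse. Every name here, including `pick_endpoint` and the `min_samples` cutoff, is hypothetical.

```python
from statistics import median


def pick_endpoint(region, endpoints, healthy, own_rum, community_rum, min_samples=5):
    """Pick the healthy endpoint with the lowest median latency for a region.

    healthy:       dict mapping endpoint -> bool from real-time health checks
    own_rum:       dict mapping (region, endpoint) -> list of your latency samples (ms)
    community_rum: same shape, but crowdsourced from other services' users
    """
    best, best_latency = None, None
    for endpoint in endpoints:
        if not healthy.get(endpoint, False):
            continue  # never route to an endpoint that fails its health check
        samples = own_rum.get((region, endpoint), [])
        if len(samples) < min_samples:
            # Too little first-hand history here: fall back to community data.
            samples = community_rum.get((region, endpoint), [])
        if not samples:
            continue  # no evidence at all, so skip rather than guess
        latency = median(samples)
        if best_latency is None or latency < best_latency:
            best, best_latency = endpoint, latency
    return best
```

For the Boise case, `own_rum` would have no entries for that region, so the choice rests entirely on community measurements and current health checks.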

The more insight and intelligence you can use to accelerate the acquisition of known knowns, the better it is for your business and your bottom line. In the New Year, we should be less accepting of blind spots. They're expensive: they cost us time, money, and customers. Nobody has enough human problem solvers around to keep putting out fires and rigging up one-off workarounds. Our best talent should be working on the next release, the next big idea, or the next major dilemma (Net Neutrality game changers, anyone?) rather than floundering around trying to guess what's holding up traffic. You can't control what you can't see, and on the hybrid IT battlefield, control keeps you on top of the hill. We're pretty sure Donald Rumsfeld would agree.

Josh Gray is Chief Architect at Cedexis

