3 DevOps Lessons Learned While Scaling
July 15, 2016

Adam Serediuk

Throughout my 15 years in operations I've noticed the same dilemma pop up within most organizations that practice traditional software development – many companies have gotten into the habit of "triage" development. They react to problems defensively, and because the strategy is reactive, triage development occupies most of the ops team's time.

When the ops team is constantly putting out fires, they have no time, people, or tools left for building the actual product. Make no mistake, the operational component is just as important as the product. Recognizing the cost of constant fire-fighting, many teams take a more proactive approach and try to predict breakdowns.

Failures are inevitable. However, by implementing a few best practices, operations professionals can take back control of their output and build more resilient systems. Here are three lessons I've learned that have allowed my teams to spend more time building up, rather than hunkering down:

1. Automate the B.S.

We've already established that spending your time putting out fires is a sure-fire way to never be productive. So, should you automate your system completely and eliminate the need for human intervention in incidents? Maybe, but that's an awfully lofty goal to start with. You may not need total automation.

The key is to find the right areas to automate. This should be determined strategically, but there will also be an element of trial and error. If you put automation in the wrong place or discover a newer, better way of doing things, don't be afraid to throw code out.

Some people think that constant change causes lines and logic to become muddled. I happen to believe the opposite. Constant change can eliminate problems. Ops teams should not be intimidated by change. Embracing change on a regular basis will make it less scary, and you'll see fewer fires as a result.

This must be balanced with automated testing and QA of Ops code and infrastructure. The same software development approach to unit testing and test plans can and should be applied to Ops, to eliminate regression and enable confidence in change.
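As a rough illustration of applying unit testing to Ops code, here is a minimal Python sketch. The helper `render_upstream`, its parameters, and the nginx-style output are assumptions made up for this example, not something from a particular tool; the point is only that configuration-generating code can be tested like any other code.

```python
def render_upstream(name, hosts, port=8080):
    """Render an nginx-style upstream block (hypothetical helper for illustration)."""
    if not hosts:
        # Refuse to generate a config that would route traffic nowhere.
        raise ValueError("an upstream needs at least one host")
    lines = [f"upstream {name} {{"]
    # Deduplicate and sort so the rendered file is stable between runs.
    for host in sorted(set(hosts)):
        lines.append(f"    server {host}:{port};")
    lines.append("}")
    return "\n".join(lines)


def test_hosts_are_deduplicated_and_sorted():
    rendered = render_upstream("api", ["10.0.0.2", "10.0.0.1", "10.0.0.2"])
    assert rendered.splitlines() == [
        "upstream api {",
        "    server 10.0.0.1:8080;",
        "    server 10.0.0.2:8080;",
        "}",
    ]


def test_empty_host_list_is_rejected():
    try:
        render_upstream("api", [])
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty host list")
```

Tests written this way can be run by a test runner such as pytest on every change, so a regression in the generated configuration is caught before it reaches a server.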

2. Go Beyond IT Automation

Repeat after me: Deploying an enterprise IT automation platform is not the same as adopting DevOps. Developers, systems administrators and operations professionals use these platforms to manage the continuous integration/delivery pipeline that defines agile software development, and to manage system environments. While IT automation platforms are important for DevOps practitioners, they are in no way the foundation of the model.

Give equal focus to the process – the build, test, release, deploy and monitoring lifecycle – so you can iterate quickly on changes. I've seen far too many DevOps teams focus only on their automation code without giving adequate attention to the software development process and how this code fits into the larger picture. This means having ongoing conversations with your teams, QA, Development and Ops alike.
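To make the lifecycle concrete, here is a small Python sketch of build, test, release, deploy and monitor treated as one ordered process. The function `run_pipeline` and the stand-in stage callables are hypothetical and exist only for this illustration; a real pipeline would call out to your actual tooling.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, step in stages:
        if not step():
            break  # A failed stage blocks everything after it.
        completed.append(name)
    return completed


# Stand-in stages; here the release step is made to fail on purpose.
stages = [
    ("build", lambda: True),
    ("test", lambda: True),
    ("release", lambda: False),
    ("deploy", lambda: True),
    ("monitor", lambda: True),
]

completed = run_pipeline(stages)
```

In this run `completed` holds only the build and test stages, because the failed release stops deploy and monitor from running. Automation code covers individual stages; the process is what ties them together.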

By ensuring conversations are ongoing, DevOps teams can deploy IT automation without making the situation more complicated for themselves. The team should be able to understand each deployment framework or tool selected to run automation code and where it fits into the big picture. This may mean team meetings and regular messaging, but, hey, communication is what DevOps is all about.

3. Reset Your Definition of Done

As we've seen, constant change is essential for avoiding problems. That's why startups are passing over traditional development for soft releases and continuous, everyday delivery. It's also more stimulating for the team when every day is different, and releasing small, incremental changes is safer than large monolithic releases. Recognizing the changing tide of software development, the industry has developed deployment tools that have unit and integration testing baked into them.

Thanks to these new tools, IT professionals are able to complete tasks to a fuller extent. Not only do they build, they also test and launch. With this power comes responsibility; there's no excuse for leaving anything short of "done." This is where product owners can enable and support the process, by giving equal importance to uptime, continuous delivery and testing as part of story planning.

One of my past roles was Lead Operations Engineer at a mobile and online gaming company, where my team and I built the software and cloud infrastructure. Recently, I spoke to my former boss and he noted how the code I built has performed reliably in the two years since I left. The secret was developing a cohesive system, a complete package through continuous iteration. Investing in this process allowed us to build a product that adhered to the new definition of "done," which made it good enough to last.

Which brings us to what might be the most important lesson of all: software code should not exist separately from infrastructure code. The reason is simple – infrastructure without software is pointless and software can't exist without infrastructure.

Conway's Law states that "organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations." In short, if teams don't talk to each other, parts won't talk to each other. At the same time, if a product is built to work as a whole, it will work. So, why not put the pieces together?

Adam Serediuk is Director of Operations at xMatters.