Navigating the Complexities of Operating Large-Scale Kubernetes Environments - 2

July 14, 2022

Sayandeb Saha
NetApp

As containers become the default choice for developing and distributing modern applications, and Kubernetes (k8s) the de facto platform for deploying, running, and scaling them, enterprises need to scale their Kubernetes environments rapidly to keep up. However, rapid scaling creates complexities that are difficult to resolve without a clear strategy. Part 2 of this blog describes a few more common techniques you can use to navigate the complexities of managing scaled-out Kubernetes environments.

Start with: Navigating the Complexities of Operating Large-Scale Kubernetes Environments - 1

Keeping Up with Kubernetes Updates

Kubernetes is a thriving open-source project that delivers rapid innovation, with three releases a year. If you use a fully managed Kubernetes service from a public cloud provider, be prepared for an aggressive service life cycle. Test your applications against new Kubernetes versions as they are released to minimize upgrade-related downtime. If possible, avoid in-place upgrades of Kubernetes clusters: instead, create new clusters, clone your applications to them, divert traffic to the new clusters, and retire the old ones. Proactively move your business-critical applications to more recent Kubernetes versions so that your cloud provider does not upgrade your control plane for you once its current version reaches end of life.
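The "divert traffic" step of a new-cluster upgrade is often done gradually rather than all at once. The following Python sketch shows one way to gate each traffic increment on a health check. It assumes a weighted load balancer or DNS record sits in front of both clusters; `set_weight` and `is_healthy` are hypothetical callbacks standing in for whatever your traffic layer and monitoring actually expose, and the step percentages are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class CutoverResult:
    completed: bool          # True if every step passed the health check
    new_cluster_weight: int  # percent of traffic left on the new cluster
    history: List[int] = field(default_factory=list)  # weights applied, in order


def shift_traffic(
    set_weight: Callable[[int], None],  # hypothetical: send N% of traffic to the new cluster
    is_healthy: Callable[[int], bool],  # hypothetical: health check after each step
    steps: Sequence[int] = (10, 25, 50, 100),
) -> CutoverResult:
    """Move traffic to the new cluster in increments, rolling back on failure."""
    history: List[int] = []
    for weight in steps:
        set_weight(weight)
        history.append(weight)
        if not is_healthy(weight):
            # Send all traffic back to the old cluster, which is still running.
            set_weight(0)
            return CutoverResult(False, 0, history)
    # Every step passed; the old cluster can now be retired.
    return CutoverResult(True, steps[-1] if steps else 0, history)


if __name__ == "__main__":
    applied: List[int] = []
    # Simulate a new cluster that degrades once it carries half the traffic.
    result = shift_traffic(applied.append, lambda w: w < 50)
    print(result.completed, result.history, applied)  # False [10, 25, 50] [10, 25, 50, 0]
```

Keeping the old cluster serving until the final step is what makes rollback cheap: a failed check only requires moving the weight back, not restoring anything.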

Vendors of self-managed Kubernetes platforms also release aggressively to keep up with upstream innovation. You have more control over when to upgrade, but avoid falling behind: the further back you are, the harder the upgrade becomes, and vendors eventually discontinue support for the versions you run.

Most Kubernetes providers document their life cycles. Read and understand these schedules, and take the necessary actions to keep up with rapid releases and the end-of-life dates that follow.
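Two upstream rules explain why falling behind is expensive. The Kubernetes project maintains release branches only for the three most recent minor versions, and upstream tooling such as kubeadm does not support skipping minor versions during an upgrade, so a cluster several versions behind must be walked forward one minor release at a time. The Python sketch below applies both rules to version strings. The function names are hypothetical, the three-release window reflects only the upstream policy (managed providers and vendors publish their own calendars), and you supply the latest version yourself.

```python
from typing import List, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Parse 'v1.27.3' or '1.27' into (major, minor)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def is_supported(current: str, latest: str, window: int = 3) -> bool:
    """True if `current` is one of the newest `window` minor releases."""
    cur_major, cur_minor = parse_minor(current)
    lat_major, lat_minor = parse_minor(latest)
    return cur_major == lat_major and lat_minor - cur_minor < window


def upgrade_path(current: str, target: str) -> List[str]:
    """Minor versions to step through, one at a time, from current to target."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major or tgt_minor < cur_minor:
        raise ValueError("only forward upgrades within one major version are handled")
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


if __name__ == "__main__":
    # Illustrative versions only.
    print(is_supported("v1.24.6", "1.27"))  # False: 1.25, 1.26, 1.27 are the newest three
    print(upgrade_path("v1.24.6", "1.27"))  # ['1.25', '1.26', '1.27']
```

Each entry in the returned path is a separate upgrade to plan and test, which is why staying within one or two releases of the latest keeps upgrades manageable.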

Reduce or Eliminate Application/Cluster Downtime

Like all other applications and environments, Kubernetes applications and clusters can experience service-impacting disasters or outages, whether self-inflicted or accidental. To keep up with the rapid upgrades described in the previous section and to recover from unplanned outages, use commercially licensed or open-source Kubernetes data protection solutions that provide backup, disaster recovery (DR), and mobility for Kubernetes applications. When adopting such solutions, look for ones that can handle scaled-out, multi-cluster environments and provide a single pane of glass for your K8s protection needs.
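A single pane of glass is most useful when it answers simple questions across every cluster at once, such as which namespaces lack a recent backup. The sketch below checks a recovery point objective (RPO) over a multi-cluster inventory. It assumes you can export the time of the last successful backup for each (cluster, namespace) pair from your data protection tool; the data structures, function name, and cluster names are hypothetical and do not represent the API of any particular product.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (cluster, namespace); hypothetical inventory key


def rpo_violations(
    namespaces: List[Key],
    last_backup: Dict[Key, datetime],  # timezone-aware timestamps of last good backup
    rpo: timedelta,
    now: Optional[datetime] = None,
) -> List[Key]:
    """Return namespaces whose newest backup is missing or older than the RPO."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for key in namespaces:
        taken_at = last_backup.get(key)
        # A namespace that was never backed up is treated as a violation.
        if taken_at is None or now - taken_at > rpo:
            stale.append(key)
    return sorted(stale)


if __name__ == "__main__":
    now = datetime(2022, 7, 14, 12, 0, tzinfo=timezone.utc)
    inventory = [("prod-east", "payments"), ("prod-east", "catalog"), ("prod-west", "payments")]
    backups = {
        ("prod-east", "payments"): now - timedelta(hours=1),
        ("prod-east", "catalog"): now - timedelta(hours=30),
    }
    print(rpo_violations(inventory, backups, rpo=timedelta(hours=24), now=now))
    # [('prod-east', 'catalog'), ('prod-west', 'payments')]
```

Treating a missing backup as a violation matters in scaled-out environments, where new namespaces appear faster than protection policies are assigned to them.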

GitOps for Application Life-Cycle Management

Releasing applications on Kubernetes can be challenging, and it is even more daunting in scaled-out environments. GitOps uses Git, a popular software version control tool, to provide both revision control and change control for applications on the Kubernetes platform. It is a best practice you should consider adopting in large Kubernetes environments.

This model stores the system's desired state in a version control system such as Git. Developers change the configuration files that represent the desired state instead of using a CLI or GUI to make changes directly on the K8s clusters. The delta between the desired state stored in Git and the system's actual state is the changeset that needs to be deployed. These changesets can be reviewed and approved (or rejected) through standard Git processes such as pull requests, code reviews, and merges to the main branch. Approved changesets merged to the main branch are then applied to the K8s clusters, moving the system's current state to the desired state defined by the configuration stored in Git.
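The core of this model is computing the delta between desired and actual state. The Python sketch below shows that diff in miniature. It assumes resources are identified by a hypothetical (kind, namespace, name) key and compares whole specs with `==`; real GitOps controllers such as Argo CD or Flux read manifests from Git, query the Kubernetes API server, and handle defaulted and server-managed fields far more carefully. The `prune` flag reflects that deleting resources absent from Git is often an explicit opt-in in such tools.

```python
from typing import Dict, List, Tuple

ResourceKey = Tuple[str, str, str]   # (kind, namespace, name); hypothetical identity
Manifests = Dict[ResourceKey, dict]  # resource key -> spec


def plan_changes(desired: Manifests, actual: Manifests, prune: bool = True) -> Dict[str, List[ResourceKey]]:
    """Compute the changeset that moves `actual` toward `desired`."""
    create = sorted(k for k in desired if k not in actual)
    update = sorted(k for k in desired if k in actual and desired[k] != actual[k])
    # Resources that exist in the cluster but not in Git are removed only if pruning is on.
    delete = sorted(k for k in actual if k not in desired) if prune else []
    return {"create": create, "update": update, "delete": delete}


def reconcile(desired: Manifests, actual: Manifests, prune: bool = True) -> Manifests:
    """Apply the plan to a copy of `actual` (a stand-in for calls to the API server)."""
    plan = plan_changes(desired, actual, prune)
    new_state = dict(actual)
    for key in plan["create"] + plan["update"]:
        new_state[key] = desired[key]
    for key in plan["delete"]:
        del new_state[key]
    return new_state


if __name__ == "__main__":
    desired = {("Deployment", "shop", "web"): {"replicas": 3},
               ("ConfigMap", "shop", "web-config"): {"mode": "prod"}}
    actual = {("Deployment", "shop", "web"): {"replicas": 2},
              ("Service", "shop", "legacy"): {"port": 80}}
    print(plan_changes(desired, actual))
    print(reconcile(desired, actual) == desired)  # True
```

Rolling back uses the same machinery: reverting the commit changes the desired state, and the next reconciliation computes the changeset that returns the cluster to it.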

With this practice, you can release applications quickly and easily and, if things don't go according to plan, roll back by reverting the change in Git. Using GitOps for change control leverages Kubernetes' core functionality as a reconciliation engine. The Git history also provides an implicit audit trail of the actions taken while releasing applications, which makes troubleshooting and root cause analysis easier in large K8s environments.

Comprehensive Observability

Rich observability is essential for maintaining large Kubernetes environments, so that you can proactively and reactively mitigate issues that could otherwise become outages affecting revenue or productivity. Kubernetes observability is complex because Kubernetes comprises multiple layers of infrastructure and several distinct, highly distributed services, each producing its own monitoring data, with no single authoritative source or log.

To maintain large Kubernetes environments, you must implement:

■ Monitoring of K8s infrastructure (cluster, nodes, namespaces, pods, etc.) and application resources (CPU, memory, storage, networking)

■ Log collection and management for all Kubernetes services and infrastructure

■ Alerts and notifications

Monitoring data must be collected from each of these sources, correlated, and sometimes analyzed to give an administrator the full context of each event or change, so that they can understand it and take corrective action as needed to keep your environment running without disruption.
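Correlation often starts with two keys that every signal shares: the object it refers to and when it happened. The Python sketch below gathers every signal about the same pod within a time window around an alert, producing the timeline an administrator would read first. The `Signal` record, its source labels, and the pod names are hypothetical; in practice the inputs would come from separate systems (for example, a metrics store, a log pipeline, and the Kubernetes events API) normalized into a common shape.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Signal:
    time: datetime
    source: str   # hypothetical labels: "metrics", "logs", "events", "alerts"
    pod: str      # "namespace/name" of the pod the signal refers to
    message: str


def correlate(signals: List[Signal], alert: Signal, window: timedelta) -> List[Signal]:
    """Signals about the alert's pod within +/- `window` of the alert, oldest first."""
    related = [
        s for s in signals
        if s != alert and s.pod == alert.pod and abs(s.time - alert.time) <= window
    ]
    return sorted(related, key=lambda s: s.time)


if __name__ == "__main__":
    t0 = datetime(2022, 7, 14, 10, 0)
    alert = Signal(t0, "alerts", "shop/web-1", "memory above 90% of limit")
    signals = [
        alert,
        Signal(t0 + timedelta(minutes=2), "events", "shop/web-1", "Back-off restarting failed container"),
        Signal(t0 + timedelta(minutes=1), "logs", "shop/web-1", "out of memory while allocating buffer"),
        Signal(t0 - timedelta(minutes=3), "metrics", "shop/web-1", "memory working set rising"),
        Signal(t0, "logs", "shop/web-2", "request served"),
        Signal(t0 + timedelta(minutes=20), "logs", "shop/web-1", "healthy"),
    ]
    for s in correlate(signals, alert, window=timedelta(minutes=5)):
        print(s.time.time(), s.source, s.message)
```

Joining on pod and time is only a starting point; real pipelines usually add node, namespace, and deployment labels so that an alert on one layer can be traced to signals from the layers beneath it.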

Summary

If you have started dabbling in Kubernetes or run small or medium K8s environments, it's only a matter of time before you are managing a large K8s environment, as developers embrace containers and Kubernetes for new apps and refactor existing ones. Adopting a few of the strategies outlined here can ease some of the pain associated with large K8s estates. Seek data management solutions for large-scale Kubernetes environments that make upgrades easier, help you recover from disasters faster, and back up your precious application data, with support for the "Namespace-as-a-Service" operating models commonly used in such environments.

Sayandeb Saha is Sr. Director, Product Management, at NetApp

DEVOPSdigest
