The AI landscape continues to advance rapidly in early 2025. Developments such as the rise of agentic AI, open-source advancements, architectural innovations and cost management strategies are reshaping the operational paradigms faced by DevOps teams. To stay ahead in this fast-changing environment, professionals should keep up with the latest trends and prepare for increased adoption of AI.
In this blog, I discuss six recent AI trends and conclude each section with suggestions to help DevOps teams navigate and excel in these rapidly evolving areas.
1: Rise of AI Agents
AI agents are evolving beyond simple scripted tasks to handle more complex workflows. According to Andrew Ng, effective agentic systems have the following fundamental components:
■ Reflection capabilities for self-correction and learning.
■ Tool use (API calling) for interaction with other systems.
■ Reasoning and planning to handle complex tasks.
■ Multi-agent collaboration for complex tasks and cooperative behavior.
I would add that agentic systems also require state/memory management for continuity and context-awareness.
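To make these components concrete, below is a minimal agent-loop sketch in Python. Everything here is illustrative: `call_llm` is a hypothetical stand-in for whatever model API you use, and the tool registry, memory list and reflection pass are simplified placeholders rather than a production framework.

```python
# Minimal agentic-loop sketch: tool use, state/memory, and a reflection pass.
# call_llm is a hypothetical stand-in for a real LLM API client.

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g., an HTTP request to an LLM API)."""
    return "FINAL: example answer"

# Tool use: plain functions the agent can invoke by name.
TOOLS = {
    "search": lambda query: f"results for {query!r}",
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only; eval is unsafe
}

def run_agent(task: str, max_steps: int = 5) -> str:
    memory = []  # state/memory management: running context across steps
    for _ in range(max_steps):
        prompt = (f"Task: {task}\nHistory: {memory}\n"
                  "Respond with TOOL:<name>:<arg> or FINAL:<answer>")
        reply = call_llm(prompt)
        if reply.startswith("FINAL:"):
            answer = reply[len("FINAL:"):].strip()
            # Reflection: ask the model to critique its own answer before returning it.
            memory.append(("critique", call_llm(f"Critique this answer: {answer}")))
            return answer
        if reply.startswith("TOOL:"):
            _, name, arg = reply.split(":", 2)
            result = TOOLS[name](arg) if name in TOOLS else f"unknown tool {name!r}"
            memory.append((name, result))  # feed tool output back into context
    return "gave up after max_steps"

print(run_agent("What is 2 + 2?"))
```

Even in this toy, the pieces above are visible: the loop plans step by step, calls tools, carries memory between steps, and critiques its own output; handing subtasks to other agents would extend the same pattern.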
Practical applications already exist and are in use by companies today, highlighting the versatility of agentic AI in DevOps environments. For instance, ClickUp (a Georgian portfolio company) has integrated AI agents for task management, while Miro uses agents for documentation workflows. Additionally, FlowMind by JP Morgan automates financial tasks by using APIs to create and execute workflows through computer systems.
DevOps Team Suggestions:
■ Evaluate the need for agentic workflows and balance automation with determinism.
■ Set acceptable error thresholds and user interaction guidelines for AI agents (see the configuration sketch after this list).
■ Establish mechanisms for continuous evaluation and adaptation of AI agents.
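One way to act on the second suggestion is to keep thresholds and escalation rules in version-controlled configuration. The schema below is purely hypothetical; the field names and numbers are placeholders to adapt to your environment.

```python
# Hypothetical agent guardrail policy; all field names and values are illustrative.
AGENT_POLICY = {
    "max_error_rate": 0.05,        # halt automation above 5% failed actions
    "max_autonomous_steps": 10,    # require a human checkpoint beyond this
    "escalate_to_human": {"payment", "deletion"},  # action types needing approval
}

def check_action(action_type: str, errors: int, total: int, steps: int) -> str:
    """Return 'allow', 'escalate', or 'halt' for a proposed agent action."""
    if total and errors / total > AGENT_POLICY["max_error_rate"]:
        return "halt"       # error budget exhausted: stop the agent
    if steps >= AGENT_POLICY["max_autonomous_steps"]:
        return "escalate"   # too many unsupervised steps: ask a human
    if action_type in AGENT_POLICY["escalate_to_human"]:
        return "escalate"   # sensitive action: always ask a human
    return "allow"

print(check_action("deployment", errors=1, total=50, steps=3))  # -> allow
```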
2: Open-source Models are Gaining Ground on Closed-source Models
The performance gap between closed-source and open-source AI models continues to shrink. Recent releases like DeepSeek R1 and Mistral OCR claim capabilities comparable to proprietary models while offering significant cost advantages. The menu of LLM offerings is expanding, and more open-source models are reaching parity with their closed-source counterparts.
The rise of more customizable and cost-effective open-source LLMs presents DevOps teams with new challenges in development. Teams should carefully evaluate models that claim to be open-source. To date, the term "open-source" has been used to lump together vastly different openness practices across weights, datasets, licensing, and access methods. This "open-washing" requires thorough due diligence when planning deployments.
DevOps Team Suggestions:
■ Select models based on fit, balancing cost, performance, and speed (see the scoring sketch after this list).
■ Evaluate open-source models thoroughly, considering performance, cost, licensing, etc.
■ Continuously update your choices as new technologies and options arise.
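As a starting point for the fit-based selection suggested above, a simple weighted scorecard can make the cost/performance/speed trade-off explicit. All model names, metrics and weights below are placeholders, not real measurements.

```python
# Hypothetical model-selection scorecard; every number here is a placeholder
# to be replaced with your own benchmark results and pricing.
CANDIDATES = {
    # model: (quality 0-1, cost in $/M tokens, median latency in seconds)
    "open-model-a":   (0.78, 0.20, 0.9),
    "open-model-b":   (0.74, 0.06, 0.5),
    "closed-model-c": (0.85, 5.00, 1.2),
}
WEIGHTS = {"quality": 0.5, "cost": 0.3, "latency": 0.2}  # tune to your priorities

def score(quality: float, cost: float, latency: float) -> float:
    cost_score = 1 / (1 + cost)        # lower cost -> closer to 1
    latency_score = 1 / (1 + latency)  # lower latency -> closer to 1
    return (WEIGHTS["quality"] * quality
            + WEIGHTS["cost"] * cost_score
            + WEIGHTS["latency"] * latency_score)

for name, metrics in sorted(CANDIDATES.items(), key=lambda kv: -score(*kv[1])):
    print(f"{name}: {score(*metrics):.3f}")
```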
3: Architecture Innovations
New architectures aim to address the limitations of transformers, such as their computational complexity and high memory usage. While traditional transformers remain dominant, newer attention-based architectures like Performer and Reformer are gaining traction, as are attention-free models like Mamba. Hybrid models that combine transformers with other architectures (e.g., Mamba) are also becoming popular and have shown measurable gains over pure transformers. For example, AI21's hybrid Mamba-Transformer reports inference speeds up to 8x faster than comparable 8B-parameter transformers.
Google DeepMind's Griffin, also a hybrid model, combines linear recurrence and local attention to match Llama-2 performance with roughly 6x less training data. These hybrid approaches suggest that the future may lie in architectures that blend different paradigms rather than in purely novel designs.
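To see why these alternatives matter operationally, a back-of-the-envelope estimate is enough: naive self-attention materializes an n-by-n score matrix per head, while a recurrent design such as Mamba carries a fixed-size state regardless of sequence length. The head count, widths and fp16 precision below are illustrative assumptions, not measurements of any specific model (and real serving stacks mitigate this with techniques like FlashAttention).

```python
# Back-of-the-envelope memory estimate: quadratic attention scores vs. a
# fixed-size recurrent state. All sizes here are assumptions for illustration.
BYTES = 2          # fp16
HEADS = 32         # illustrative head count
CHANNELS = 4096    # illustrative model width
STATE_DIM = 16     # illustrative per-channel state size for a recurrent model

for n in (4_096, 32_768, 131_072):  # sequence lengths
    attn_scores = HEADS * n * n * BYTES             # one n x n score matrix per head
    recurrent_state = CHANNELS * STATE_DIM * BYTES  # independent of n
    print(f"n={n:>7,}: attention scores ~ {attn_scores / 2**30:7.1f} GiB, "
          f"recurrent state ~ {recurrent_state / 2**20:.2f} MiB")
```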
DevOps Team Suggestions:
■ Note that the AI community is pushing boundaries to overcome transformer limitations.
■ Consider new transformer architectures and transformer-free models.
■ Evaluate trade-offs between performance and resource usage.
4: Cost Management in AI Infrastructure
Analysis from a16z shows that AI inference costs are dropping approximately 10x year-over-year without sacrificing performance. For instance, in November 2021, GPT-3 cost $60 per million tokens. In November 2024, Llama 3.2 3B cost just $0.06 per million tokens at the same level of performance as GPT-3 (MMLU >= 42), a 1000x drop in cost over three years.
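The arithmetic behind that trend is simple to verify from the figures quoted above:

```python
# Annualized price decline implied by the a16z figures quoted above.
start_price, end_price = 60.0, 0.06  # $/M tokens: GPT-3 (Nov 2021) vs Llama 3.2 3B (Nov 2024)
years = 3
total_drop = start_price / end_price     # 1000x over three years
annual_drop = total_drop ** (1 / years)  # ~10x per year
print(f"total: {total_drop:.0f}x, annualized: {annual_drop:.0f}x per year")
```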
Despite declining model costs, expenses may be rising in other areas. According to a November 2024 report by Georgian and NewtonX, more than 50% of organizations reported higher costs related to data storage and training/upskilling, while 40% cited increased total costs from AI implementation. As inference costs drop, organizations need to weigh savings against ongoing investments.
DevOps Team Suggestions:
■ Take a holistic view of costs beyond inference.
■ Carefully choose models to maximize business outcomes.
■ Adopt a forward-looking perspective on model and related costs.
5: Model Optimization Technologies
Recent advances in model pruning, quantization, fine-tuning, and distillation are making LLMs more efficient and accessible, further driving widespread adoption of generative AI.
Research from MIT and Meta demonstrates that up to 50% of layers in pre-trained LLMs can be pruned while maintaining most performance metrics, suggesting potential redundancy in current architectures.
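The intuition can be shown with a toy sketch: layers whose output barely moves the representation are candidates for removal. The toy layers and the 0.99 cosine threshold below are illustrative, not taken from the paper, which uses its own layer-importance measure.

```python
import numpy as np

# Toy sketch of representation-similarity layer pruning: layers whose output is
# nearly identical to their input contribute little and are candidates to drop.
rng = np.random.default_rng(0)

def make_layer(scale):
    w = np.eye(64) + scale * rng.standard_normal((64, 64))
    return lambda x: x @ w

# Four layers that transform strongly, four that are nearly the identity.
layers = [make_layer(0.5) for _ in range(4)] + [make_layer(0.001) for _ in range(4)]

x = rng.standard_normal(64)
keep = []
for i, layer in enumerate(layers):
    y = layer(x)
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    if cos < 0.99:      # representation changed noticeably: keep the layer
        keep.append(i)
    x = y               # feed forward regardless, to measure the next layer
print(f"kept {len(keep)}/{len(layers)} layers: {keep}")
```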
Microsoft has developed BitNet, which explores quantization by reducing model weights to ternary values (-1, 0, 1), showing promising results while lowering memory requirements significantly.
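A simplified version of the idea fits in a few lines: scale the weights by their mean absolute value, then round each one to -1, 0 or +1. This follows the absmean rounding described for BitNet b1.58, with all training-time machinery (straight-through estimators, activation quantization) omitted.

```python
import numpy as np

# Ternary ("1.58-bit") weight quantization in the spirit of BitNet b1.58.
def ternary_quantize(w: np.ndarray):
    gamma = np.abs(w).mean() + 1e-8           # absmean scale
    q = np.clip(np.round(w / gamma), -1, 1)   # each weight becomes -1, 0, or +1
    return q.astype(np.int8), gamma           # tiny weights plus one fp scale

w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
q, gamma = ternary_quantize(w)
w_hat = q * gamma                             # dequantized approximation
print(q)
print(f"scale={gamma:.3f}, reconstruction MAE={np.abs(w - w_hat).mean():.3f}")
```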
In the fine-tuning domain, Unsloth reports acceleration of up to 30x through GPU kernel optimizations, potentially making model customization more accessible.
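Unsloth's gains come from hand-optimized GPU kernels, which are beyond a short sketch, but the low-rank adaptation (LoRA) math that most parameter-efficient fine-tuning stacks build on, Unsloth included, fits in a few lines. The dimensions and scaling below are illustrative.

```python
import numpy as np

# Low-rank adaptation (LoRA) in one product: instead of updating the full
# d x d weight, train two small matrices A (r x d) and B (d x r) with r << d.
d, r, alpha = 1024, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen pre-trained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable, small random init
B = np.zeros((d, r))                    # trainable, zero init: no change at start

def adapted_forward(x: np.ndarray) -> np.ndarray:
    # Base weight is untouched; the update is the low-rank product, scaled.
    return x @ W.T + x @ A.T @ B.T * (alpha / r)

y = adapted_forward(rng.standard_normal(d))  # same output shape as the base layer
print(f"output shape: {y.shape}")
print(f"trainable params: {2 * d * r:,} vs {d * d:,} ({d * d // (2 * d * r)}x fewer)")
```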
Model distillation is now used by Google and Anthropic, with open-source tools like DistillKit supporting its adoption. These techniques are reducing inference costs and boosting speed.
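At its core, distillation trains a student to match a teacher's temperature-softened output distribution instead of only hard labels. A minimal numpy version of the classic soft-target loss:

```python
import numpy as np

# Soft-target distillation loss: KL divergence between temperature-softened
# teacher and student logits (the classic Hinton-style formulation).
def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()               # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)))) * T * T

teacher = np.array([4.0, 1.0, 0.5])
student = np.array([3.0, 1.5, 0.2])
print(f"KD loss: {distill_loss(teacher, student):.4f}")
```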
DevOps Team Suggestions:
■ Cost reductions in LLM customization may enable more specialized solutions.
■ Quantization and distillation methods are making LLMs more accessible and versatile.
■ Consider opting for pruned or distilled models, which can reduce costs while maintaining performance.
6: Evaluation and Benchmarking Evolution
Dataset contamination remains a pressing concern in LLM evaluation. Researchers from Scale AI compared model performance between the standard GSM-8K math benchmark and their new GSM-1K benchmark, revealing performance discrepancies. Some models showed up to an 8% drop in accuracy when tested on the new math questions, suggesting overfitting and memorization. However, many advanced models (e.g., Gemini) can generalize to new math problems they haven't been trained on.
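A crude first-pass contamination check is n-gram overlap between benchmark items and training documents. The toy snippet below illustrates the idea; real decontamination pipelines use much larger corpora, normalization and fuzzy matching.

```python
# Toy n-gram overlap check for benchmark contamination; a sketch, not a pipeline.
def ngrams(text: str, n: int = 8):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated(benchmark_item: str, corpus_docs, n: int = 8, threshold: float = 0.5):
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return False
    # Fraction of the item's n-grams found verbatim in the best-matching document.
    best = max(len(item_grams & ngrams(doc, n)) / len(item_grams) for doc in corpus_docs)
    return best >= threshold

question = "Natalia sold clips to 48 of her friends in April and then half as many in May"
corpus = ["... Natalia sold clips to 48 of her friends in April and then half as many in May ..."]
print(contaminated(question, corpus))  # True: the question appears verbatim
```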
While numerous tools and platforms support evaluation, assessing non-deterministic AI workflows — particularly agentic ones — remains an open problem. Several startups in the LLM observability space are addressing it with solutions that standardize assessment and evaluation workflows, integrate with data management systems, enhance security controls, and handle non-deterministic outputs.
DevOps Team Suggestions:
■ Implement diverse benchmarks for comprehensive evaluations.
■ Recognize that evaluating non-deterministic AI workflows, especially agentic ones, presents challenges; consider how best to manage them.
■ Emphasize human evaluation and leverage approaches like LLM-as-a-judge where necessary (a minimal judge harness is sketched below).
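A minimal LLM-as-a-judge harness can be very small; in the sketch below, `call_llm` is again a hypothetical stand-in for a real model client, and the rubric wording is illustrative. Sampling the judge several times and aggregating is one simple way to cope with its own non-determinism.

```python
import statistics

# Minimal LLM-as-a-judge sketch; call_llm is a hypothetical model client.
def call_llm(prompt: str) -> str:
    return "4"  # placeholder: a real client would return the judge's rating

def judge(question: str, answer: str, trials: int = 3) -> float:
    rubric = (f"Rate the answer from 1 (wrong) to 5 (correct and complete).\n"
              f"Question: {question}\nAnswer: {answer}\nReply with a single digit.")
    # Sample the judge several times and take the median to damp noise.
    scores = [int(call_llm(rubric)) for _ in range(trials)]
    return float(statistics.median(scores))

print(judge("What is the capital of France?", "Paris"))  # -> 4.0 with the stub
```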
The AI infrastructure landscape is evolving across multiple dimensions, from agentic AI and open-source models approaching parity with closed-source models to rapidly declining inference costs. These trends and architectural innovations will continue to shape infrastructure decisions in 2025 as organizations balance performance, cost and operational requirements.
Industry News
Development work on the Linux kernel — the core software that underpins the open source Linux operating system — has a new infrastructure partner in Akamai. The company's cloud computing service and content delivery network (CDN) will support kernel.org, the main distribution system for Linux kernel source code and the primary coordination vehicle for its global developer network.
Komodor announced a new approach to full-cycle drift management for Kubernetes, with new capabilities to automate the detection, investigation, and remediation of configuration drift—the gradual divergence of Kubernetes clusters from their intended state—helping organizations enforce consistency across large-scale, multi-cluster environments.
Red Hat announced the latest updates to Red Hat AI, its portfolio of products and services designed to help accelerate the development and deployment of AI solutions across the hybrid cloud.
CloudCasa by Catalogic announced the availability of the latest version of its CloudCasa software.
BrowserStack announced the launch of Private Devices, expanding its enterprise portfolio to address the specialized testing needs of organizations with stringent security requirements.
Chainguard announced Chainguard Libraries, a catalog of guarded language libraries for Java built securely from source on SLSA L2 infrastructure.
Cloudelligent attained Amazon Web Services (AWS) DevOps Competency status.
Platform9 formally launched the Platform9 Partner Program.
Cosmonic announced the launch of Cosmonic Control, a control plane for managing distributed applications across any cloud, any Kubernetes, any edge, or on-premises and self-hosted deployments.
Oracle announced the general availability of Oracle Exadata Database Service on Exascale Infrastructure on Oracle Database@Azure.
Perforce Software announced its acquisition of Snowtrack.
Mirantis and Gcore announced an agreement to facilitate the deployment of artificial intelligence (AI) workloads.
Amplitude announced the rollout of Session Replay Everywhere.
Oracle announced the availability of Java 24, the latest version of the programming language and development platform. Java 24 (Oracle JDK 24) delivers thousands of improvements to help developers maximize productivity and drive innovation. In addition, enhancements to the platform's performance, stability, and security help organizations accelerate their business growth ...