Rafay Debuts New Platform Capabilities

November 19, 2024

Rafay Systems announced new platform advancements that help enterprises and GPU cloud providers deliver developer-friendly consumption workflows for GPU infrastructure.

The new Rafay Platform capabilities include enterprise-grade controls, SKU definition, customer-specific policy enforcement and granular chargeback data. Enterprises investing in GPU-based infrastructure in data centers can use the Rafay Platform to roll out feature-rich, enterprise-wide GPU clouds that developers and data scientists can consume on demand, complete with workbenches for model training, fine-tuning and inferencing. GPU cloud providers deploying GPUs for consumption by downstream customers can use the Rafay Platform to operate a full-featured, multi-tenant GPU PaaS that delivers both accelerated computing resources and AI and ML tooling for training, tuning and serving large language models (LLMs).

“Our work with customers across high-stakes industries over the last two quarters has revealed that enterprises and GPU cloud providers are running into similar challenges. Both are looking for ways to speed up the delivery of accelerated computing hardware to developers and data scientists,” said Haseeb Budhani, CEO and co-founder of Rafay Systems. “The new Rafay Platform capabilities address this need, helping enterprises and GPU cloud providers speed the delivery of a PaaS experience in order to monetize their significant investments in accelerated computing infrastructure.”

With Rafay, GPU cloud providers and enterprises can quickly launch production-ready AI services. Platform teams can now deliver much-needed services to developers and data scientists through a PaaS offering that enables self-service consumption of compute as well as AI and ML workbenches for fast experimentation and productization of AI-based applications.

Newly added Rafay Platform capabilities include:

- Multi-tenancy enforcement: Rafay implements robust multi-tenancy controls that allow GPU cloud providers and enterprises to safely and securely deploy workloads from multiple customers on the same infrastructure without the risk of lateral escalation attacks. The Rafay Platform offers new controls to protect against lateral escalation, including a Kubernetes admission controller that automatically wraps pods into isolated Kata containers, each of which operates inside a microVM inside a virtual Kubernetes cluster. The platform also supports dynamic network policy definition, zero-trust access management and role-based access control (RBAC). Collectively, these controls ensure demonstrable isolation between tenants, allowing for better monetization of expensive infrastructure.

- Programmatic SKUs: For both GPU cloud and enterprise platform teams, Rafay allows programmatic definition of compute and service profiles that can be offered to developers and data scientists as a turnkey package, empowering them to focus on building generative AI apps instead of worrying about the infrastructure. By enabling the dynamic definition of self-service packages — programmatic SKUs — GPU cloud and enterprise customers can better manage infrastructure consumption and ensure high utilization based on customer needs.

- With Rafay, customers can programmatically package compute resources and AI applications to deliver Small, Medium or Large offerings that end users can select based on their needs and an associated price. For example, Small may be defined as a Jupyter Notebook environment pre-set with a PyTorch environment that is tied to one NVIDIA H100 GPU and is priced at $3 per hour. Medium may be defined as a fine-tuning workbench pre-configured with the Llama 3.1 model and tied to eight NVIDIA H100 GPUs, and priced at $20 per hour. This approach replaces hardcoded SKU definition strategies with a solution that scales, helping GPU cloud providers package their offerings to meet market needs, while giving enterprises control over resource consumption.

- Purpose-built AI workbenches: With Rafay’s service profile capabilities, platform teams can provide Rafay’s native fine-tuning and inferencing tools or third-party services, such as NVIDIA NIMs and Run:AI, to create AI workbenches for developers and data scientists. These workbenches come pre-configured with all necessary components to speed the delivery of specialized environments for AI and ML workflows. Platform teams can optionally attach these workbenches to SKUs for self-service consumption.

- Chargeback and billing: Rafay provides detailed resource tracking and cost attribution features to help GPU cloud providers and enterprises monitor consumption across their user base. GPU cloud providers can leverage chargeback data to generate billing information for customers. Enterprises can leverage chargeback data to internally manage budgets and cost center attribution.
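To make the programmatic SKU and chargeback ideas above concrete, here is a minimal sketch in Python. It is not Rafay's API: the `Sku`, `UsageRecord` and `chargeback` names, the field layout and the tenant names are all hypothetical and chosen only for illustration. The Small and Medium values (workbench, GPU count and hourly price) are the ones given in the announcement.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    """A self-service package: a workbench bundled with GPUs and a price."""
    name: str
    workbench: str
    gpu_model: str
    gpu_count: int
    price_per_hour: float  # USD


# Small and Medium use the figures quoted in the announcement.
# The dictionary layout itself is an assumption, not a Rafay format.
CATALOG = {
    "small": Sku("small", "Jupyter Notebook with PyTorch", "NVIDIA H100", 1, 3.0),
    "medium": Sku("medium", "Fine-tuning workbench with Llama 3.1", "NVIDIA H100", 8, 20.0),
}


@dataclass(frozen=True)
class UsageRecord:
    """Hours a tenant consumed of a given SKU (hypothetical record shape)."""
    tenant: str
    sku: str
    hours: float


def chargeback(records, catalog=CATALOG):
    """Sum the cost per tenant as SKU hourly price multiplied by hours used."""
    totals = defaultdict(float)
    for record in records:
        totals[record.tenant] += catalog[record.sku].price_per_hour * record.hours
    return dict(totals)


# Example usage with made-up tenants and hours.
records = [
    UsageRecord("team-a", "small", 10),
    UsageRecord("team-a", "medium", 2),
    UsageRecord("team-b", "medium", 5),
]
totals = chargeback(records)
print(totals)  # {'team-a': 70.0, 'team-b': 100.0}
```

Keeping the catalog as data rather than hardcoded logic means a new offering, such as a Large tier, would be one more entry in `CATALOG`, and the same per-tenant totals could feed either customer billing or internal cost center attribution.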

The new platform capabilities are now generally available to customers in the Rafay Platform.

Industry News

ServiceNow Announces Yokohama Release

March 12, 2025

ServiceNow unveiled the Yokohama platform release, including ServiceNow Studio, which provides a unified workspace for rapid application development and governance.

Sonar Announces SonarQube Advanced Security

March 12, 2025

Sonar announced the upcoming availability of SonarQube Advanced Security.

ScaleOut Software Releases Version 4

March 12, 2025

ScaleOut Software introduces generative AI and machine-learning (ML) powered enhancements to its ScaleOut Digital Twins™ cloud service and on-premises hosting platform with the release of Version 4.

Next Generation of Kurrent Cloud Introduced

March 11, 2025

Kurrent unveiled a developer-centric evolution of Kurrent Cloud that transforms how developers and dev teams build, deploy and scale event-native applications and services.

ArmorCode Announces ServiceNow Vulnerability Response Integration and Apps Now Available in Store

March 11, 2025

ArmorCode announced the launch of two new apps in the ServiceNow Store.

Parasoft Expedites Support for New MISRA C:2025 Compliance Standard, Reinforcing Commitment to Advance Safety-Critical Software Development

March 10, 2025

Parasoft is accelerating the release of its C/C++test 2025.1 solution, following the just-published MISRA C:2025 coding standard.

GitHub Announces Secret Protection and Code Security

March 10, 2025

GitHub is making GitHub Advanced Security (GHAS) more accessible for developers and teams of all sizes.

ArmorCode Enhances Global Partner Program

March 10, 2025

ArmorCode announced the enhanced ArmorCode Partner Program, highlighting its goal to achieve a 100 percent channel-first sales model.

Parasoft Adds New GenAI Innovation, Streamlines Compliance and Bolsters Support for C++ Developers of Safety-Critical, Security-Focused Applications

March 06, 2025

Parasoft is showcasing its latest product innovations at embedded world Exhibition, booth 4-318, including new GenAI integration with Microsoft Visual Studio Code (VS Code) to optimize test automation of safety-critical applications while reducing development time, cost, and risk.

JFrog Integrates with NVIDIA NIM Microservices

March 06, 2025

JFrog announced general availability of its integration with NVIDIA NIM microservices, part of the NVIDIA AI Enterprise software platform.

CloudCasa by Catalogic Introduces SUSE Rancher Prime Extension

March 06, 2025

CloudCasa by Catalogic announced an integration with SUSE® Rancher Prime via a new Rancher Prime Extension.

MacStadium Orka Cluster 3.2 Now Available on AWS and On-Prem

March 05, 2025

MacStadium announced the extended availability of Orka Cluster 3.2, establishing the market’s first enterprise-grade macOS virtualization solution available across multiple deployment options.

JFrog Integrates with Hugging Face

March 05, 2025

JFrog is partnering with Hugging Face, host of the Hugging Face Hub repository of public machine learning (ML) models, in a partnership designed to achieve more robust security scans and analysis for every ML model in their library.

Copado Announces DevOps Automation Agent on Salesforce AgentExchange

March 05, 2025

Copado launched DevOps Automation Agent on Salesforce's AgentExchange, a global ecosystem marketplace powered by AppExchange for leading partners building new third-party agents and agent actions for Agentforce.

Harness and Traceable Complete Merger

March 05, 2025

Harness completed its merger with Traceable, effective March 4, 2025.

DEVOPSdigest
