AWS Parallel Computing Service Released
September 04, 2024

Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company, announced the general availability of AWS Parallel Computing Service, a new managed service that helps customers easily set up and manage high performance computing (HPC) clusters so they can run scientific and engineering workloads at virtually any scale on AWS.

The service makes it easy for system administrators to build clusters using Amazon Elastic Compute Cloud (Amazon EC2) instances, low-latency networking, and storage optimized for HPC workloads. With AWS Parallel Computing Service, scientists and engineers can quickly scale simulations to validate models and designs, while system administrators and integrators can build and maintain HPC clusters on AWS using Slurm, the most popular open-source HPC workload manager. The service accelerates innovation in areas such as drug discovery, genomic analysis, engineering design, weather forecasting, and scientific and engineering modeling.

AWS has a history of innovation in supporting HPC workloads. That history includes releases like the open-source cluster orchestration toolkit AWS ParallelCluster, the fully managed batch computing service AWS Batch, the low-latency network interconnect Elastic Fabric Adapter, Amazon FSx for Lustre high performance storage, and dedicated AMD, Intel, and Graviton-based HPC compute instances, the latter delivering up to 65% better price-performance over comparable compute-optimized x86-based instances. Thousands of customers from a wide range of industries have migrated their HPC workloads to AWS to fast-track drug discovery, uncover genomic insights, maximize energy resources, and spin up supercomputers with millions of cores. Today, AWS continues that innovation in HPC by releasing a fully managed and comprehensive HPC service, which removes the undifferentiated heavy lifting of creating and managing HPC clusters.

AWS Parallel Computing Service is a new managed service that helps customers easily set up and manage HPC clusters so they can run scientific and engineering workloads at virtually any scale on AWS. System administrators can use familiar tools, including the AWS Management Console, CLI, and SDK, to deploy a managed Slurm environment. The service builds on open-source foundations that customers already know and have experience with, and delivers a managed Slurm experience with the reliability and availability of AWS. It significantly reduces the operational burden of managing a cluster and regularly delivers new capabilities and fixes through managed service updates with minimal to no downtime, eliminating the need to apply manual patches or rebuild clusters to receive feature updates. Highly available APIs also help developers and ISVs create end-to-end HPC solutions on top of AWS, so they can focus on providing value-added features to their users and customers instead of managing infrastructure. AWS Parallel Computing Service enables customers of all sizes (e.g., startups, enterprises, or national labs) to easily create and manage HPC clusters with the scalability, reliability, and security of AWS. Scientists and engineers using Slurm can migrate their existing on-premises workflows to AWS without re-architecting them, gaining access to cloud infrastructure that scales automatically. And administrators who want to remove capacity or capability constraints for their end users can spin up clusters in minutes instead of months to run the simulations that address the world's most challenging problems.
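The same managed Slurm environment can also be created programmatically via the CLI mentioned above. As a hedged sketch (the command shapes below assume the AWS CLI's `pcs` namespace; the cluster name, scheduler version, size, and network IDs are all illustrative placeholders, so consult the AWS CLI reference for the authoritative syntax), creating a small cluster might look like:

```shell
# Create a managed Slurm cluster via the AWS CLI (illustrative values only).
# The subnet and security group IDs are placeholders for your own VPC resources.
aws pcs create-cluster \
  --cluster-name demo-cluster \
  --scheduler type=SLURM,version=23.11 \
  --size SMALL \
  --networking subnetIds=subnet-0123456789abcdef0,securityGroupIds=sg-0123456789abcdef0

# Check cluster status; compute node groups and queues are attached
# once the cluster reaches an active state.
aws pcs get-cluster --cluster-identifier demo-cluster
```

Because cluster creation requires live AWS credentials and provisions billable infrastructure, this is a command sketch rather than a runnable sample.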

“Developing a cure for a catastrophic disease, designing novel materials, advancing renewable energy, and revolutionizing transportation are problems that we just can’t afford to have waiting in a queue,” said Ian Colle, director, advanced compute and simulation at AWS. “Managing HPC workloads, particularly the most complex and challenging extreme-scale workloads, is extraordinarily difficult. Our aim is that every scientist and engineer using AWS Parallel Computing Service, regardless of organization size, is the most productive person in their field because they have the same top-tier HPC capabilities as large enterprises to solve the world’s toughest challenges, any time they need to, and at any scale.”

To get started, system administrators use the AWS Management Console to securely spin up a Slurm cluster and run jobs in just a few clicks, rather than orchestrating everything manually as they do today. With CloudFormation support coming soon, customers will be able to build and deploy HPC clusters using infrastructure as code. AWS Parallel Computing Service is now available in the following Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Frankfurt), Europe (Stockholm), Europe (Ireland), Asia Pacific (Sydney), Asia Pacific (Singapore), and Asia Pacific (Tokyo).
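Once a cluster is running, scientists interact with it through standard Slurm tooling, which is why existing on-premises job scripts can carry over without re-architecting. A minimal Slurm batch script uses standard Slurm directives (the job name, node counts, and file names here are illustrative):

```shell
#!/bin/bash
# hello.sbatch -- minimal Slurm batch job
#SBATCH --job-name=hello-hpc      # job name shown in the queue
#SBATCH --nodes=2                 # request two compute nodes
#SBATCH --ntasks-per-node=1       # run one task per node
#SBATCH --output=hello_%j.out     # %j expands to the Slurm job ID

# Launch one copy of `hostname` on each allocated node.
srun hostname
```

The script is submitted with `sbatch hello.sbatch` and monitored with `squeue`, exactly as on an on-premises Slurm cluster; it is shown as a job-script fragment since it requires a live Slurm scheduler to execute.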

Marvel Fusion is a Germany-based fusion energy startup pursuing the creation of unlimited zero-emission energy. “We are excited that AWS Parallel Computing Service will deliver highly available and easy-to-upgrade HPC cluster management capabilities,” said Moritz von der Linden, CEO of Marvel Fusion. “It will empower our scientists and IT staff to take advantage of the latest AWS Parallel Computing Service capabilities in hours, instead of the weeks of planning and overhead previously needed.”

Maxar Intelligence provides secure, precise geospatial intelligence, enabling government and commercial customers to monitor, understand, and navigate our changing planet. “As a long-time user of AWS HPC solutions, we were excited to test the service-driven approach from AWS Parallel Computing Service,” said Travis Hartman, director of Weather and Climate at Maxar Intelligence. “We found great potential for AWS Parallel Computing Service to bring better cluster visibility, compute provisioning, and service integration to Maxar Intelligence’s WeatherDesk platform, which would enable the team to make their time-sensitive HPC clusters more resilient and easier to manage.”

RONIN is an Australia-based software company whose flagship HPC service provides a simple, intuitive web interface for researchers and scientists from leading academic and research institutions to easily run HPC simulations on AWS. "Democratizing HPC in the cloud by simplifying the user experience for researchers is our key mission," said Nathan Albrighton, CEO and founder of RONIN. "The introduction of AWS Parallel Computing Service greatly simplifies our ability to build and operate HPC environments using APIs and elevates the HPC capabilities we offer to our customers."

The U.S. Department of Energy’s National Renewable Energy Laboratory (NREL) is a leading institution focused on research, innovation, and strategic partnerships to deliver solutions for a clean energy economy. "The pursuit of scientific discovery comes with significant overhead associated with maintaining high performance computing infrastructure," said Michael Bartlett, cloud architect in the Advanced Computing Operations Group at NREL. "AWS Parallel Computing Service has the potential to improve our research efficiency by reducing this overhead with its automated update and observability management features. In particular, new capabilities for automatic scaling and handling high-throughput computing tasks will allow us to efficiently process large datasets and complex simulations, ensuring that our scientists can prioritize solving high-priority problems."
