AWS Parallel Computing Service Released
September 04, 2024

Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company, announced the general availability of AWS Parallel Computing Service, a new managed service that helps customers easily set up and manage high performance computing (HPC) clusters so they can run scientific and engineering workloads at virtually any scale on AWS.

The service makes it easy for system administrators to build clusters using Amazon Elastic Compute Cloud (Amazon EC2) instances, low-latency networking, and storage optimized for HPC workloads. With AWS Parallel Computing Service, scientists and engineers can quickly scale simulations to validate models and designs, while system administrators and integrators can build and maintain HPC clusters on AWS using Slurm, the most popular open-source HPC workload manager. This service accelerates innovation in areas such as fast-tracking drug discovery, uncovering genomic insights, building engineering designs, running weather applications, and building scientific and engineering models.

AWS has a history of innovation in supporting HPC workloads. That history includes releases like the open source cluster orchestration toolkit AWS ParallelCluster, fully managed batch computing service AWS Batch, low latency network interconnect Elastic Fabric Adapter, Amazon FSx for Lustre high performance storage, and dedicated AMD, Intel, and Graviton-based HPC compute instances, the latter delivering up to 65% better price-performance over comparable compute optimized x86-based instances. Thousands of customers from a wide range of industries have migrated their HPC workloads to AWS to fast-track drug discovery, uncover genomic insights, maximize energy resources, and spin up supercomputers with millions of cores. Today AWS continues our innovation in HPC by releasing a fully-managed and comprehensive HPC service, which removes the undifferentiated heavy lifting of creating and managing HPC clusters.

AWS Parallel Computing Service is a new managed service that helps customers easily set up and manage HPC so they can run scientific and engineering workloads at virtually any scale on AWS. With AWS Parallel Computing Service, system administrators can use familiar tools including AWS Management Console, CLI, and SDK to deploy a managed Slurm environment. AWS Parallel Computing Service builds from open-source foundations that customers know and have experience with, and delivers a managed Slurm experience with the reliability and availability of AWS. AWS Parallel Computing Service significantly reduces the operational burden of managing a cluster and regularly delivers new capabilities and fixes through managed service updates with minimal to no downtime, eliminating the need to apply manual patches and rebuild clusters to receive feature updates. Highly available APIs also help developers and ISVs create end-to-end HPC solutions on top of AWS, so they can focus on providing value-added features to their users and customers instead of worrying about managing infrastructure.

AWS Parallel Computing Service enables customers of all sizes (e.g., startups, enterprises, or national labs) to easily create and manage HPC clusters with the scalability, reliability, and security of AWS. This means scientists and engineers using Slurm can easily migrate their existing on-premises workflows to AWS without re-architecting them, giving them access to cloud infrastructure that scales automatically. And administrators who want to unblock capacity or capability constraints for their end-users can spin up clusters in just minutes instead of months, so those users can run simulations that address the world’s most challenging problems.
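For readers unfamiliar with Slurm, the kind of existing workflow referred to above is typically a batch script submitted with `sbatch`. The following is a minimal sketch of such a script, generated from Python for illustration only. The helper name `render_sbatch_script`, the job name, and the `./run_model` command are hypothetical and do not come from the announcement; the `#SBATCH` directives and `srun` are standard Slurm.

```python
def render_sbatch_script(job_name, nodes, tasks_per_node, time_limit,
                         command, partition=None):
    """Render a Slurm batch script as text (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={time_limit}",
    ]
    # The partition is optional; when omitted, Slurm uses the cluster default.
    if partition is not None:
        lines.append(f"#SBATCH --partition={partition}")
    lines.append("")
    # srun launches the command across the resources allocated to the job.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


# Example values are placeholders, not taken from the announcement.
script = render_sbatch_script(
    job_name="weather-sim",
    nodes=4,
    tasks_per_node=32,
    time_limit="02:00:00",
    command="./run_model --config model.yaml",
)
print(script)
```

Saved to a file, a script like this would be submitted with `sbatch` on any Slurm cluster. The claim in the announcement is that scripts of this form carry over from on-premises systems without being re-architected.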

“Developing a cure for a catastrophic disease, designing novel materials, advancing renewable energy, and revolutionizing transportation are problems that we just can’t afford to have waiting in a queue,” said Ian Colle, director, advanced compute and simulation at AWS. “Managing HPC workloads, particularly the most complex and challenging extreme-scale workloads, is extraordinarily difficult. Our aim is that every scientist and engineer using AWS Parallel Computing Service, regardless of organization size, is the most productive person in their field because they have the same top-tier HPC capabilities as large enterprises to solve the world’s toughest challenges, any time they need to, and at any scale.”

To get started, system administrators use the AWS Management Console to spin up a Slurm cluster securely and execute jobs in just a few clicks, compared to manual orchestration today. With CloudFormation support coming soon, customers will be able to build and deploy HPC clusters using infrastructure as code. AWS Parallel Computing Service is now available in the following Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Frankfurt), Europe (Stockholm), Europe (Ireland), Asia Pacific (Sydney), Asia Pacific (Singapore), Asia Pacific (Tokyo).

Marvel Fusion is a Germany-based fusion energy startup pursuing the creation of unlimited zero-emission energy. “We are excited that AWS Parallel Computing Service will deliver highly available and easy-to-upgrade HPC cluster management capabilities,” said Moritz von der Linden, CEO of Marvel Fusion. “It will empower our scientists and IT staff to take advantage of the latest AWS Parallel Computing Service capabilities in hours, instead of the weeks of planning and overhead previously needed.”

Maxar Intelligence provides secure, precise geospatial intelligence, enabling government and commercial customers to monitor, understand, and navigate our changing planet. “As a long-time user of AWS HPC solutions, we were excited to test the service-driven approach from AWS Parallel Computing Service,” said Travis Hartman, director of Weather and Climate at Maxar Intelligence. “We found great potential for AWS Parallel Computing Service to bring better cluster visibility, compute provisioning, and service integration to Maxar Intelligence’s WeatherDesk platform, which would enable the team to make their time-sensitive HPC clusters more resilient and easier to manage.”

RONIN is an Australia-based software company whose flagship HPC service provides a simple, intuitive web interface for researchers and scientists from leading academic and research institutions to easily run HPC simulations on AWS. “Democratizing HPC in the cloud by simplifying the user experience for researchers is our key mission,” said Nathan Albrighton, CEO and founder of RONIN. “The introduction of AWS Parallel Computing Service greatly simplifies our ability to build and operate HPC environments using APIs and elevates the HPC capabilities we offer to our customers.”

The U.S. Department of Energy’s National Renewable Energy Laboratory (NREL) is a leading institution focused on research, innovation, and strategic partnerships to deliver solutions for a clean energy economy. “The pursuit of scientific discovery comes with significant overhead associated with maintaining high performance computing infrastructure,” said Michael Bartlett, cloud architect in the Advance Computing Operations Group at NREL. “AWS Parallel Computing Service has the potential to improve our research efficiency by reducing this overhead with its automated update and observability management features. In particular, new capabilities for automatic scaling and handling high-throughput computing tasks will allow us to efficiently process large datasets and complex simulations, ensuring that our scientists can prioritize solving high-priority problems.”

Industry News

September 18, 2024

MacStadium announced the General Availability of Orka Desktop 3.0, a powerful, user-friendly tool that allows developers, testers, and macOS admins to create, test, and manage macOS virtual machines (VMs) on local Apple silicon-based computers.

September 18, 2024

Komodor announced Klaudia, a Generative AI (GenAI) agent for troubleshooting and remediating operational issues, as well as optimizing Kubernetes environments.

September 18, 2024

Inflectra announced the launch of Rapise v8, a test automation solution that uses the power of Generative AI to deliver true autonomous testing.

September 17, 2024

Check Point® Software Technologies Ltd. has been recognized as one of the World’s Best Companies of 2024 by TIME and Statista.

Check Point made its debut on the list due to its strong employee satisfaction, revenue growth, and ESG efforts.

September 17, 2024

Oracle announced the availability of Java 23, the latest version of the programming language and development platform.

September 17, 2024

JFrog announced a new product integration with NVIDIA NIM microservices, part of the NVIDIA AI Enterprise software platform.

September 17, 2024

Tigera announced several new features for Calico Cloud and Calico Enterprise to improve the efficiency of remediating vulnerabilities in container images, and ensure compatibility with the latest deployment options for OpenShift.

September 17, 2024

Gearset announced the acquisition of Clayton, a code analysis platform designed specifically for Salesforce.

September 16, 2024

Docker is introducing a new way for developers and organizations to access its suite of products – including Docker Desktop, Docker Hub, Docker Trusted Content, Docker Scout, Docker Build Cloud, and Testcontainers Cloud.

September 16, 2024

The Linux Foundation, the nonprofit organization enabling mass innovation through open source, announced the launch of the OpenSearch Software Foundation, a community-driven initiative that will support OpenSearch and its search software, which is used by developers around the world to build search, analytics, observability, and vector database applications.

September 16, 2024

Copado announced the Copado AI platform encompassing a suite of AI-powered DevOps agents.

September 16, 2024

Kong announced the release of Kong Gateway 3.8, a major update that sets a new standard for API management.

September 16, 2024

Perforce Software announced that its mobile application testing platform, Perfecto, will support Apple's latest iOS version, iOS 18, on Monday, September 16, 2024.

September 12, 2024

Check Point® Software Technologies Ltd. has been recognized as a Leader in the latest GigaOm Radar Report for Security Policy as Code.

September 12, 2024

JFrog announced the addition of JFrog Runtime to its suite of security capabilities, empowering enterprises to seamlessly integrate security into every step of the development process, from writing source code to deploying binaries into production.