Red Hat, US DOE Team For HPC and Cloud Environments
June 1, 2022
Red Hat is collaborating with multiple U.S. Department of Energy (DOE)
laboratories, including Lawrence Berkeley National Laboratory, Lawrence
Livermore National Laboratory, and Sandia National Laboratories, to
bolster cloud-native standards and practices in high-performance
computing (HPC).
Adoption of HPC is expanding beyond traditional use cases. Advances
in artificial intelligence, machine learning and deep learning, as well
as compute- and data-driven analytics, are driving greater interest in and
need for organizations to run scalable containerized workloads on
traditional HPC infrastructure. According to industry analyst firm
Hyperion Research, roughly one-third of all HPC system revenue will be
dedicated to AI-centric systems by 2025, representing a nearly 23% CAGR
over the five-year period, driven by the influx of AI workloads.
Additionally, nearly 20% of HPC users' HPC-enabled AI workloads are
currently run in the cloud.
Red Hat is a leader in cloud-native innovation across hybrid and
multicloud environments, while the laboratories understand the needs and
unique demands of massive-scale HPC deployments. By establishing a
common foundation of technology best practices, this collaboration seeks
to use standardized container platforms to link HPC and cloud computing
footprints, helping to fill potential gaps in building cloud-friendly
HPC applications while creating common usage patterns for industry,
enterprise and HPC deployments.
Together with the laboratories, Red Hat will focus on advancing four
specific areas that address current gaps and help lay the groundwork for
exascale computing: standardization, scale, cloud-native application
development, and container storage. Examples of collaborative projects
between Red Hat and the DOE laboratories include:
Bringing standard container technologies to HPC
Red Hat and the National Energy Research Scientific Computing Center (NERSC)
at Berkeley Lab recognize the importance of standards-based solutions in
enabling computing innovation, especially when technologies must span
from the edge to the cloud to HPC environments. From container security
to scaling containerized workloads, common, accepted practices help HPC
sites get the most from container technologies. To better meet the
unique requirements of large-scale HPC systems and pave the way for
organizations to take advantage of containers in exascale computing,
Red Hat and NERSC are collaborating on enhancements to Podman,
a daemonless container engine for developing, managing and running
container images on a Linux system, to enable it to replace NERSC’s
custom-developed container runtime, Shifter.
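As a point of reference, the following is a minimal sketch of launching a short-lived, rootless Podman container from Python. The image name and command are illustrative placeholders and are not details from the announcement.

```python
"""Minimal sketch: running a containerized workload step with Podman.

Assumes `podman` is installed and usable rootless on the host. The image
name below is a hypothetical placeholder.
"""
import subprocess

IMAGE = "registry.example.com/hpc/solver:latest"  # hypothetical image

def run_step(command: list[str]) -> None:
    """Run one job step in a short-lived, daemonless Podman container."""
    subprocess.run(
        ["podman", "run", "--rm",  # remove the container when it exits
         IMAGE, *command],
        check=True,
    )

if __name__ == "__main__":
    run_step(["echo", "hello from a podman container"])
```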
Running Kubernetes at massive scale
Red Hat has been collaborating with Sandia National Laboratories on the
SuperContainers project for several years, working to make Linux
containers and other building blocks of cloud-native computing more
readily accessible to supercomputing operations. In this expanded
collaboration, Red Hat and Sandia National Laboratories intend to
explore deployment scenarios for Kubernetes-based infrastructure at
extreme scale, providing easier, well-defined mechanisms for delivering
containerized workloads to users.
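For a sense of what a well-defined mechanism for delivering a containerized workload can look like, here is a minimal sketch that submits a batch Job through the official `kubernetes` Python client. The namespace, job name, and image are hypothetical placeholders, not the labs' actual tooling.

```python
"""Minimal sketch: submitting a containerized batch workload to Kubernetes."""
from kubernetes import client, config

def submit_job(name: str, image: str, command: list[str]) -> None:
    config.load_kube_config()  # read the user's local kubeconfig
    container = client.V1Container(name=name, image=image, command=command)
    pod_spec = client.V1PodSpec(restart_policy="Never", containers=[container])
    template = client.V1PodTemplateSpec(spec=pod_spec)
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(template=template, backoff_limit=0),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)

if __name__ == "__main__":
    submit_job("demo-sim", "registry.example.com/hpc/sim:latest", ["echo", "ok"])
```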
Bridging traditional HPC jobs with cloud-native workloads
Red Hat and Lawrence Livermore National Laboratory are collaborating to
bring HPC job schedulers, such as Flux, to Kubernetes through a
standardized programmatic interface, helping IT teams that support
supercomputing operations better manage traditional parallel
workflows alongside containerized jobs, including how this mix of
technologies operates with low-level hardware devices, such as accelerators
or high-speed networks.
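The announcement does not describe the interface itself, but as an illustration of the kind of programmatic scheduler access involved, the sketch below submits a parallel job through Flux's Python bindings (flux-core). It assumes a running Flux instance and its `flux` Python package; it is not the Kubernetes bridge the collaboration is building.

```python
"""Minimal sketch: submitting a parallel job to Flux via its Python bindings."""
import flux
import flux.job

def submit_parallel(command: list[str], ntasks: int):
    handle = flux.Flux()  # connect to the enclosing Flux instance
    jobspec = flux.job.JobspecV1.from_command(command=command, num_tasks=ntasks)
    return flux.job.submit(handle, jobspec)  # returns the Flux job ID

if __name__ == "__main__":
    print(submit_parallel(["hostname"], ntasks=4))
```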
Reimagining storage for containers
For containers to be used effectively across both HPC and commercial
cloud resources, a set of standard interfaces is needed to manage
various container image formats and to provide access to distributed
file systems. Red Hat and the three DOE National Laboratories aim to
define the mechanisms by which container images can be migrated from one
container engine and deployed with another, allowing users to move their
applications freely across popular container runtime platforms, as well
as create mechanisms that allow containers to use distributed file
systems as persistent storage.
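To make the two ideas concrete, here is a minimal sketch that bind-mounts a shared parallel file system path into a Podman container as persistent storage and then exports the image in the standard OCI archive format so another engine can import it. The mount path and image name are illustrative placeholders.

```python
"""Minimal sketch: distributed file system as persistent container storage,
plus image export in a standard, engine-neutral OCI archive format."""
import subprocess

IMAGE = "registry.example.com/hpc/analysis:latest"  # hypothetical image

# Bind-mount a shared parallel file system path (illustrative) so results
# written by the container persist after it exits.
subprocess.run(
    ["podman", "run", "--rm",
     "-v", "/lustre/project42:/data",
     IMAGE, "sh", "-c", "date > /data/run.log"],
    check=True,
)

# Export the image as an OCI archive, a standard format other engines can load.
subprocess.run(
    ["podman", "save", "--format", "oci-archive", "-o", "analysis.tar", IMAGE],
    check=True,
)
```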
Through this collaboration and Red Hat's experience supporting some of
the most powerful supercomputers in the world, HPC sites will be able to
abstract away the immense complexities their environments can present,
benefiting the range of United States exascale machines being deployed
by the DOE.
Chris Wright, senior vice president and chief technology officer, Red
Hat
“The HPC community has served as the proving ground for
compute-intensive applications, embracing containers early on to help
deal with a new set of scientific challenges and problems. That led to
a lack of standardization across various HPC sites, creating barriers
to building and deploying containerized applications that can
effectively span large-scale HPC, commercial and cloud environments,
while also taking advantage of emerging hardware accelerators. Through
our collaboration with leading laboratories, we are working to remove
these barriers, opening the door to liberating next-generation HPC
workloads.”
Earl Joseph, Ph.D., chief executive officer, Hyperion Research
“High performance computing infrastructure must adapt to the
requirements of today's heterogeneous workloads, including workloads
that use containers. Red Hat’s partnership with the DOE labs is designed
to allow the new generation of HPC applications to run in containers at
exascale while utilizing distributed file system storage, providing a
strong example of collaboration between industry and research leaders.”
Shane Canon, senior engineer, Lawrence Berkeley National Laboratory
“The collaboration with the Podman community and Red Hat engineers is
helping us to explore and co-develop enhancements that will allow Podman
to scale and perform for the largest HPC workloads. We have already
demonstrated this across 512 GPU nodes on Perlmutter. NERSC sees a
convergence of HPC and cloud-native workloads, and Podman can be an
important tool in helping to bridge between these two worlds.”
Bronis R. de Supinski, chief technology officer, Lawrence Livermore
National Laboratory
“High
performance computing infrastructure is becoming more diverse and is
increasingly being used to run non-traditional HPC workflows. We need to
provide mechanisms for scheduling various types of workflows and expect
container orchestration frameworks like Kubernetes and Red Hat OpenShift
to be a significant part of the software ecosystem, effectively
contributing to the convergence of the HPC and cloud realms.”
Andrew J. Younge, Ph.D., R&D manager and computer scientist, Sandia
National Laboratories
“Sandia and the DOE are seeing an increased need to support more diverse
HPC workloads, beyond traditional batch-based modeling and simulation
codes. This requires us to find new and innovative ways to enable
services, tasks, and data persistence models in tight coordination with
current simulation capabilities. Furthermore, workload portability
remains an important consideration, where containers are now a key
component of our code deployment strategy. Sandia’s collaboration
with Red Hat on Podman and Kubernetes-based OpenShift enables us to
investigate approaches for delivering modeling and simulation
capabilities as a service to Sandia’s designer and analyst communities.”