U.S. DEPARTMENT OF ENERGY’S ARGONNE LEADERSHIP COMPUTING FACILITY (ALCF) AND HPE EXPAND HIGH-PERFORMANCE COMPUTING (HPC) STORAGE CAPACITY FOR EXASCALE

ALCF advances capabilities to target complex scientific research using modeling, simulation, and AI, ahead of its upcoming Aurora exascale supercomputer

Hewlett Packard Enterprise (HPE) and the Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy (DOE) Office of Science User Facility, today announced that ALCF will deploy the new Cray ClusterStor E1000, the most efficient parallel storage solution, as its newest storage system. The collaboration supports ALCF’s scientific research in areas such as earthquake seismic activity, aerospace turbulence and shock-waves, physical genomics, and more. The deployment expands storage capacity for ALCF workloads that converge modeling, simulation, artificial intelligence (AI), and analytics, in preparation for Aurora, ALCF’s forthcoming exascale supercomputer. Powered by HPE and Intel, Aurora is a first-of-its-kind system expected to be delivered in the U.S. in 2021.

The Cray ClusterStor E1000 system uses purpose-built software and hardware features to meet high-performance storage requirements of any size with significantly fewer drives. Designed to support the Exascale Era, which is characterized by an explosion of data and converged workloads, the Cray ClusterStor E1000 will power ALCF’s future Aurora supercomputer, targeting the multitude of data-intensive workloads required to make breakthrough discoveries at unprecedented speed.

“ALCF is committed to creating new experiences with Exascale Era technologies by deploying infrastructure required for converged workloads in modeling, simulation, AI and analytics,” said Peter Ungaro, senior vice president and general manager, HPC and AI, at HPE. “Our recent introduction of the Cray ClusterStor E1000 delivers ALCF unmatched scalability and performance to meet next-generation HPC storage needs and support emerging, data-intensive workloads. We look forward to continuing our collaboration with ALCF and empowering its research community to unlock new value.”

ALCF’s two new storage systems, named “Grand” and “Eagle,” are built on the Cray ClusterStor E1000, giving the facility a completely new, cost-effective high-performance computing (HPC) storage solution capable of effectively and efficiently managing growing converged workloads that today’s offerings cannot support.

“When Grand launches, it will benefit ALCF’s legacy petascale machines, providing increased capacity for the Theta compute system and enabling new levels of performance for not just traditional checkpoint-restart workloads, but also for complex workflows and metadata-intensive work,” said Mark Fahey, director of operations, ALCF.

“Eagle will help support the ever-increasing importance of data in the day-to-day activities of science,” said Michael E. Papka, director, ALCF. “By leveraging our experience with our current data-sharing system, Petrel, this new storage will help eliminate barriers to productivity and improve collaborations throughout the research community.”

The two new systems will provide a combined 200 petabytes (PB) of storage capacity and, through the Cray ClusterStor E1000’s intelligent software and hardware designs, will more accurately align data flows with target workloads. ALCF’s Grand and Eagle systems will help researchers accelerate a range of scientific discoveries across disciplines, with each system assigned to address the following:

Computational capacity – ALCF’s “Grand” provides 150 PB of center-wide storage and new levels of input/output (I/O) performance to support the massive computational needs of its users.

Simplified data-sharing – ALCF’s “Eagle” provides a 50 PB community file system to make data-sharing easier than ever among ALCF users, their collaborators, and third parties.

ALCF plans to bring its Grand and Eagle storage systems online in early 2020. The systems will initially connect to ALCF’s existing HPE-built supercomputers: Theta, based on the Cray® XC40-AC™, and Cooley, based on the Cray CS-300. Grand, which is capable of 1 terabyte per second (TB/s) of bandwidth, will be optimized to support converged simulation science and data-intensive workloads once the Aurora exascale supercomputer is operational.
