Agenda
Monday, September 23, 2019
Abstract
Open source, community-developed reusable scientific software represents a large and growing body of capabilities. Linux distributions, vendor software stacks and individual disciplined software product teams provide the scientific computing community with usable holistic software environments containing core open source software components. At the same time, new software capabilities make it into these distributions in a largely ad hoc fashion.
The Extreme-scale Scientific Software Stack (E4S), first announced in November 2018, along with its community-organized scientific software development kits (SSDKs), is a new community effort to create lightweight cross-team coordination of scientific software development, delivery and deployment, together with a set of support tools and processes targeted at improving scientific software quality via improved practices, policy, testing and coordination.
E4S (https://e4s.io) is an open architecture effort, welcoming teams that are developing technically compatible and high-quality products to participate in the community. E4S and the SSDKs are sponsored by the US Department of Energy Exascale Computing Project (ECP), driven by our need to effectively develop, test, deliver and deploy our open source software products to the scientific community.
In this presentation, we introduce E4S, discuss its design and implementation goals and show examples of success and challenges so far. We will also discuss our connection with other key community efforts we rely upon for our success, in particular the Spack software management ecosystem, the OpenHPC software distribution project and vendor efforts.
Bio
Michael Heroux is a Senior Scientist at Sandia National Laboratories, Director of Software Technology for the US DOE Exascale Computing Project (ECP) and Scientist in Residence at St. John’s University, MN. His research interests include all aspects of scalable scientific and engineering software for new and emerging parallel computing architectures.
He leads several projects in this field: ECP Software Technology is an integrated effort to provide the software stack for ECP. The Trilinos Project (2004 R&D 100 winner) is an effort to provide reusable, scalable scientific software components. The Mantevo Project (2013 R&D 100 winner) is focused on the development of open source, portable mini-applications and mini-drivers for the co-design of future supercomputers and applications. HPCG is an official TOP500 benchmark for ranking computer systems, complementing LINPACK.
Abstract
As the code complexity of HPC applications expands, development teams increasingly rely upon detailed software operation workflows to automate building and testing their applications. These development workflows can become increasingly complex and, as a result, difficult to maintain when the target platforms' environments are growing in architectural diversity and continually changing. Recently, the advent of containers in industry has demonstrated the feasibility of such workflows, and the latest support for containers in HPC environments makes them now attainable for application teams. Fundamentally, containers have the potential to provide a mechanism for simplifying workflows for development and deployment, which could improve overall build and testing efficiency for many teams.
This talk introduces the ECP Supercomputing Containers Project, named Supercontainers, which represents a consolidated effort across the DOE and NNSA to use a multi-level approach to accelerate adoption of container technologies for Exascale. A major tenet of the project is to ensure that container runtimes are well poised to take advantage of future HPC systems, including efforts to ensure container images can be scalable, interoperable, and well integrated into Exascale supercomputing across the DOE. The project focuses on the foundational system software research needed to ensure containers can be deployed at scale, and provides enhanced user and developer support to ensure that containerized Exascale applications and software are both efficient and performant. Furthermore, these activities are conducted in the context of interoperability, effectively generating portable solutions that work for HPC applications across DOE facilities ranging from laptops to Exascale platforms.
Bio
Andrew J. Younge is a Computer Scientist in the Scalable System Software department at Sandia National Laboratories. He currently serves as the Lead PI for the Supercontainers project under the DOE Exascale Computing Project and is a key contributor to the Astra system, the world's first supercomputer based on the Arm processor deployed under Sandia's Vanguard program. Prior to joining Sandia, Andrew held visiting positions at the MITRE Corporation, the University of Southern California's Information Sciences Institute, and the University of Maryland, College Park. He received his PhD in computer science from Indiana University in 2016 and his BS and MS in computer science from the Rochester Institute of Technology in 2008 and 2010 respectively. His research interests include high performance computing, virtualization, distributed systems, and energy efficient computing. The focus of his research is on improving the usability and efficiency of system software for supercomputing systems.
Abstract
OpenHPC provides a comprehensive software stack needed to install and run HPC machines. It also provides a gathering point for people, both practitioners and researchers, working on HPC software. In this talk I will describe the goals of OpenHPC and provide a status update on the community, including directions it is currently undertaking. I will highlight areas where the work OpenHPC is doing is complementary to the DOE's ECP E4S effort and suggest areas where we could work more closely together. I will also describe ongoing open source efforts within Intel that we intend to submit for inclusion into OpenHPC and that will allow it to be more effective on capability-class supercomputers.
Bio
Dr. Robert W. Wisniewski is an ACM Distinguished Scientist, IEEE Senior Member, and the Chief Software Architect for Extreme Scale Computing and a Senior Principal Engineer at Intel Corporation. He is the lead architect and PI for A21, the supercomputer targeted to be the first exascale machine in the US when delivered in 2021. He is also the lead architect for Intel's cohesive and comprehensive software stack that was used to seed OpenHPC, and serves as chairman of the OpenHPC governance board. He has published over 74 papers in the areas of high performance computing, computer systems, and system performance, filed over 56 patents, and given over 60 external invited presentations. Before coming to Intel, he was the chief software architect for Blue Gene Research and manager of the Blue Gene and Exascale Research Software Team at the IBM T.J. Watson Research Facility, where he was an IBM Master Inventor and led the software effort on Blue Gene/Q, which was the most powerful computer in the world in June 2012 and occupied 4 of the top 10 positions on the TOP500 list.
Abstract
The Exascale Computing Project (ECP), a joint project between the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, is responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative. In this talk, I will give an overview of the activities and progress in the ECP Software Ecosystem and Delivery activity. Ultimately, we seek to support a sustainable, high-quality software ecosystem that is continuously improved by a robust research and development effort, is deployed on advanced computing platforms, and is broadly adopted by application teams and software developers to accelerate their science. In the short term, a major goal of the Software Ecosystem and Delivery team is to ensure that exascale-ready applications will have necessary, production-quality software technology products available on DOE exascale systems from day one. To accomplish this, we have development efforts in: Spack for turn-key, from-source builds of popular HPC software packages; container technologies to deliver scalable, performant, and interoperable container images and runtimes; and delivery of the Extreme-scale Scientific Software Stack (E4S, http://e4s.io).
Bio
Dr. Todd Munson is a Senior Computational Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory, where he currently manages the Software Ecosystem and Delivery portfolio in the Exascale Computing Project and is the numerical optimization area lead for the FASTMath SciDAC Institute and a developer of the Toolkit for Advanced Optimization. He received a Presidential Early Career Award for Scientists and Engineers in 2006 for his "pioneering work in the development of algorithms, software, and problem-solving environments for the solution of large-scale optimization problems."
Abstract
The Hartree Centre is transforming UK industry through High Performance Computing, Big Data and AI. As such, the Hartree Centre performs transformative research and development addressing key industrial, societal and scientific challenges. Backed by over £170 million of government funding, significant strategic partnerships with organisations such as IBM, Atos and Intel, and a recent MoU with the Turing Institute, the Hartree Centre is home to some of the most technically advanced high performance computing, data analytics and machine learning technologies in the UK and is applying these to diverse applications with high industrial, societal and scientific impact. Our approach has been to develop scalable mathematical methods and algorithms and, through advanced Research Software Engineering approaches, ensure scalability at all levels of the stack, starting from the mathematical and algorithmic level, through the programming models and environments level, and down to the systems level. The talk gives examples of the work being undertaken at the Hartree Centre, applying scalable mathematical methods and algorithms to industrial, societal and scientific challenge-led projects.
Bio
Vassil Alexandrov was appointed in 2019 as Chief Science Officer at the Hartree Centre, STFC. Previously he was an ICREA Research Professor in Computational Science and Extreme Computing group leader at the Barcelona Supercomputing Center. Before that he held positions at the University of Liverpool, UK; the University of Reading, UK, as Director of the ACET Centre and Professor in Computational Science, School of Systems Engineering; and the Monterrey Institute of Technology and Higher Education (ITESM), Mexico, as a Distinguished Visiting Professor (Jan 2015-Jan 2018). He holds an MSc degree in Applied Mathematics from Moscow State University and a PhD degree from the Bulgarian Academy of Sciences.
His research interests are in the area of scalable mathematical methods and algorithms with a focus on extreme scale computing and on methods and algorithms for discovering global properties of data. He has significant experience in stochastic modelling, stochastic methods (Monte Carlo methods and algorithms, etc.) and hybrid stochastic/deterministic methods and algorithms.
Abstract
SDSC systems support users with a wide range of HPC applications, often with varying software library and version requirements. SDSC staff also provide containerized (Singularity) solutions for users with complex software requirements. In preparation for future deployments, testing is underway to evaluate Spack for package deployment and software version management, as well as the performance of the HPC applications and libraries available as part of E4S. Preliminary results and feedback from installation and testing experiences will be presented. Current (Comet, TSCC) and future (Expanse) SDSC systems are aimed at providing cyberinfrastructure for the long tail of science and supporting a large number of users (25,000+). The ongoing evaluation will consider the ease of use and the impact of using Spack and E4S.
Bio
Mahidhar Tatineni received his M.S. and Ph.D. in Aerospace Engineering from UCLA. He is currently the director of the user services group at SDSC. He has led the deployment and support of high performance computing and data applications software on several NSF and UC resources, including Comet and Gordon at SDSC. He has worked on many NSF-funded optimization and parallelization research projects, such as petascale computing for magnetosphere simulations, MPI performance tuning frameworks, hybrid programming models, topology-aware communication and scheduling, big data middleware, and application performance evaluation using next generation communication mechanisms for emerging HPC systems.
Abstract
The Kokkos C++ Performance Portability Ecosystem is a production-level solution for writing modern C++ applications in a hardware-agnostic way. It is part of the US Department of Energy's Exascale Computing Project, the leading effort in the US to prepare the HPC community for the next generation of supercomputing platforms. It is now used by more than a hundred HPC projects, and Kokkos-based codes are running regularly at scale on half of the top ten supercomputers in the world. In this talk, we will provide a short overview of what the Kokkos Ecosystem provides, including its programming model, math kernels library, tools, and training resources, before diving deeper into the capabilities of the programming model. The presentation will introduce the core abstractions of Kokkos and how they are critical components of writing performance-portable code. The primary elements of its API will be presented with code examples to provide an idea of the look and feel of typical Kokkos code. Performance results of a few selected apps comparing different architectures will be shown as evidence that true performance portability can be achieved. Last but not least, the talk will provide an overview of the Kokkos team’s efforts surrounding the ISO C++ standard, and how Kokkos both influences future standards and aligns with developments occurring in them.
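To give a flavor of the "look and feel" referenced above, here is a minimal, illustrative sketch of Kokkos-style code (an AXPY-style loop followed by a reduction over Kokkos::View arrays). It is written for this summary under the assumption of a default Kokkos build; it is not one of the examples from the talk.

```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1000000;
    Kokkos::View<double*> x("x", n);  // arrays allocated in the default memory space
    Kokkos::View<double*> y("y", n);

    // Fill the arrays in parallel on the default execution space.
    Kokkos::parallel_for("init", n, KOKKOS_LAMBDA(const int i) {
      x(i) = 1.0;
      y(i) = 2.0;
    });

    // y = a*x + y, expressed once for any supported architecture.
    const double a = 0.5;
    Kokkos::parallel_for("axpy", n, KOKKOS_LAMBDA(const int i) {
      y(i) = a * x(i) + y(i);
    });

    // Reduce over y to verify the result.
    double sum = 0.0;
    Kokkos::parallel_reduce("sum", n,
      KOKKOS_LAMBDA(const int i, double& partial) { partial += y(i); }, sum);
    printf("sum = %f\n", sum);
  }
  Kokkos::finalize();
  return 0;
}
```

The same source compiles unchanged for OpenMP, CUDA, HIP, or other backends, depending on how Kokkos was configured, which is the portability property the talk describes.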
Bio
Christian Trott is a high performance computing expert with extensive experience designing and implementing software for modern HPC systems. He is a principal member of staff at Sandia National Laboratories, where he leads the Kokkos core team developing the performance portability programming model for C++ and heads Sandia's delegation to the ISO C++ standards committee. He also serves as adviser to numerous application teams, helping them redesign their codes using Kokkos and achieve performance portability for the next generation of supercomputers. Christian is a regular contributor to numerous scientific software projects including LAMMPS and Trilinos. He earned a doctorate from the University of Technology Ilmenau in theoretical physics with a focus on computational material research.
Abstract
With the development of increasingly complex architectures and software due to multiphysics modeling, and the coupling of simulations and data analytics, applications increasingly require the combined use of software packages developed by diverse, independent teams throughout the HPC community. The Extreme-scale Scientific Software Development Kit (xSDK) is being developed to provide such an infrastructure of independent mathematical libraries to support rapid and efficient development of high-quality applications. This presentation will introduce the xSDK and its history, and discuss in more detail the development and impact of the xSDK community policies, which were defined to achieve improved code quality and compatibility across xSDK member packages and which constitute an integral part of the xSDK.
Bio
Ulrike Meier Yang leads the Mathematical Algorithms & Computing Group in the Center for Applied Scientific Computing of Lawrence Livermore National Laboratory. She leads the xSDK project in DOE's Exascale Computing Project and the Linear Solvers topical area in the SciDAC FASTMath Institute; she is a developer of the software library hypre. She earned her Ph.D. in computer science from the University of Illinois at Urbana-Champaign. Her research interests are numerical algorithms, parallel computing, and scientific software design.
Abstract
Today, the xSDK provides a collection of 17+ scientific library packages aimed at efficient and effective application development, enabling multiphysics and multi-component applications to be coupled with ease. The collection was achieved through the contributions of the individual library developers, who went through the daunting code refactoring and testing required to ensure that all the libraries can be built together. Given the increasing diversity of parallel execution models and application needs, this process is expected to become even more challenging.
In this talk, we will discuss these challenges by reviewing the lessons from past activities of the xSDK project. These lessons shed light on software engineering practice issues in the HPC community as well as the technical limitations of current parallel programming models and runtime systems.
Bio
Keita Teranishi received BS and MS degrees from the University of Tennessee, Knoxville, in 1998 and 2000, respectively, and a PhD degree from Penn State University in 2004. He is a principal member of technical staff at Sandia National Laboratories. His research interests are parallel programming models, resilience, and sparse matrix and tensor computation for high performance computing platforms.
Abstract
The architectures of future exascale supercomputers will render the current approach of saving the full simulation state to storage obsolete, resulting in disruptive changes to the scientific workflow. A major concern is that exascale system concurrency, and thus its ability to create massive scientific results, is expected to grow by five or six orders of magnitude, yet storage bandwidth and capacity are only expected to grow by one and two orders of magnitude, respectively.
With the arrival of the Nation’s first exascale systems, the inevitable worsening of the storage bottleneck will make it necessary for most simulation data to be analyzed in situ, that is, on the supercomputer while the simulation is running. Furthermore, to meet data bandwidth constraints, it will be necessary to sharply reduce the volume of data moved on the machine and especially the data exported to persistent storage. The combination of sharp data reduction and new analysis approaches heightens the importance of storing data in standard formats to support validation of results and post-hoc data analysis and visualization.
Given these constraints, the Data Management and Visualization team is focused on the delivery of efficient storage, checkpoint/restart, compression, and in situ visualization and analysis software. In this talk, I will give an overview of the Data Management and Visualization projects and examples of their use in ECP applications.
Bio
Dr. James Ahrens is a senior scientist in the Applied Computer Science Group at Los Alamos National Laboratory. His primary research interests are visualization, computer graphics, data science and parallel systems. Ahrens is the author of over 100 peer-reviewed papers and the founder/design lead of ParaView, an open-source visualization tool designed to handle extremely large data. ParaView is broadly used for scientific visualization, is downloaded approximately a quarter of a million times per year, and is in use at supercomputing and scientific centers worldwide. Dr. Ahrens has extensive management experience as a technical program manager. He has over twenty awards as a principal or co-investigator from the U.S. Department of Energy and the U.S. National Science Foundation. These awards have evolved in scope over the course of his career to multi-million dollar, interdisciplinary, data analysis/visualization projects involving multiple partners from academia, laboratories and industry. Ahrens is currently the U.S. Exascale Computing Project’s Data and Visualization lead for a collection of storage, data management and visualization projects that will be a key part of a vibrant exascale supercomputing application and software ecosystem. Dr. Ahrens received his B.S. in Computer Science from the University of Massachusetts at Amherst in 1989 and a Ph.D. in Computer Science from the University of Washington in 1996.
Abstract
In this talk we’ll look at a subset of the use cases of the LLVM Compiler Infrastructure within ECP. Specifically, we’ll look at making the compiler explicitly aware of parallel constructs by extending the intermediate representation, and we’ll discuss the advantages and the potential of using such representations. In addition, we’ll take a look at Flang, a new Fortran front end for the LLVM community, and discuss why it might actually become the most modern compiler component within the LLVM suite over the next couple of years.
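As a hedged illustration written for this summary (not drawn from the talk), the sketch below shows the kind of parallel construct at issue: a simple OpenMP loop in C++. Today such a pragma is typically lowered early into outlined functions and runtime calls; a parallelism-aware intermediate representation would instead carry the loop's parallel semantics explicitly so the optimizer can reason about them.

```cpp
#include <cstdio>
#include <vector>

// Illustrative only: a plain OpenMP parallel loop of the sort a
// parallelism-aware IR would represent explicitly to the optimizer.
// Compile with an OpenMP-enabled compiler, e.g. clang++ -fopenmp.
int main() {
  const int n = 1 << 20;
  std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);

  #pragma omp parallel for
  for (int i = 0; i < n; ++i) {
    c[i] = a[i] + b[i];
  }

  printf("c[0] = %f\n", c[0]);
  return 0;
}
```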
Bio
Patrick (Pat) McCormick is a senior computer scientist at Los Alamos National Laboratory (LANL) with over 25 years of experience, and is known for his work in programming models, early GPGPU programming, data visualization, and parallel systems. He currently serves as the Programming Models team leader at Los Alamos and is a PI for three ECP projects. His research interests include programming languages and models, compilers, runtime systems, and their application on emerging and heterogeneous architectures. In the past he served as the Deputy Director of Software Technology for the Exascale Computing Project.
Abstract
This talk will focus on challenges in designing, developing, packaging and deploying high-performance and scalable MPI and HPC Cloud middleware for HPC clusters. We will discuss the designs, sample performance numbers and best practices of using the MVAPICH2 libraries (http://mvapich.cse.ohio-state.edu) on modern HPC clusters, while considering support for multi-core systems (x86, ARM and OpenPOWER), high-performance networks (InfiniBand, Omni-Path, RoCE, AWS-EFA, and iWARP), GPGPUs (including GPUDirect RDMA), and energy awareness. We will also discuss experiences in packaging the MVAPICH libraries using RPM and Spack environments.
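As a minimal sketch (not specific to MVAPICH2, which implements the standard MPI interface), the following C++ program performs a simple collective operation of the kind such libraries tune for different interconnects, node architectures, and GPUs.

```cpp
#include <mpi.h>
#include <cstdio>

// Minimal illustrative MPI program: each rank contributes a value to an
// allreduce. An MPI library such as MVAPICH2 provides the implementation
// of these calls, tuned for the underlying network and node architecture.
int main(int argc, char* argv[]) {
  MPI_Init(&argc, &argv);

  int rank = 0, size = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  double local = static_cast<double>(rank);
  double global = 0.0;
  MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

  if (rank == 0) {
    printf("sum over %d ranks = %f\n", size, global);
  }

  MPI_Finalize();
  return 0;
}
```

Because the program is written to the MPI standard, the same source runs unchanged over the interconnects listed above once built against the chosen MPI library.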
Bio
Dr. Hari Subramoni received the Ph.D. degree in Computer Science from The Ohio State University, Columbus, OH, in 2013. He has been a research scientist in the Department of Computer Science and Engineering at The Ohio State University since September 2015. His current research interests include high performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network-topology-aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, big data, and cloud computing. He has published over 70 papers in international journals and conferences related to these research areas. Dr. Subramoni is currently doing research on, and working on the design and development of, the MVAPICH2, MVAPICH2-GDR, and MVAPICH2-X software packages. He is a member of IEEE. More details about Dr. Subramoni are available from http://www.cse.ohio-state.edu/~subramon.