"Sierra Center of Excellence: Lessons learned"
J. P. Dahm, D. F. Richards, A. Black, A. D. Bertsch, L. Grinberg, I. Karlin, S. Kokkila-Schumacher, E. A. León, J. R. Neely, R. Pankajakshan, and O. Pearce
IBM Journal of Research and Development, 2019-12-20. DOI: https://doi.org/10.1147/JRD.2019.2961069

Abstract: The introduction of heterogeneous GPU computing with the Sierra architecture represented a significant shift in direction for computational science at Lawrence Livermore National Laboratory (LLNL) and therefore required significant preparation. Over the last five years, the Sierra Center of Excellence (CoE) has brought employees with specific expertise from IBM and NVIDIA together with LLNL in a concentrated effort to prepare applications, system software, and tools for the Sierra supercomputer. This article shares the process we applied in the CoE and documents lessons learned during the collaboration, in the hope that others will be able to learn from both our successes and our intermediate setbacks. We describe what we have found to work in managing such a collaboration, as well as best practices for algorithms and source code, system configuration and software stack, tools, and application performance.

{"title":"Transformation of application enablement tools on CORAL systems","authors":"S. Maerean;E. K. Lee;H.-F. Wen;I-H. Chung","doi":"10.1147/JRD.2019.2960246","DOIUrl":"https://doi.org/10.1147/JRD.2019.2960246","url":null,"abstract":"The CORAL project exhibits an important shift in the computational paradigm from homogeneous to heterogeneous computing, where applications run on both the CPU and the accelerator (e.g., GPU). Existing applications optimized to run only on the CPU have to be rewritten to adopt accelerators and retuned to achieve optimal performance. The shift in the computational paradigm requires application development tools (e.g., compilers, performance profilers and tracers, and debuggers) change to better assist users. The CORAL project places a strong emphasis on open-source tools to create a collaborative environment in the tools community. In this article, we discuss the collaboration efforts and corresponding challenges to meet the CORAL requirements on tools and detail three of the challenges that required the most involvement. A usage scenario is provided to show how the tools may help users adopt the new computation environment and understand their application execution and the data flow at scale.","PeriodicalId":55034,"journal":{"name":"IBM Journal of Research and Development","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2019-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1147/JRD.2019.2960246","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49948703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Call for Code: Developers tackle natural disasters with software","authors":"D. Krook;S. Malaika","doi":"10.1147/JRD.2019.2960241","DOIUrl":"https://doi.org/10.1147/JRD.2019.2960241","url":null,"abstract":"Natural disasters are increasing as highlighted in many reports including the Borgen Project. In 2018, David Clark Cause as creator and IBM as founding partner, in partnership with the United Nations Human Rights Office, the American Red Cross International Team, and The Linux Foundation, issued a “Call for Code” to developers to create robust projects that prepare communities for natural disasters and help them respond more quickly in their aftermath. This article covers the steps and tools used to engage with developers, the results from the first of five competitions to be run by the Call for Code Global Initiative over five years, and how the winners were selected. Insights from the mobilization of 100,000 developers toward this cause are described, as well as the lessons learned from running large-scale hackathons.","PeriodicalId":55034,"journal":{"name":"IBM Journal of Research and Development","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2019-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1147/JRD.2019.2960241","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49980047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"A unique approach to corporate disaster philanthropy focused on delivering technology and expertise"
R. E. Curzon, P. Curotto, M. Evason, A. Failla, P. Kusterer, A. Ogawa, J. Paraszczak, and S. Raghavan
IBM Journal of Research and Development, 2019-12-17. DOI: https://doi.org/10.1147/JRD.2019.2960244

Abstract: The role of corporations and their corporate social responsibility (CSR)-related response to disasters in support of their communities has not been extensively documented; thus, this article attempts to explain the role that one corporation, IBM, has played in disaster response and how it has used IBM and open-source technologies to deal with a broad range of disasters. These technologies range from advanced seismic monitoring and flood management to predicting and improving refugee flows. The article outlines various principles that have guided IBM in shaping its disaster response and provides some insights into various sources of useful data and applications that can be used in these critical situations. It also details one example of an emerging technology that is being used in these efforts.

{"title":"The CORAL supercomputer systems","authors":"W. A. Hanson","doi":"10.1147/JRD.2019.2960220","DOIUrl":"https://doi.org/10.1147/JRD.2019.2960220","url":null,"abstract":"In 2014, the U.S. Department of Energy (DoE) initiated a multiyear collaboration between Oak Ridge National Laboratory (ORNL), Argonne National Laboratory, and Lawrence Livermore National Laboratory (LLNL), known as “CORAL,” the next major phase in the DoE's scientific computing roadmap. The IBM CORAL systems are based on a fundamentally new data-centric architecture, where compute power is embedded everywhere data resides, combining powerful central processing units (CPUs) with graphics processing units (GPUs) optimized for scientific computing and artificial intelligence workloads. The IBM CORAL systems were built on the combination of mature technologies: 9th-generation POWER CPU, 6th-generation NVIDIA GPU, and 5th-generation Mellanox InfiniBand. These systems are providing scientists with computing power to solve challenges in many research areas beyond previously possible. This article provides an overview of the system solutions deployed at ORNL and LLNL.","PeriodicalId":55034,"journal":{"name":"IBM Journal of Research and Development","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2019-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1147/JRD.2019.2960220","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49978542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Porting a 3D seismic modeling code (SW4) to CORAL machines","authors":"R. Pankajakshan;P.-H. Lin;B. Sjögreen","doi":"10.1147/JRD.2019.2960218","DOIUrl":"https://doi.org/10.1147/JRD.2019.2960218","url":null,"abstract":"Seismic waves fourth order (SW4) solves the seismic wave equations on Cartesian and curvilinear grids using large compute clusters with O (100,000) cores. This article discusses the porting of SW4 to run on the CORAL architecture using the RAJA performance portability abstraction layer. The performances of key kernels using RAJA and CUDA are compared to estimate the performance penalty of using the portability abstraction layer. Code changes required for efficiency on GPUs and minimizing time spent in Message Passing Interface (MPI) are discussed. This article describes a path for efficiently porting large code bases to GPU-based machines while avoiding the pitfalls of a new architecture in the early stages of its deployment. Current bottlenecks in the code are discussed along with possible architectural or software mitigations. SW4 runs 28× faster on one 4-GPU CORAL node than on a CTS-1 node (Dual Intel Xeon E5-2695 v4). SW4 is now in routine use on problems of unprecedented resolution (203 billion grid points) and scale on 1,200 nodes of Summit.","PeriodicalId":55034,"journal":{"name":"IBM Journal of Research and Development","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2019-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1147/JRD.2019.2960218","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49948704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Hybrid CPU/GPU tasks optimized for concurrency in OpenMP"
A. E. Eichenberger, G.-T. Bercea, A. Bataev, L. Grinberg, and J. K. O'Brien
IBM Journal of Research and Development, 2019-12-17. DOI: https://doi.org/10.1147/JRD.2019.2960245

Abstract: The Sierra and Summit supercomputers exhibit a significant amount of intranode parallelism between the host POWER9 CPUs and their attached GPU devices. In this article, we show that exploiting device-level parallelism is key to achieving high performance, by reducing the overheads typically associated with CPU and GPU task execution. Moreover, manually exploiting this type of parallelism in large-scale applications is nontrivial and error-prone. We hide the complexity of exploiting this hybrid intranode parallelism behind the OpenMP programming model abstraction. The implementation leverages the semantics of OpenMP tasks to express asynchronous task computations and their associated dependences. Launching tasks on the CPU threads requires a careful design of work-stealing algorithms to provide efficient load balancing among CPU threads. We propose a novel algorithm that removes locks from all task-queueing operations that are on the critical path. Tasks assigned to GPU devices require additional steps, such as copying input data to GPU devices, launching the computation kernels, and copying data back to the host CPU memory. We perform key optimizations to reduce the cost of these additional steps by tightly integrating data transfers and GPU computations into streams of asynchronous GPU operations. We also map high-level dependences between GPU tasks to the same asynchronous GPU streams to avoid unnecessary synchronization. Results validate our approach.

{"title":"Quantitative modeling in disaster management: A literature review","authors":"A. E. Baxter;H. E. Wilborn Lagerman;P. Keskinocak","doi":"10.1147/JRD.2019.2960356","DOIUrl":"https://doi.org/10.1147/JRD.2019.2960356","url":null,"abstract":"The number, magnitude, complexity, and impact of natural disasters have been steadily increasing in various parts of the world. When preparing for, responding to, and recovering from a disaster, multiple organizations make decisions and take actions considering the needs, available resources, and priorities of the affected communities, emergency supply chains, and infrastructures. Most of the prior research focuses on decision-making for independent systems (e.g., single critical infrastructure networks or distinct relief resources). An emerging research area extends the focus to interdependent systems (i.e., multiple dependent networks or resources). In this article, we survey the literature on modeling approaches for disaster management problems on independent systems, discuss some recent work on problems involving demand, resource, and/or network interdependencies, and offer future research directions to add to this growing research area.","PeriodicalId":55034,"journal":{"name":"IBM Journal of Research and Development","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2019-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1147/JRD.2019.2960356","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49980046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Troubleshooting deep-learner training data problems using an evolutionary algorithm on Summit","authors":"M. Coletti;A. Fafard;D. Page","doi":"10.1147/JRD.2019.2960225","DOIUrl":"https://doi.org/10.1147/JRD.2019.2960225","url":null,"abstract":"Architectural and hyperparameter design choices can influence deep-learner (DL) model fidelity but can also be affected by malformed training and validation data. However, practitioners may spend significant time refining layers and hyperparameters before discovering that distorted training data were impeding the training progress. We found that an evolutionary algorithm (EA) can be used to troubleshoot this kind of DL problem. An EA evaluated thousands of DL configurations on Summit that yielded no overall improvement in DL performance, which suggested problems with the training and validation data. We suspected that contrast limited adaptive histogram equalization enhancement that was applied to previously generated digital surface models, for which we were training DLs to find errors, had damaged the training data. Subsequent runs with an alternative global normalization yielded significantly improved DL performance. However, the DL intersection over unions still exhibited consistent subpar performance, which suggested further problems with the training data and DL approach. Nonetheless, we were able to diagnose this problem within a 12-hour span via Summit runs, which prevented several weeks of unproductive trial-and-error DL configuration refinement and allowed for a more timely convergence on an ultimately viable solution.","PeriodicalId":55034,"journal":{"name":"IBM Journal of Research and Development","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2019-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1147/JRD.2019.2960225","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49948705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Summit and Sierra supercomputer cooling solutions","authors":"S. Tian;T. Takken;V. Mahaney;C. Marroquin;M. Schultz;M. Hoffmeyer;Y. Yao;K. O'Connell;A. Yuksel;P. Coteus","doi":"10.1147/JRD.2019.2958902","DOIUrl":"https://doi.org/10.1147/JRD.2019.2958902","url":null,"abstract":"Achieving optimal data center cooling efficiency requires effective water cooling of high-heat-density components, coupled with optimal warmer water temperatures and the correct order of water preheating from any air-cooled components. The Summit and Sierra supercomputers implemented efficient cooling by using high-performance cold plates to directly water-cool all central processing units (CPUs) and graphics processing units (GPUs) processors with warm inlet water. Cost performance was maximized by directly air-cooling the 10% to 15% of the compute drawer heat load generated by the lowest heat density components. For the Summit system, a rear-door heat exchanger allowed zero net heat load to air; the overall system efficiency was optimized by using the preheated water from the heat exchanger as an input to cool the higher power CPUs and GPUs.","PeriodicalId":55034,"journal":{"name":"IBM Journal of Research and Development","volume":null,"pages":null},"PeriodicalIF":1.3,"publicationDate":"2019-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1147/JRD.2019.2958902","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49948804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}