{"title":"Corrigendum to large-scale direct numerical simulations of turbulence using GPUs and modern Fortran","authors":"","doi":"10.1177/10943420231173573","DOIUrl":"https://doi.org/10.1177/10943420231173573","url":null,"abstract":"","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136096371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study on the performance of distributed training of data-driven CFD simulations","authors":"Sergio Iserte, Alejandro González-Barberá, Paloma Barreda, K. Rojek","doi":"10.1177/10943420231160557","DOIUrl":"https://doi.org/10.1177/10943420231160557","url":null,"abstract":"Data-driven methods for computer simulations are blooming in many scientific areas. The traditional approach to simulating physical behaviors relies on solving partial differential equations (PDEs). Since solving these equations iteratively is both highly computationally demanding and time-consuming, data-driven methods leverage artificial intelligence (AI) techniques to alleviate that workload. Data-driven methods have to be trained in advance to provide their subsequent fast predictions; however, the cost of the training stage is non-negligible. This article presents a predictive model for inferring future states of a specific fluid simulation that serves as a use case for evaluating different training alternatives. In particular, this study compares the performance of CPU-only, multi-GPU, and distributed approaches for training a time series forecasting deep learning model. 
With only slight code adaptations, the results show and compare, across different implementations, the benefits of distributed GPU-enabled training for predicting high-accuracy states in a fraction of the time needed by the computational fluid dynamics solver.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"503 - 515"},"PeriodicalIF":3.1,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49621368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Orchestration of materials science workflows for heterogeneous resources at large scale","authors":"Naweiluo Zhou, G. Scorzelli, Jakob Luettgau, R. Kancharla, Joshua J. Kane, Robert Wheeler, B. Croom, P. Newell, Valerio Pascucci, M. Taufer","doi":"10.1177/10943420231167800","DOIUrl":"https://doi.org/10.1177/10943420231167800","url":null,"abstract":"In the era of big data, materials science workflows need to handle large-scale data distribution, storage, and computation. Any of these areas can become a performance bottleneck. We present a framework for analyzing internal material structures (e.g., cracks) to mitigate these bottlenecks. We demonstrate the effectiveness of our framework for a workflow performing synchrotron X-ray computed tomography reconstruction and segmentation of a silica-based structure. Our framework provides a cloud-based, cutting-edge solution to challenges such as growing intermediate and output data and heavy resource demands during image reconstruction and segmentation. Specifically, our framework efficiently manages data storage and scales up compute resources on the cloud. The software structure of our framework comprises three layers. A top layer uses Jupyter notebooks and serves as the user interface. A middle layer uses Ansible for resource deployment and managing the execution environment. A bottom layer provides resource management and job scheduling on heterogeneous nodes (i.e., GPU and CPU). At the core of this layer, Kubernetes supports resource management, and Dask enables large-scale job scheduling for heterogeneous resources. 
The broader impact of our work is four-fold: through our framework, we hide the complexity of the cloud’s software stack from the user, who would otherwise be required to have expertise in cloud technologies; we manage job scheduling efficiently and in a scalable manner; we enable resource elasticity and workflow orchestration at a large scale; and we facilitate moving the study of nonporous structures, which has wide applications in engineering and scientific fields, to the cloud. While we demonstrate the capability of our framework for a specific materials science application, it can be adapted for other applications and domains because of its modular, multi-layer architecture.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"260 - 271"},"PeriodicalIF":3.1,"publicationDate":"2023-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49196158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Versatile software-defined HPC and cloud clusters on Alps supercomputer for diverse workflows","authors":"S. Alam, M. Gila, Mark Klein, Maxime Martinasso, T. Schulthess","doi":"10.1177/10943420231167811","DOIUrl":"https://doi.org/10.1177/10943420231167811","url":null,"abstract":"For the past few decades, supercomputers have driven innovations in performance and scaling that benefit several scientific applications. Yet their ecosystems remain virtually unchanged when it comes to integrating distributed data-driven workflows, primarily due to rather rigid access methods and restricted configuration management options. The X-as-a-Service model of cloud computing has introduced, among other features, a developer-centric DevOps approach that empowers developers of artefacts ranging from infrastructure and platform to software, which, unfortunately, contemporary supercomputers still lack. We introduce vClusters (versatile software-defined clusters), which are based on Infrastructure-as-code (IaC) technology. The vClusters approach is a unique fusion of HPC and cloud technologies resulting in a software-defined, multi-tenant cluster on a supercomputing ecosystem that, together with software-defined storage, enables DevOps for complex, data-driven workflows like grid middleware, alongside a classic HPC platform. IaC has been commonplace in cloud computing; however, it has lacked adoption within multi-Petascale ecosystems due to concerns related to performance and interoperability with classic HPC data centres’ ecosystems. We present an overview of the Swiss National Supercomputing Centre’s flagship Alps ecosystem as an implementation target for vClusters for HPC and data-driven workflows. Alps is based on the Cray-HPE Shasta EX supercomputing platform that includes an IaC compliant, microservices architecture (MSA) management system, which we leverage for demonstrating vClusters usage for our diverse operational workflows. 
We provide implementation details of two operational vClusters platforms: a classic HPC platform that is used predominantly by hundreds of users running thousands of large-scale numerical simulation batch jobs; and a widely used, data-intensive, Grid computing middleware platform used for CERN Worldwide LHC Computing Grid (WLCG) operations. The resulting solution showcases reuse and reduction of common configuration recipes across vCluster implementations, minimising operational change management overheads while introducing flexibility for managing artefacts for DevOps required by diverse workflows.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"288 - 305"},"PeriodicalIF":3.1,"publicationDate":"2023-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45220506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey of Graph Comparison Methods with Applications to Nondeterminism in High-Performance Computing","authors":"S. Bhowmick, Patrick Bell, M. Taufer","doi":"10.1177/10943420231166610","DOIUrl":"https://doi.org/10.1177/10943420231166610","url":null,"abstract":"The convergence of extremely high levels of hardware concurrency and the effective overlap of computation and communication in asynchronous executions has resulted in increasing nondeterminism in High-Performance Computing (HPC) applications. Nondeterminism can manifest at multiple levels: from low-level communication primitives to libraries to application-level functions. No matter its source, nondeterminism can drastically increase the cost of result reproducibility, debugging workflows, testing parallel programs, or ensuring fault-tolerance. Nondeterministic executions of HPC applications can be modeled as event graphs, and the applications’ nondeterministic behavior can be understood and, in some cases, mitigated using graph comparison algorithms. However, a connection between graph comparison algorithms and approaches to understanding nondeterminism in HPC still needs to be established. 
This survey article takes the first steps toward establishing a connection between graph comparison algorithms and nondeterminism in HPC with its three contributions: it provides a survey of different graph comparison algorithms and a timeline for each category’s significant works; it discusses how existing graph comparison methods do not fully support properties needed to understand nondeterministic patterns in HPC applications; and it presents the open challenges that should be addressed to leverage the power of graph comparisons for the study of nondeterminism in HPC applications.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"306 - 327"},"PeriodicalIF":3.1,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47768873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering","authors":"P. Luszczek, Wissam M. Sid-Lakhdar, J. Dongarra","doi":"10.1177/10943420231166365","DOIUrl":"https://doi.org/10.1177/10943420231166365","url":null,"abstract":"We combine deep Gaussian processes (DGPs) with multitask and transfer learning for the performance modeling and optimization of HPC applications. Deep Gaussian processes merge the uncertainty quantification advantage of Gaussian processes (GPs) with the predictive power of deep learning. Multitask and transfer learning allow for improved learning efficiency when several similar tasks are to be learned simultaneously and when previously learned models are sought to help in the learning of new tasks, respectively. A comparison with state-of-the-art autotuners shows the advantage of our approach on two application problems. In this article, we combine DGPs with multitask and transfer learning to allow for both improved tuning of an application's parameters on problems of interest and the prediction of parameters on any potential problem the application might encounter.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"229 - 244"},"PeriodicalIF":3.1,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46811754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatizing the creation of specialized high-performance computing containers","authors":"J. Ejarque, Rosa M. Badia","doi":"10.1177/10943420231165729","DOIUrl":"https://doi.org/10.1177/10943420231165729","url":null,"abstract":"With Exascale computing already here, supercomputers are becoming ever larger, more complex, and more heterogeneous. While expert system administrators can install and deploy applications on these systems correctly, this is something that general users usually cannot do. The eFlows4HPC project aims to provide methodologies and tools to enable the use and reuse of application workflows. One of the aspects that the project focuses on is simplifying the application deployment in large and complex systems. The approach uses containers, not generic ones, but containers tailored for each target High-Performance Computing (HPC) system. This paper presents the Container Image Creation service developed in the framework of the project and experimentation based on project applications. We compare the performance of the specialized containers against generic containers and against a native installation. The results show that in almost all cases, the specialized containers outperform the generic ones (up to 2× faster), and in all cases, the performance is the same as with the native installation.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"272 - 287"},"PeriodicalIF":3.1,"publicationDate":"2023-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41409061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating cluster dynamics simulation of fission gas behavior in nuclear fuel on deep computing unit–based heterogeneous architecture supercomputer","authors":"He Bai, Changjun Hu, Yuhan Zhu, Dandan Chen, Genshen Chu, Shuai Ren","doi":"10.1177/10943420231162831","DOIUrl":"https://doi.org/10.1177/10943420231162831","url":null,"abstract":"High-fidelity simulation of fission gas behavior can help us understand and predict the performance of nuclear fuel under different irradiation conditions. Cluster dynamics (CD) is a mesoscale simulation method that has developed rapidly in nuclear fuel research in recent years, and it can effectively describe the microdynamic behavior of fission gas in nuclear fuel; however, due to the huge computational cost of solving the CD model, the application scenarios of CD have been limited. Thus, how to design acceleration algorithms for the given computing resources to improve computing efficiency and simulation scale has become a key problem in CD simulation. In this work, we present an accelerated cluster dynamics model based on the spatially dependent cluster dynamics model, combined with multiple optimization methods, on a DCU (deep computing unit)-based heterogeneous architecture supercomputer. The correctness of the model is verified by comparison with experimental data and with Xolotl, a software package from the SciDAC program of the U.S. Department of Energy’s Office of Science. Furthermore, our model implementation has better computing performance than Xolotl’s GPU version. Our code achieves strong/weak scaling performance with more than 72.75%/84.07% parallel efficiency on 1024 compute nodes. 
This work developed a new efficient model for CD simulation of fission gas in nuclear fuel.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"516 - 529"},"PeriodicalIF":3.1,"publicationDate":"2023-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47260453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors","authors":"Rafael Rodríguez-Sánchez, Adrián Castelló, Sandra Catalán, Francisco D. Igual, E. S. Quintana‐Ortí","doi":"10.1177/10943420231157653","DOIUrl":"https://doi.org/10.1177/10943420231157653","url":null,"abstract":"Malleability is defined as the ability to vary the degree of parallelism at runtime, and is regarded as a means to improve core occupation on state-of-the-art multicore processors that contain tens of computational cores per socket. This property is especially interesting for applications consisting of irregular workloads and/or divergent execution paths. The integration of malleability in high-performance instances of the Basic Linear Algebra Subprograms (BLAS) is currently nonexistent, and, in consequence, applications relying on these computational kernels cannot benefit from this capability. In response to this scenario, in this paper we demonstrate that significant performance benefits can be gathered via the exploitation of malleability in a framework designed to implement portable and high-performance BLAS-like operations. For this purpose, we integrate malleability within the BLIS library, and provide an experimental evaluation of the result on three different practical use cases.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":" ","pages":""},"PeriodicalIF":3.1,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45507162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Myths and legends in high-performance computing","authors":"S. Matsuoka, Jens Domke, M. Wahib, Aleksandr Drozd, T. Hoefler","doi":"10.1177/10943420231166608","DOIUrl":"https://doi.org/10.1177/10943420231166608","url":null,"abstract":"In this thought-provoking article, we discuss certain myths and legends that are folklore among members of the high-performance computing community. We gathered these myths from conversations at conferences and meetings, product advertisements, papers, and other communications such as tweets, blogs, and news articles within and beyond our community. We believe they represent the zeitgeist of the current era of massive change, driven by the end of many scaling laws such as Dennard scaling and Moore’s law. While some laws end, new directions are emerging, such as algorithmic scaling or novel architecture research. Nevertheless, these myths are rarely based on scientific facts, but rather on some evidence or argumentation. In fact, we believe that this is the very reason for the existence of many myths and why they cannot be answered clearly. While it feels like there should be clear answers for each, some may remain endless philosophical debates, such as whether Beethoven was better than Mozart. 
We would like to see our collection of myths as a discussion of possible new directions for research and industry investment.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"245 - 259"},"PeriodicalIF":3.1,"publicationDate":"2023-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42989305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}