A. Vargas, T. Stitt, K. Weiss, V. Tomov, Jean-Sylvain Camier, T. Kolev, R. Rieben
{"title":"Matrix-free approaches for GPU acceleration of a high-order finite element hydrodynamics application using MFEM, Umpire, and RAJA","authors":"A. Vargas, T. Stitt, K. Weiss, V. Tomov, Jean-Sylvain Camier, T. Kolev, R. Rieben","doi":"10.1177/10943420221100262","DOIUrl":"https://doi.org/10.1177/10943420221100262","url":null,"abstract":"With the introduction of advanced heterogeneous computing architectures based on GPU accelerators, large-scale production codes have had to rethink their numerical algorithms and incorporate new programming models and memory management strategies in order to run efficiently on the latest supercomputers. In this work we discuss our co-design strategy to address these challenges and achieve performance and portability with MARBL, a next-generation multi-physics code in development at Lawrence Livermore National Laboratory. We present a two-fold approach, wherein new hardware is used to motivate both new algorithms and new abstraction layers, resulting in a single source application code suitable for a variety of platforms. Focusing on MARBL’s ALE hydrodynamics package, we demonstrate scalability on different platforms and highlight that many of our innovations have been contributed back to open-source software libraries, such as MFEM (finite element algorithms) and RAJA (kernel abstractions).","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"36 1","pages":"492 - 509"},"PeriodicalIF":3.1,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42558031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrew E. Blanchard, John P. Gounley, D. Bhowmik, Mayanka Chandra Shekar, Isaac Lyngaas, Shang Gao, Junqi Yin, A. Tsaris, Feiyi Wang, J. Glaser
{"title":"Language models for the prediction of SARS-CoV-2 inhibitors","authors":"Andrew E. Blanchard, John P. Gounley, D. Bhowmik, Mayanka Chandra Shekar, Isaac Lyngaas, Shang Gao, Junqi Yin, A. Tsaris, Feiyi Wang, J. Glaser","doi":"10.1101/2021.12.10.471928","DOIUrl":"https://doi.org/10.1101/2021.12.10.471928","url":null,"abstract":"The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"51 1","pages":"587 - 602"},"PeriodicalIF":3.1,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62337418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pascal R Bähr, B. Lang, P. Ueberholz, M. Ady, R. Kersevan
{"title":"Development of a hardware-accelerated simulation kernel for ultra-high vacuum with Nvidia RTX GPUs","authors":"Pascal R Bähr, B. Lang, P. Ueberholz, M. Ady, R. Kersevan","doi":"10.1177/10943420211056654","DOIUrl":"https://doi.org/10.1177/10943420211056654","url":null,"abstract":"Molflow+ is a Monte Carlo (MC) simulation software for ultra-high vacuum, mainly used to simulate pressure in particle accelerators. In this article, we present and discuss the design choices arising in a new implementation of its ray-tracing–based simulation unit for Nvidia RTX Graphics Processing Units (GPUs). The GPU simulation kernel was designed with Nvidia’s OptiX 7 API to make use of modern hardware-accelerated ray-tracing units, found in recent RTX series GPUs based on the Turing and Ampere architectures. Even with the challenges posed by switching to 32 bit computations, our kernel runs much faster than on comparable CPUs at the expense of a marginal drop in calculation precision.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"36 1","pages":"141 - 152"},"PeriodicalIF":3.1,"publicationDate":"2021-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42921743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-design in the Exascale Computing Project","authors":"T. Germann","doi":"10.1177/10943420211059380","DOIUrl":"https://doi.org/10.1177/10943420211059380","url":null,"abstract":"We provide an overview of the six co-design centers within the U.S. Department of Energy’s Exascale Computing Project, each of which is described in more detail in a separate paper in this special issue. We also give a perspective on the evolution of computational co-design.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"35 1","pages":"503 - 507"},"PeriodicalIF":3.1,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43633066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-Lagrangian 4d, 5d, and 6d kinetic plasma simulation on large-scale GPU-equipped supercomputers","authors":"L. Einkemmer, A. Moriggl","doi":"10.1177/10943420221137599","DOIUrl":"https://doi.org/10.1177/10943420221137599","url":null,"abstract":"Running kinetic plasma physics simulations using grid-based solvers is very demanding both in terms of memory as well as computational cost. This is primarily due to the up to six-dimensional phase space and the associated unfavorable scaling of the computational cost as a function of grid spacing (often termed the curse of dimensionality). In this article, we present 4d, 5d, and 6d simulations of the Vlasov–Poisson equation with a split-step semi-Lagrangian discontinuous Galerkin scheme on graphic processing units (GPUs). The local communication pattern of this method allows an efficient implementation on large-scale GPU-based systems and emphasizes the importance of considering algorithmic and high-performance computing aspects in unison. We demonstrate a single node performance above 2 TB/s effective memory bandwidth (on a node with four A100 GPUs) and show excellent scaling (parallel efficiency between 30% and 67%) for up to 1536 A100 GPUs on JUWELS Booster.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"180 - 196"},"PeriodicalIF":3.1,"publicationDate":"2021-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48720975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAM++: Porting the E3SM-MMF cloud resolving model using a C++ portability library","authors":"Isaac Lyngaas, M. Norman, Youngsung Kim","doi":"10.1177/10943420211044495","DOIUrl":"https://doi.org/10.1177/10943420211044495","url":null,"abstract":"In this work, we demonstrate the process for porting the cloud resolving model (CRM) used in the Energy Exascale Earth System Model Multi-Scale Modeling Framework (E3SM-MMF) from its original Fortran code base to C++ code using a portability library. This porting process is performed using the Yet Another Kernel Library (YAKL), a simplified C++ portability library that specializes in Fortran porting. In particular, we detail our step-by-step approach for porting the System for Atmospheric Modeling (SAM), the CRM used in E3SM-MMF, using a hybrid Fortran/C++ framework that allows for systematic reproduction and correctness testing of gradually ported YAKL C++ code. Additionally, analysis is done on the performance of the ported code using OLCF’s Summit supercomputer.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"36 1","pages":"214 - 230"},"PeriodicalIF":3.1,"publicationDate":"2021-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49062634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seher Acer, A. Azad, E. Boman, A. Buluç, K. Devine, SM Ferdous, Nitin Gawande, Sayan Ghosh, M. Halappanavar, A. Kalyanaraman, Arif M. Khan, Marco Minutoli, A. Pothen, S. Rajamanickam, Oguz Selvitopi, Nathan R. Tallent, Antonino Tumeo
{"title":"EXAGRAPH: Graph and combinatorial methods for enabling exascale applications","authors":"Seher Acer, A. Azad, E. Boman, A. Buluç, K. Devine, SM Ferdous, Nitin Gawande, Sayan Ghosh, M. Halappanavar, A. Kalyanaraman, Arif M. Khan, Marco Minutoli, A. Pothen, S. Rajamanickam, Oguz Selvitopi, Nathan R. Tallent, Antonino Tumeo","doi":"10.1177/10943420211029299","DOIUrl":"https://doi.org/10.1177/10943420211029299","url":null,"abstract":"Combinatorial algorithms in general and graph algorithms in particular play a critical enabling role in numerous scientific applications. However, the irregular memory access nature of these algorithms makes them one of the hardest algorithmic kernels to implement on parallel systems. With tens of billions of hardware threads and deep memory hierarchies, the exascale computing systems in particular pose extreme challenges in scaling graph algorithms. The codesign center on combinatorial algorithms, ExaGraph, was established to design and develop methods and techniques for efficient implementation of key combinatorial (graph) algorithms chosen from a diverse set of exascale applications. Algebraic and combinatorial methods have a complementary role in the advancement of computational science and engineering, including playing an enabling role on each other. In this paper, we survey the algorithmic and software development activities performed under the auspices of ExaGraph from both a combinatorial and an algebraic perspective. In particular, we detail our recent efforts in porting the algorithms to manycore accelerator (GPU) architectures. We also provide a brief survey of the applications that have benefited from the scalable implementations of different combinatorial algorithms to enable scientific discovery at scale. We believe that several applications will benefit from the algorithmic and software tools developed by the ExaGraph team.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"35 1","pages":"553 - 571"},"PeriodicalIF":3.1,"publicationDate":"2021-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42735882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francis J. Alexander, James Ang, Jenna A. Bilbrey, J. Balewski, T. Casey, Ryan Chard, J. Choi, Sutanay Choudhury, B. Debusschere, Anthony Degennaro, Nikoli Dryden, J. Ellis, Ian T. Foster, Cristina Garcia Cardona, Sayan Ghosh, P. Harrington, Yunzhi Huang, S. Jha, Travis Johnston, Ai Kagawa, R. Kannan, Neeraj Kumar, Zhengchun Liu, N. Maruyama, S. Matsuoka, Erin McCarthy, J. Mohd-Yusof, Peter Nugent, Yosuke Oyama, T. Proffen, D. Pugmire, S. Rajamanickam, V. Ramakrishniah, M. Schram, S. Seal, G. Sivaraman, Christine M. Sweeney, Li Tan, R. Thakur, B. V. Van Essen, Logan T. Ward, P. Welch, Michael Wolf, S. Xantheas, K. Yager, Shinjae Yoo, Byung-Jun Yoon
{"title":"Co-design Center for Exascale Machine Learning Technologies (ExaLearn)","authors":"Francis J. Alexander, James Ang, Jenna A. Bilbrey, J. Balewski, T. Casey, Ryan Chard, J. Choi, Sutanay Choudhury, B. Debusschere, Anthony Degennaro, Nikoli Dryden, J. Ellis, Ian T. Foster, Cristina Garcia Cardona, Sayan Ghosh, P. Harrington, Yunzhi Huang, S. Jha, Travis Johnston, Ai Kagawa, R. Kannan, Neeraj Kumar, Zhengchun Liu, N. Maruyama, S. Matsuoka, Erin McCarthy, J. Mohd-Yusof, Peter Nugent, Yosuke Oyama, T. Proffen, D. Pugmire, S. Rajamanickam, V. Ramakrishniah, M. Schram, S. Seal, G. Sivaraman, Christine M. Sweeney, Li Tan, R. Thakur, B. V. Van Essen, Logan T. Ward, P. Welch, Michael Wolf, S. Xantheas, K. Yager, Shinjae Yoo, Byung-Jun Yoon","doi":"10.1177/10943420211029302","DOIUrl":"https://doi.org/10.1177/10943420211029302","url":null,"abstract":"Rapid growth in data, computational methods, and computing power is driving a remarkable revolution in what variously is termed machine learning (ML), statistical learning, computational learning, and artificial intelligence. In addition to highly visible successes in machine-based natural language translation, playing the game Go, and self-driving cars, these new technologies also have profound implications for computational and experimental science and engineering, as well as for the exascale computing systems that the Department of Energy (DOE) is developing to support those disciplines. Not only do these learning technologies open up exciting opportunities for scientific discovery on exascale systems, they also appear poised to have important implications for the design and use of exascale computers themselves, including high-performance computing (HPC) for ML and ML for HPC. The overarching goal of the ExaLearn co-design project is to provide exascale ML software for use by Exascale Computing Project (ECP) applications, other ECP co-design centers, and DOE experimental facilities and leadership class computing facilities.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"35 1","pages":"598 - 616"},"PeriodicalIF":3.1,"publicationDate":"2021-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45120001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lorenzo Casalino, Abigail C Dommer, Zied Gaieb, Emilia P Barros, Terra Sztain, Surl-Hee Ahn, Anda Trifan, Alexander Brace, Anthony T Bogetti, Austin Clyde, Heng Ma, Hyungro Lee, Matteo Turilli, Syma Khalid, Lillian T Chong, Carlos Simmerling, David J Hardy, Julio Dc Maia, James C Phillips, Thorsten Kurth, Abraham C Stern, Lei Huang, John D McCalpin, Mahidhar Tatineni, Tom Gibbs, John E Stone, Shantenu Jha, Arvind Ramanathan, Rommie E Amaro
{"title":"AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics.","authors":"Lorenzo Casalino, Abigail C Dommer, Zied Gaieb, Emilia P Barros, Terra Sztain, Surl-Hee Ahn, Anda Trifan, Alexander Brace, Anthony T Bogetti, Austin Clyde, Heng Ma, Hyungro Lee, Matteo Turilli, Syma Khalid, Lillian T Chong, Carlos Simmerling, David J Hardy, Julio Dc Maia, James C Phillips, Thorsten Kurth, Abraham C Stern, Lei Huang, John D McCalpin, Mahidhar Tatineni, Tom Gibbs, John E Stone, Shantenu Jha, Arvind Ramanathan, Rommie E Amaro","doi":"10.1177/10943420211006452","DOIUrl":"https://doi.org/10.1177/10943420211006452","url":null,"abstract":"We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike's full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"35 5","pages":"432-451"},"PeriodicalIF":3.1,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8064023/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140860617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Special Issue Introduction: The Gordon Bell Special Prize for HPC-Based COVID-19 Research Finalists","authors":"B. Supinski","doi":"10.1177/10943420211044760","DOIUrl":"https://doi.org/10.1177/10943420211044760","url":null,"abstract":"As the entire world realizes, 2020 was an extraordinary year. One of the brighter aspects of the year’s exceptionalism was the numerous demonstrations that high-performance computing (HPC) could contribute to solutions of many of the most difficult problems that our society faces. In recognition of those benefits for one of the most pressing problems of 2020 and 2021, Gordon Bell, a pioneer in high-performance and parallel computing, endowed the Gordon Bell Special Prize for HPC-Based COVID-19 Research. The prize recognizes outstanding research achievement towards the understanding of the COVID-19 pandemic through the use of HPC. The purpose of the award is to recognize the innovative parallel computing contributions towards the solution of the global crisis. This special issue presents the four papers that were selected as finalists for the award. The Gordon Bell Prize Committee selected these nominations based on performance and innovation in their computational methods, in addition to their contributions towards understanding the nature, spread and/or treatment of the disease. More specifically, the committee evaluated nominations on the basis of the following considerations:","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"35 1","pages":"431 - 431"},"PeriodicalIF":3.1,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65398856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}