Darren J Hsu, Hao Lu, Aditya Kashi, Michael Matheson, John Gounley, Feiyi Wang, Wayne Joubert, Jens Glaser
{"title":"TwoFold: Highly accurate structure and affinity prediction for protein-ligand complexes from sequences","authors":"Darren J Hsu, Hao Lu, Aditya Kashi, Michael Matheson, John Gounley, Feiyi Wang, Wayne Joubert, Jens Glaser","doi":"10.1177/10943420231201151","DOIUrl":"https://doi.org/10.1177/10943420231201151","url":null,"abstract":"We describe our development of ab initio protein-ligand binding pose prediction models based on transformers and binding affinity prediction models based on the neural tangent kernel (NTK). Folding both protein and ligand, the TwoFold models achieve efficient and quality predictions matching state-of-the-art implementations while additionally reconstructing protein structures. Solving NTK models points to a new use case for highly optimized linear solver benchmarking codes on HPC.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"191 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136069417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maxim Zvyagin, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, Defne G. Ozgulbas, Natalia Vassilieva, James Gregory Pauloski, Logan Ward, Valerie Hayot-Sasson, Murali Emani, Sam Foreman, Zhen Xie, Diangen Lin, Maulik Shukla, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan
{"title":"GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics","authors":"Maxim Zvyagin, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, Defne G. Ozgulbas, Natalia Vassilieva, James Gregory Pauloski, Logan Ward, Valerie Hayot-Sasson, Murali Emani, Sam Foreman, Zhen Xie, Diangen Lin, Maulik Shukla, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan","doi":"10.1177/10943420231201154","DOIUrl":"https://doi.org/10.1177/10943420231201154","url":null,"abstract":"We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. Thus, to our knowledge, GenSLMs represents one of the first whole-genome scale foundation models which can generalize to other prediction tasks. We demonstrate scaling of GenSLMs on GPU-based supercomputers and AI-hardware accelerators utilizing 1.63 Zettaflops in training runs with a sustained performance of 121 PFLOPS in mixed precision and peak of 850 PFLOPS. We present initial scientific insights from examining GenSLMs in tracking evolutionary dynamics of SARS-CoV-2, paving the path to realizing this on large biological data.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136311318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"General framework for re-assuring numerical reliability in parallel Krylov solvers: A case of bi-conjugate gradient stabilized methods","authors":"Roman Iakymchuk, Stef Graillat, José I. Aliaga","doi":"10.1177/10943420231207642","DOIUrl":"https://doi.org/10.1177/10943420231207642","url":null,"abstract":"Parallel implementations of Krylov subspace methods often help to accelerate the procedure of finding an approximate solution of a linear system. However, such parallelization coupled with asynchronous and out-of-order execution often makes more visible the non-associativity impact in floating-point operations. These problems are even amplified when communication-hiding pipelined algorithms are used to improve the parallelization of Krylov subspace methods. Introducing reproducibility in the implementations avoids these problems by getting more robust and correct solutions. This paper proposes a general framework for deriving reproducible and accurate variants of Krylov subspace methods. The proposed algorithmic strategies are reinforced by programmability suggestions to assure deterministic and accurate executions. The framework is illustrated on the preconditioned BiCGStab method and its pipelined modification, which in fact is a distinctive method from the Krylov subspace family, for the solution of non-symmetric linear systems with message-passing. Finally, we verify the numerical behavior of the two reproducible variants of BiCGStab on a set of matrices from the SuiteSparse Matrix Collection and a 3D Poisson’s equation.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"74 3-4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135218161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joel Criado, Victor Lopez, Joan Vinyals-Ylla-Catala, Guillem Ramirez-Miranda, Xavier Teruel, Marta Garcia-Gasulla
{"title":"Role-shifting threads: Increasing OpenMP malleability to address load imbalance at MPI and OpenMP","authors":"Joel Criado, Victor Lopez, Joan Vinyals-Ylla-Catala, Guillem Ramirez-Miranda, Xavier Teruel, Marta Garcia-Gasulla","doi":"10.1177/10943420231201153","DOIUrl":"https://doi.org/10.1177/10943420231201153","url":null,"abstract":"This paper presents the evolution of the free agent threads for OpenMP to the new role-shifting threads model and their integration with the Dynamic Load Balancing (DLB) library. We demonstrate how free agent threads can improve resource utilization in OpenMP applications with load imbalance in their nested parallel regions. We also demonstrate how DLB efficiently manages the malleability exposed by the role-shifting threads to address load imbalance issues. We use three real-world scientific applications, one of them to demonstrate that free agents alone can improve the OpenMP model without external tools, and two other MPI+OpenMP applications, one of them with a coupling case, to illustrate the potential of the free agent threads’ malleability with an external resource manager to increase the efficiency of the system. In addition, we demonstrate that the new implementation is more usable than the former one, letting the runtime system automatically make decisions that were made by the programmer previously. All software is released open-source.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"3 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135511599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient implementation of low-order-precision smoothed particle hydrodynamics","authors":"Natsuki Hosono, Mikito Furuichi","doi":"10.1177/10943420231201144","DOIUrl":"https://doi.org/10.1177/10943420231201144","url":null,"abstract":"Smoothed particle hydrodynamics (SPH) method is widely accepted as a flexible numerical treatment for surface boundaries and interactions. High-resolution simulations of hydrodynamic events require high-performance computing (HPC). There is a need for an SPH code that runs efficiently on modern supercomputers involving accelerators such as NVIDIA or AMD graphics processing units. In this work, we applied half-precision, which is widely used in artificial intelligence, to the SPH method. However, improving HPC performance at such low-order precisions is a challenge. An as-is implementation with half-precision will have lower computational cost than that of float/double precision simulations, but also worsens the simulation accuracy. We propose a scaling and shifting method that maintains the simulation accuracy near the level of float/double precision. By examining the impact of half-precision on the simulation accuracy and time-to-solution, we demonstrated that the use of half-precision can improve the computational performance of SPH simulations for scientific purposes without sacrificing the accuracy. In addition, we demonstrated that the efficiency of half-precision depends on the architecture used.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134911412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications","authors":"Marc Gonzalez Tallada, E. Morancho","doi":"10.1177/10943420231188079","DOIUrl":"https://doi.org/10.1177/10943420231188079","url":null,"abstract":"Hybrid computer systems combine compute units (CUs) of different nature like CPUs, GPUs and FPGAs. Simultaneously exploiting the computing power of these CUs requires a careful decomposition of the applications into balanced parallel tasks according to both the performance of each CU type and the communication costs among them. This paper describes the design and implementation of runtime support for OpenMP hybrid GPU-CPU applications, when mixed with GPU-oriented programming models (e.g. CUDA/HIP). The paper describes the case for a hybrid multi-level parallelization of the NPB-MZ benchmark suite. The implementation exploits both coarse-grain and fine-grain parallelism, mapped to compute units of different nature (GPUs and CPUs). The paper describes the implementation of runtime support to bridge OpenMP and HIP, introducing the abstractions of Computing Unit and Data Placement. We compare hybrid and non-hybrid executions under state-of-the-art schedulers for OpenMP: static and dynamic task schedulings. Then, we improve the set of schedulers with two additional variants: a memorizing-dynamic task scheduling and a profile-based static task scheduling. On a computing node composed of one AMD EPYC 7742 @ 2.250 GHz (64 cores and 2 threads/core, totalling 128 threads per node) and 2 × GPU AMD Radeon Instinct MI50 with 32 GB, hybrid executions present speedups from 1.10× up to 3.5× with respect to a non-hybrid GPU implementation, depending on the number of activated CUs.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"626 - 646"},"PeriodicalIF":3.1,"publicationDate":"2023-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49064728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jie Chen, Zhiwei Nie, Yu Wang, Kai Wang, Fan Xu, Zhiheng Hu, Bing Zheng, Zhennan Wang, Guoli Song, Jingyi Zhang, Jie Fu, Xiansong Huang, Zhongqi Wang, Zhixiang Ren, Qiankun Wang, Daixi Li, Dongqing Wei, Bin Zhou, Chao Yang, Yonghong Tian
{"title":"Running ahead of evolution—AI-based simulation for predicting future high-risk SARS-CoV-2 variants","authors":"Jie Chen, Zhiwei Nie, Yu Wang, Kai Wang, Fan Xu, Zhiheng Hu, Bing Zheng, Zhennan Wang, Guoli Song, Jingyi Zhang, Jie Fu, Xiansong Huang, Zhongqi Wang, Zhixiang Ren, Qiankun Wang, Daixi Li, Dongqing Wei, Bin Zhou, Chao Yang, Yonghong Tian","doi":"10.1177/10943420231188077","DOIUrl":"https://doi.org/10.1177/10943420231188077","url":null,"abstract":"The never-ending emergence of SARS-CoV-2 variations of concern (VOCs) has challenged the whole world for pandemic control. In order to develop effective drugs and vaccines, one needs to efficiently simulate SARS-CoV-2 spike receptor-binding domain (RBD) mutations and identify high-risk variants. We pretrain a large protein language model with approximately 408 million protein sequences and construct a high-throughput screening for the prediction of binding affinity and antibody escape. As the first work on SARS-CoV-2 RBD mutation simulation, we successfully identify mutations in the RBD regions of 5 VOCs and can screen millions of potential variants in seconds. Our workflow scales to 4096 NPUs with 96.5% scalability and 493.9× speedup in mixed-precision computing, while achieving a peak performance of 366.8 PFLOPS (reaching 34.9% theoretical peak) on Pengcheng Cloudbrain-II. Our method paves the way for simulating coronavirus evolution in order to prepare for a future pandemic that will inevitably take place. Our models are released at https://github.com/ZhiweiNiepku/SARS-CoV-2_mutation_simulation to facilitate future related work.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135444766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Guest editors note: Special issue on clusters, clouds, and data for scientific computing","authors":"J. Dongarra, B. Tourancheau","doi":"10.1177/10943420231180188","DOIUrl":"https://doi.org/10.1177/10943420231180188","url":null,"abstract":"The research areas of cluster, cloud, and data analytics computing, which today provide fundamental infrastructure for all areas of advanced computational science, are being radically transformed by the convergence of at least two unprecedented trends. The fi rst is the ongoing emergence of multicore and hybrid microprocessor designs, ushering in a new era of computing in which system designers must accept energy usage as a fi rst-order constraint, and application designers must be able to exploit parallelism and data locality to an unprecedented degree. As the research community is rapidly becoming aware, the components of the traditional HPC software stack are poorly matched to the characteristics of systems based on these new architectures — hundreds of thousands of nodes, millions of cores, GPU accelerators, reduced bandwidth, and memory per core. The second trend is the dramatic escalation in the amount of data that leading edge scienti fi c applications, and the communities that use them, are either generating or trying to analyze. A key problem in such data intensive science lies not only in the shear volume of bits that must be processed and managed but also in the logistical problems associated with making the data of most current interest available to participants in large national and international collaborations, sitting in different administrative domains, spread across the wide area network, and wanting to use diverse resources — clusters, clouds, and data. This special issue gathers selected papers of the Work-shop on Clusters, Clouds and Data for Scienti fi c Computing (CCDSC) that was held at La Maison des Contes , 69490 Dareize-France, on September 6 – 9, 2022. This workshop is a continuation of a series of workshops started in 1992 entitled Workshop on Environments and Tools for Parallel Scienti","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"211 - 212"},"PeriodicalIF":3.1,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46065430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel multithreaded deduplication of data sequences in nuclear structure calculations","authors":"D. Langr, T. Dytrych","doi":"10.1177/10943420231183697","DOIUrl":"https://doi.org/10.1177/10943420231183697","url":null,"abstract":"High performance computing (HPC) applications that work with redundant sequences of data can benefit from their deduplication. We study this problem on the symmetry-adapted no-core shell model (SA-NCSM), where redundant sequences of different kinds naturally emerge in the data of the basis of the Hilbert space physically relevant to a modeled nucleus. For a fast solution of this problem on multicore architectures, we propose and present three multithreaded algorithms, which employ either concurrent hash tables or parallel sorting methods. Furthermore, we present evaluation and comparison of these algorithms based on experiments performed with real-world SA-NCSM calculations. The results indicate that the fastest option is to use a concurrent hash table, provided that it supports sequences of data as a type of table keys. If such a hash table is not available, the algorithm based on parallel sorting is a viable alternative.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"1 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41427547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sabra Ossen, Jeremy Musser, Luke Dalessandro, M. Swany
{"title":"INDIANA—In-Network Distributed Infrastructure for Advanced Network Applications","authors":"Sabra Ossen, Jeremy Musser, Luke Dalessandro, M. Swany","doi":"10.1177/10943420231179662","DOIUrl":"https://doi.org/10.1177/10943420231179662","url":null,"abstract":"Data volumes are exploding as sensors proliferate and become more capable. Edge computing is envisioned as a path to distribute processing and reduce latency. Many models of Edge computing consider small devices running conventional software. Our model includes a more lightweight execution engine for network microservices and a network scheduling framework to configure network processing elements to process streams and direct the appropriate traffic to them. In this article, we describe INDIANA, a complete framework for in-network microservices. We will describe how the two components-the INDIANA network Processing Element (InPE) and the Flange Network Operating System (NOS)-work together to achieve effective in-network processing to improve performance in edge to cloud environments. Our processing elements provide lightweight compute units optimized for efficient stream processing. These elements are customizable and vary in sophistication and resource consumption. The Flange NOS provides first-class flow based reasoning to drive function placement, network configuration, and load balancing that can respond dynamically to network conditions. We describe design considerations and discuss our approach and implementations. We evaluate the performance of stream processing and examine the performance of several exemplar applications on networks of increasing scale and complexity.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"442 - 461"},"PeriodicalIF":3.1,"publicationDate":"2023-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43385869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}