The International Journal of High Performance Computing Applications: Latest Articles

Clacc: OpenACC for C/C++ in Clang

The International Journal of High Performance Computing Applications Pub Date : 2024-06-14 DOI: 10.1177/10943420241261976

J. Denny, Seyong Lee, Pedro Valero-Lara, Marc Gonzalez-Tallada, Keita Teranishi, Jeffrey S. Vetter

Abstract: The Clacc project has developed OpenACC compiler, runtime, and profiling interface support for C/C++ by extending Clang and LLVM. A key Clacc design feature is that it translates OpenACC to OpenMP to leverage the OpenMP offloading support that is actively being developed for Clang and LLVM. A benefit of this design is support for two compilation modes: traditional compilation mode produces a binary, and source-to-source mode produces OpenMP source. Clacc has been deployed on Oak Ridge National Laboratory’s (ORNL’s) Frontier, on which Clacc is the only OpenACC implementation for C/C++. Clacc supports x86_64, POWER9, AMD GPUs, and NVIDIA GPUs. Clacc’s OpenACC profiling interface support has been integrated with TAU, which is also deployed on Frontier. While Clacc has always supported C as a base language, Clacc also has increasing C++ support, including support for Kokkos’s OpenACC back end. Clacc itself is hosted publicly on GitHub. In this paper, we describe Clacc’s design and mapping from OpenACC directives to OpenMP. We also present a performance evaluation on ORNL’s Frontier (AMD MI250x GPU offload) and Argonne National Laboratory’s (ANL’s) Polaris (NVIDIA A100 GPU offload) for various SPEC ACCEL and Kokkos OpenACC back end benchmarks.

Citations: 0

An integrated three-dimensional aeromechanical analysis for the prediction of stresses on modern coaxial rotors

The International Journal of High Performance Computing Applications Pub Date : 2024-05-23 DOI: 10.1177/10943420241255089

Mrinalgouda Patil, Ravi T. Lumba, B. Jayaraman, A. Datta

Abstract: This paper presents the first application of an Integrated Three-Dimensional aeromechanical analysis (defined as the coupled solution of three-dimensional finite element-based structural dynamics with three-dimensional Reynolds-Averaged Navier-Stokes-based fluid dynamics) to predict the stresses on a modern coaxial rotor. The coupling was carried out with the University of Maryland’s structural dynamic solver X3D and the U.S. Army’s CREATE™-AV Helios suite of fluid dynamic solvers. A modern four-bladed hingeless coaxial rotor model (inspired by the gross dimensions of the Sikorsky S-97 Raider but generic and open source otherwise) is developed as a demonstration test case. The new structural solver is driven by parallel and scalable solvers and advanced high performance computing. It is enabled by high-order three-dimensional brick finite elements unified with multibody dynamics, integrated aerodynamics, and a special 3D-to-1D fluid-structure interface that refines the power of the delta-coupling procedure while retaining the advantages of existing CFD mesh motion schemes. The analysis predicts the three-dimensional stresses on the rotor blades and hub, together with the deformations, airloads, and wake, in an integrated manner. Two flight conditions are studied: a low-speed flight at 37 knots and a high-speed flight at 150 knots. Interesting three-dimensional unsteady stress patterns are revealed all across the blade but particularly inboard of 50% rotor radius. These patterns change from flight to flight and have remained invisible until now because they could neither be predicted nor measured in flight. The maximum axial stresses exhibited 3/rev variation at low speed and 2/rev variation in high-speed flight. The lower rotor carried a higher oscillatory stress burden at low speed, whereas both rotors shared the same stress burden in high-speed flight. The key conclusion is that such analysis is now indeed possible, that the stress patterns it reveals provide deeper insights into the dynamics of advanced rotors, and that these insights might provide a path toward mitigating the stresses in the future.

Citations: 0

Detecting interference between applications and improving the scheduling using malleable application clones

The International Journal of High Performance Computing Applications Pub Date : 2023-12-13 DOI: 10.1177/10943420231220898

Alberto Cascajo, D. E. Singh, Jesús Carretero

Abstract: This paper presents a novel feature for improving the scheduling process based on performance prediction and the detection of CPU and I/O interference between applications. This feature consists of using malleable synthetic benchmarks (called clones) that reproduce the behaviour of applications executed in a cluster. These proxies can be used with two objectives: to build large and representative datasets that can be used to train the machine learning algorithms for modelling the platform workload, and to evaluate in advance whether two executing applications can potentially produce contention hazards related to the shared use of system resources like CPU, cache memory or I/O bandwidth. The proposed framework generates application clones based on generic-purpose performance information collected from monitoring. Unlike other works based on the use of micro-architectures or metrics obtained from simulators, in the approach presented in this work, the application clones generate similar behaviour without extracting data from the binaries, avoiding the necessity of managing code or data from the applications. One advantage of this approach is that the clones can be shared securely because they have not been generated using any piece of the original code. In addition, the proposed clones are malleable, so they are also able to model the application behaviour under a different number of processes. In this work, we show how the use of clones contributes to improving application scheduling by reducing the number of evaluations that must be performed, and to detecting performance degradation (interference) without the necessity of executing the actual applications. We evaluate the proposed clone generation approach on two sets of benchmarks (CPU and I/O oriented) and several applications. We also compare the performance obtained during the execution of the proxies and the applications to show their similarity. Finally, we include an evaluation of the interference detection using this novel approach.

Citations: 0

Hypergraph-based locality-enhancing methods for graph operations in Big Data applications

The International Journal of High Performance Computing Applications Pub Date : 2023-11-20 DOI: 10.1177/10943420231214532

Kadir Akbudak

Abstract: The need for speeding up data analytics increases inevitably due to the need for extracting valuable information from social media, data generated by smart devices with sensors, patterns of people’s communications over the web, items viewed and bought by global-scale customers, cloud applications, etc., all of which take part in the “Big Data.” This kind of interaction data is very well represented as sparse graphs to enable graph analytics, which requires efficient underlying kernels. The breadth-first search (BFS)-based traversal is a commonly used kernel in graph algorithms such as the betweenness centrality algorithm for centrality analysis. In this work, we focus on parallel BFS operations and propose hypergraph-based combinatorial models that aim at reducing cache misses and hence exploiting data locality during the parallel BFS operations. Our models are based on finding new vertex visit orders so that locality in accessing the data associated with vertices is exploited. Experiments on graphs arising in a wide range of applications show that our proposed models achieve on average 9% performance improvement in the CPU-based Ligra data analytics framework.

Citations: 0
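
The idea in the abstract above, that a different vertex order can improve locality during BFS, can be illustrated with a toy sketch. This is not the paper's hypergraph model: as a stand-in, it simply renumbers vertices by their BFS visit order, and it uses a made-up locality proxy (`index_spread`, the sum of index distances between neighbours). The function names, the example graph, and the assumption that the graph is connected are all illustrative.

```python
from collections import deque


def bfs_order(adj, source):
    """Return vertices in the order a queue-based BFS first visits them."""
    seen = [False] * len(adj)
    seen[source] = True
    order = []
    queue = deque([source])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                queue.append(v)
    return order


def relabel(adj, order):
    """Renumber vertices so that order[i] becomes vertex i.

    Assumes `order` contains every vertex (i.e. the graph is connected).
    """
    new_id = {old: new for new, old in enumerate(order)}
    new_adj = [[] for _ in adj]
    for old, nbrs in enumerate(adj):
        new_adj[new_id[old]] = sorted(new_id[v] for v in nbrs)
    return new_adj


def index_spread(adj):
    """Sum of |u - v| over all directed edges.

    A crude proxy for how far apart neighbours' per-vertex data would sit
    in an array indexed by vertex id; smaller suggests better locality.
    """
    return sum(abs(u - v) for u, nbrs in enumerate(adj) for v in nbrs)


# Hypothetical example: the path 0-3-2-5-1-4 with scrambled labels.
adj = [[3], [5, 4], [3, 5], [0, 2], [1], [2, 1]]
new_adj = relabel(adj, bfs_order(adj, 0))

print(index_spread(adj), index_spread(new_adj))
```

On this toy graph the renumbering turns the scrambled path into 0-1-2-3-4-5, so each vertex's neighbours sit next to it in memory. Real graphs are far less regular, which is presumably why the paper formulates the ordering problem combinatorially rather than relying on a single traversal.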

An implicit barotropic mode solver for MPAS-ocean using a modern Fortran solver interface

The International Journal of High Performance Computing Applications Pub Date : 2023-11-18 DOI: 10.1177/10943420231205601

Hyun-Gyu Kang, Raymond S Tuminaro, Andrey Prokopenko, Seth R Johnson, A. Salinger, Katherine J Evans

Abstract: We demonstrate use of a modern Fortran solver interface to manage solver algorithms for an implicit barotropic mode solver in the Model for Predictions Across Scales-Ocean (MPAS-O). ForTrilinos, a Fortran interface to Trilinos that contains a large collection of solver capabilities written in C++, has been implemented in MPAS-O to provide access to a suite of linear solver options. By virtue of the simplified wrapper and interface generator (SWIG) automation tool that generates modern Fortran interfaces to C++ code, we were able to implement the Fortran solver interface in MPAS-O using a familiar Fortran coding style while minimizing performance degradation. The ForTrilinos solver interface is written within MPAS-O’s time stepping modules as a subroutine in conjunction with MPAS-O code. Applied to an idealized ocean and a high-resolution realistic ocean test case, parallel performance of ForTrilinos solvers is examined. It is found that parallel scalability of the ForTrilinos solvers is highly dependent on the number of global synchronization points per solver iteration in each iterative solver algorithm. ForTrilinos solvers perform best compared to the Fortran hand-crafted (FHC) solver when the amount of work per processor is large enough. However, parallel scalability is better with the FHC solver and so when the work per core is modest FHC outperforms ForTrilinos. The intercomparison between the ForTrilinos and FHC solvers reveals that this performance hit in the ForTrilinos solver mostly comes from the global synchronization process, while suggesting that the matrix-vector multiplication process in the FHC solver needs to be optimized for better performance.

Citations: 0

Massively parallel nodal discontinous Galerkin finite element method simulator for room acoustics

The International Journal of High Performance Computing Applications Pub Date : 2023-11-16 DOI: 10.1177/10943420231208948

Anders Melander, Emil Strøm, Finnur Pind, A. Engsig-Karup, Cheol-Ho Jeong, Tim Warburton, Noel Chalmers, J. Hesthaven

Abstract: We present a massively parallel and scalable nodal discontinuous Galerkin finite element method (DGFEM) solver for the time-domain linearized acoustic wave equations. The solver is implemented using the libParanumal finite element framework with extensions to handle curvilinear geometries and frequency dependent boundary conditions of relevance in practical room acoustics. The implementation is benchmarked on heterogeneous multi-device many-core computing architectures, and high performance and scalability are demonstrated for a problem that is considered expensive to solve in practical applications. In a benchmark study, scaling tests show that multi-GPU support gives the ability to simulate large rooms, over a broad frequency range, with realistic boundary conditions, both in terms of computing time and memory requirements. Furthermore, numerical simulations on two non-trivial geometries are presented, a star-shaped room with a dome and an auditorium. Overall, this shows the viability of using a multi-device accelerated DGFEM solver to enable realistic large-scale wave-based room acoustics simulations.

Citations: 0

Comparing perturbation models for evaluating stability of neuroimaging pipelines.

IF 3.1

The International Journal of High Performance Computing Applications Pub Date : 2020-09-01 Epub Date: 2020-05-21 DOI: 10.1177/1094342020926237

Gregory Kiar, Pablo de Oliveira Castro, Pierre Rioux, Eric Petit, Shawn T Brown, Alan C Evans, Tristan Glatard

Abstract: With an increase in awareness regarding a troubling lack of reproducibility in analytical software tools, the degree of validity in scientific derivatives and their downstream results has become unclear. The nature of reproducibility issues may vary across domains, tools, data sets, and computational infrastructures, but numerical instabilities are thought to be a core contributor. In neuroimaging, unexpected deviations have been observed when varying operating systems, software implementations, or adding negligible quantities of noise. In the field of numerical analysis, these issues have recently been explored through Monte Carlo Arithmetic, a method involving the instrumentation of floating-point operations with probabilistic noise injections at a target precision. Exploring multiple simulations in this context allows the characterization of the result space for a given tool or operation. In this article, we compare various perturbation models to introduce instabilities within a typical neuroimaging pipeline, including (i) targeted noise, (ii) Monte Carlo Arithmetic, and (iii) operating system variation, to identify the significance and quality of their impact on the resulting derivatives. We demonstrate that even low-order models in neuroimaging such as the structural connectome estimation pipeline evaluated here are sensitive to numerical instabilities, suggesting that stability is a relevant axis upon which tools are compared, alongside more traditional criteria such as biological feasibility, computational efficiency, or, when possible, accuracy. Heterogeneity was observed across participants which clearly illustrates a strong interaction between the tool and data set being processed, requiring that the stability of a given tool be evaluated with respect to a given cohort. We identify use cases for each perturbation method tested, including quality assurance, pipeline error detection, and local sensitivity analysis, and make recommendations for the evaluation of stability in a practical and analytically focused setting. Identifying how these relationships and recommendations scale to higher order computational tools, distinct data sets, and their implication on biological feasibility remain exciting avenues for future work.

Volume 34, Issue 5, pp. 491-501.

Citations: 11
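
The abstract above describes Monte Carlo Arithmetic as injecting probabilistic noise into floating-point operations at a target precision and then examining the spread of results over repeated runs. The following is a minimal sketch of that idea, not the tooling used in the article: the helper names (`inexact`, `noisy_sum`, `significant_digits`), the 24-bit virtual precision, the sample count, and the two test inputs are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def inexact(x, t, rng):
    """Perturb x with uniform noise at a virtual precision of t bits."""
    if x == 0.0:
        return 0.0
    _, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    xi = rng.uniform(-0.5, 0.5)       # random error term
    return x + math.ldexp(xi, e - t)  # x + xi * 2**(e - t)


def noisy_sum(values, t, rng):
    """Left-to-right sum in which every partial result is perturbed."""
    total = 0.0
    for v in values:
        total = inexact(total + v, t, rng)
    return total


def significant_digits(samples):
    """Estimate stable decimal digits as -log10(sigma / |mu|)."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    if sigma == 0.0:
        return float("inf")
    return -math.log10(sigma / abs(mu))


rng = random.Random(0)  # fixed seed so repeated runs are comparable

# Well-conditioned input: adding one hundred ones.
good = [noisy_sum([1.0] * 100, 24, rng) for _ in range(200)]
# Ill-conditioned input: catastrophic cancellation around 1e8.
bad = [noisy_sum([1e8, 1.0, -1e8], 24, rng) for _ in range(200)]

print(significant_digits(good), significant_digits(bad))
```

With these inputs the well-conditioned sum should retain several stable digits across samples, while the cancellation case should retain essentially none. That contrast is the kind of signal a stability evaluation looks for, though a real pipeline would instrument every floating-point operation rather than a single summation loop.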

High performance in silico virtual drug screening on many-core processors.

IF 3.1

The International Journal of High Performance Computing Applications Pub Date : 2015-05-01 DOI: 10.1177/1094342014528252

Simon McIntosh-Smith, James Price, Richard B Sessions, Amaurys A Ibarra

Abstract: Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multi-core CPUs with SIMD instruction sets.

Volume 29, Issue 2, pp. 119-134.

Citations: 89

HPC AND GRID COMPUTING FOR INTEGRATIVE BIOMEDICAL RESEARCH.

The International Journal of High Performance Computing Applications Pub Date : 2009-08-01 DOI: 10.1177/1094342009106192

Tahsin Kurc, Shannon Hastings, Vijay Kumar, Stephen Langella, Ashish Sharma, Tony Pan, Scott Oster, David Ervin, Justin Permar, Sivaramakrishnan Narayanan, Yolanda Gil, Ewa Deelman, Mary Hall, Joel Saltz

Citations: 0

AUTOMATIC GENERATION OF FFT FOR TRANSLATIONS OF MULTIPOLE EXPANSIONS IN SPHERICAL HARMONICS.

The International Journal of High Performance Computing Applications Pub Date : 2008-01-01 DOI: 10.1177/1094342008090915

Jakub Kurzak, Dragan Mirkovic, B Montgomery Pettitt, S Lennart Johnsson

Citations: 0