The International Conference on High Performance Computing in Asia-Pacific Region Companion: Latest Publications

Advantages of Space-Time Finite Elements for Domains with Time Varying Topology
N. Hosters, Maximilian von Danwitz, Patrick Antony, M. Behr
DOI: https://doi.org/10.1145/3440722.3440907
Abstract (ACM reference format only): Norbert Hosters, Maximilian von Danwitz, Patrick Antony, and Marek Behr. 2021. Advantages of Space-Time Finite Elements for Domains with Time Varying Topology. In The International Conference on High Performance Computing in Asia-Pacific Region Companion (HPC Asia 2021 Companion), January 20-22, 2021, Virtual Event, Republic of Korea. ACM, New York, NY, USA, 2 pages.
Citations: 0
Single-Precision Calculation of Iterative Refinement of Eigenpairs of a Real Symmetric-Definite Generalized Eigenproblem by Using a Filter Composed of a Single Resolvent
H. Murakami
DOI: https://doi.org/10.1145/3440722.3440784
Abstract: By using a filter, we calculate approximate eigenpairs of a real symmetric-definite generalized eigenproblem Av = λBv whose eigenvalues lie in a specified interval. In the experiments in this paper, the IEEE-754 single-precision (binary32) floating-point number system is used for the calculations. In general, a filter is constructed from several resolvents with different shifts ρ. For a given vector x, the action of a resolvent is obtained by solving the linear system C(ρ)y = Bx for y, where the coefficient matrix C(ρ) = A − ρB is symmetric. We solve this system by factoring C(ρ), for example with the modified Cholesky (LDL^T) method. When both A and B are banded, C(ρ) is also banded, and the banded modified Cholesky method can be used. The filter we use is either a polynomial of a resolvent with a real shift, or a polynomial of the imaginary part of a resolvent with an imaginary shift. We use only a single resolvent to construct the filter in order to reduce both the work of factoring matrices and, especially, the storage needed to hold the factors. The main disadvantage of using a single resolvent rather than many is that such a filter has poor properties, especially when the computation is carried out in single precision. Consequently, the required approximate eigenpairs are not obtained with good accuracy if they are extracted from the set of vectors produced by a single application of B-orthonormalization followed by filtering to a set of initial random vectors. However, experiments show that the required approximate eigenpairs are refined well if they are extracted from the set of vectors obtained by a few such applications of B-orthonormalization and filtering to a set of initial random vectors.
Citations: 0
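The core operation in the abstract above, one action of the resolvent y = C(ρ)⁻¹Bx with C(ρ) = A − ρB factored once and reused, can be sketched as follows. This is an illustrative toy, not the paper's implementation: the matrices are small random stand-ins (not banded), SciPy's dense LU factorization replaces the banded modified Cholesky (LDL^T) factorization, the filter is simply a power of a single real-shift resolvent, and everything runs in double rather than single precision.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 50
# Random symmetric A and symmetric positive-definite B (toy stand-ins).
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
Bmat = rng.standard_normal((n, n)); Bmat = Bmat @ Bmat.T + n * np.eye(n)

rho = 0.0                 # single real shift
C = A - rho * Bmat        # factor C(rho) once, reuse for every application
lu, piv = lu_factor(C)

def resolvent(x):
    """One action of the resolvent: y = (A - rho*B)^{-1} B x."""
    return lu_solve((lu, piv), Bmat @ x)

def filter_apply(x, degree=4):
    """Toy filter: a power of the single resolvent applied to x,
    which enriches x in eigenvectors with eigenvalues near rho."""
    for _ in range(degree):
        x = resolvent(x)
        x = x / np.linalg.norm(x)   # keep iterates well-scaled
    return x

v = filter_apply(rng.standard_normal(n))
lam = (v @ A @ v) / (v @ Bmat @ v)  # Rayleigh quotient for Av = lam*Bv
```

Repeated applications of `filter_apply`, interleaved with B-orthonormalization of a block of vectors as in the abstract, correspond to the refinement loop the experiments evaluate.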
Molecular-Continuum Flow Simulation in the Exascale and Big Data Era
Philipp Neumann, Vahid Jafari, P. Jarmatz, F. Maurer, Helene Wittenberg, Niklas Wittmer
DOI: https://doi.org/10.1145/3440722.3440903
Abstract: not provided; the record contains only the caption of Figure 1: (a) a slice through a coupled 3D vortex street simulation using a Lattice Boltzmann solver (CFD), also illustrating the location of the embedded MD domain; (b) the y-component of the flow velocity in the center of the MD domain over time: the noisy MD result, the CFD result, and filter results for a median filter and a Gaussian filter from scipy.ndimage.
Citations: 0
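The caption mentions smoothing the noisy MD velocity signal with a median filter and a Gaussian filter from scipy.ndimage. A minimal sketch of that post-processing step, using a synthetic sine wave as a stand-in for the smooth CFD signal and added white noise as a stand-in for MD thermal noise:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, median_filter

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 500)
truth = np.sin(t)                                   # smooth CFD-like signal
noisy = truth + 0.3 * rng.standard_normal(t.size)   # noisy MD-like samples

# The two filters from scipy.ndimage named in the figure caption.
med = median_filter(noisy, size=21, mode="nearest")
gauss = gaussian_filter1d(noisy, sigma=8.0, mode="nearest")

err_raw = np.abs(noisy - truth).mean()
err_med = np.abs(med - truth).mean()
err_gauss = np.abs(gauss - truth).mean()
```

Both filters trade a little bias (smearing of the true signal) for a large reduction in noise variance, which is the trade-off the figure visualizes.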
Multi-scale Modelling of Urban Air Pollution with Coupled Weather Forecast and Traffic Simulation on HPC Architecture
L. Kornyei, Z. Horváth, A. Ruopp, Á. Kovács, Bence Liszkai
DOI: https://doi.org/10.1145/3440722.3440917
Abstract: Urban air pollution is one of the global challenges to which over 3 million deaths are attributable yearly. Traffic emits over 40% of several contaminants, such as NO2 [10]. Directive 2008/50/EC of the European Commission prescribes the assessment of air quality by accumulating exceedances of contamination concentration limits over a one-year period using measurement stations, which may be supplemented by modeling techniques to provide adequate information on spatial distribution. Computational models predict small-scale spatial fluctuation at the street level: local air-flow phenomena can cluster pollutants or carry them far from the location of emission [2]. The spread of the SARS-CoV-2 virus also interacts with urban air quality. Regions in lockdown have greatly reduced air-pollution strain due to the drop in traffic [4]. Moreover, the correlation between the fatality rate of a previous respiratory disease, SARS 2002, and the Air Pollution Index suggests that bad air quality may double the fatality rate [6]. Because street-level pollution dispersion depends strongly on the daily weather, a one-year simulation with a low-time-scale model is needed. Additionally, to resolve street-level phenomena, cell sizes of 1 to 4 meters are used in these regions, which requires CFD methods with simulation domains of 1 to 100 million cells. The memory and computational requirements for these tasks are enormous, so an HPC architecture is needed to obtain reasonable results within a manageable time frame. To tackle this challenge, the Urban Air Pollution (UAP) workflow is being developed as a pilot of the HiDALGO project [7], which is funded by the H2020 framework of the European Union.
The pilot is designed in a modular way, with the mindset of developing it into a digital-twin model later. Its standardized interfaces enable multiple software packages to be used in a specific module. At its core, a traffic simulation implemented in SUMO is coupled with a CFD simulation. Currently, OpenFOAM (v1906, v1912, and v2006) and Ansys Fluent (v19.2) are supported. This presentation focuses on the OpenFOAM implementation, as it proved more feasible and scalable on most HPC architectures. The incompressible unsteady Reynolds-averaged Navier-Stokes equations are solved with the PIMPLE method, Courant-number-based adaptive time stepping, and transient atmospheric boundary conditions. The single-component NOx-type pollution is calculated independently as a scalar with transport equations along the flow field. Pollution emission is treated as a per-cell volumetric source that changes in time. The initial condition is obtained from a steady-state solution at the initial time with the SIMPLE method, using identical but stationary boundary conditions and source fields. Custom modules are developed for proper boundary-condition and source-term handling. The UAP workflow supports automatic generation of 3D air-flow geometry and traffic networks from OpenStreetMap data. Ground and building information
Citations: 2
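The passive-scalar pollutant transport described above (a scalar advected and diffused along the flow field, driven by a per-cell volumetric emission source) can be illustrated in one dimension. This is a toy finite-difference sketch with a constant wind and a fixed source region, not the UAP workflow's OpenFOAM setup; the first-order upwind/central scheme and the CFL-limited time step are simplifying choices made here.

```python
import numpy as np

# 1-D advection-diffusion of a passive scalar c (a NOx-like tracer):
#   dc/dt + u*dc/dx = D*d2c/dx2 + s(x)
nx, L = 200, 100.0
dx = L / nx
u, D = 1.0, 0.5                              # constant wind speed, diffusivity
dt = 0.4 * min(dx / u, dx * dx / (2 * D))    # explicit stability (CFL) limit
x = (np.arange(nx) + 0.5) * dx

c = np.zeros(nx)
source = np.where((x > 40.0) & (x < 45.0), 1.0, 0.0)  # per-cell volumetric emission

def step(c):
    """One explicit step: first-order upwind advection plus central
    diffusion on a periodic domain, with the emission source added."""
    adv = -u * (c - np.roll(c, 1)) / dx
    diff = D * (np.roll(c, 1) - 2.0 * c + np.roll(c, -1)) / dx**2
    return c + dt * (adv + diff + source)

for _ in range(500):
    c = step(c)
# c now shows a plume carried downstream of the source and spread by diffusion
```

The CFL-limited `dt` keeps the explicit scheme monotone, so the concentration stays non-negative, a property production solvers also need for pollutant fields.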
Distributed MLPerf ResNet50 Training on Intel Xeon Architectures with TensorFlow
Wei Wang, N. Hasabnis
DOI: https://doi.org/10.1145/3440722.3440880
Abstract: MLPerf benchmarks, which measure the training and inference performance of ML hardware and software, have published three sets of ML training results so far. In all of them, ResNet50v1.5 was used as a standard benchmark to showcase the latest developments in image-recognition tasks. The latest MLPerf training round (v0.7) featured Intel's submission with TensorFlow. In this paper, we describe the recent optimization work that enabled this submission. In particular, we enabled the BFloat16 data type in the ResNet50v1.5 model as well as in Intel-optimized TensorFlow to exploit the full potential of 3rd-generation Intel Xeon Scalable processors, which have built-in BFloat16 support. We also describe the performance optimizations as well as the state-of-the-art accuracy/convergence results of the ResNet50v1.5 model, achieved with large-scale distributed training (with up to 256 MPI workers) with Horovod. These results lay a solid foundation for future MLPerf training submissions with large-scale Intel Xeon clusters.
Citations: 0
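BFloat16 keeps float32's 8-bit exponent but only 8 significand bits, which is why it preserves range while trading precision for speed on hardware with built-in support. The rounding can be emulated in NumPy by rounding a float32 value to the top 16 bits of its encoding (round-to-nearest-even); this sketch illustrates the number format only, not Intel's TensorFlow integration:

```python
import numpy as np

def to_bfloat16(x):
    """Round float32 values to bfloat16 precision (8 significand bits) by
    keeping the top 16 bits of the float32 encoding, round-to-nearest-even."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Add 0x7FFF plus the lowest kept bit, then truncate: nearest-even rounding.
    rounded = bits + np.uint32(0x7FFF) + ((bits >> np.uint32(16)) & np.uint32(1))
    return (rounded & np.uint32(0xFFFF0000)).view(np.float32)

rng = np.random.default_rng(2)
a = rng.standard_normal(10_000).astype(np.float32)
rel_err = np.abs(to_bfloat16(a) - a) / np.abs(a)
# bfloat16 unit roundoff is 2**-9, so every relative error stays below ~0.2%
```

This bounded relative error is what makes BFloat16 usable for training: gradients keep their dynamic range, and the ~0.2% per-value rounding noise is absorbed by the optimizer.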
Node-level Performance Optimizations in CFD Codes
Peter Wauligmann, Jakob Dürrwächter, Philipp Offenhäuser, A. Schlottke, M. Bernreuther, B. Dick
DOI: https://doi.org/10.1145/3440722.3440914
Abstract: We present examples of beneficial node-level performance optimizations in three computational fluid dynamics applications. In particular, we not only quantify the speedup achieved but also try to assess flexibility, readability, (performance) portability, and labor effort.
Citations: 3
High Performance Simulations of Quantum Transport using Manycore Computing
Yosang Jeong, H. Ryu
DOI: https://doi.org/10.1145/3440722.3440879
Abstract: The Non-Equilibrium Green's Function (NEGF) method has been widely utilized in nanoscience and nanotechnology to predict carrier-transport behavior in electronic device channels whose sizes lie in the quantum regime. This work explores how much performance improvement can be achieved for NEGF computations with the unique features of manycore computing, where the core numerical step of NEGF computations involves a recursive process of matrix-matrix multiplication. The major techniques adopted for the performance enhancement are data restructuring, matrix tiling, thread scheduling, and offload computing, and we present an in-depth discussion of why they are critical to fully exploiting the power of manycore computing hardware, including Intel Xeon Phi Knights Landing systems and NVIDIA general-purpose graphics processing unit (GPU) devices. The performance of the optimized algorithm has been tested in a single computing node, where the host is a Xeon Phi 7210 equipped with two NVIDIA Quadro GV100 GPU devices. The target structure of the NEGF simulations is a [100] silicon nanowire that consists of 100K atoms, involving a 1000K × 1000K complex Hamiltonian matrix. Through rigorous benchmark tests, we show, with optimization techniques whose details are elaborately explained, that the workload can be accelerated by a factor of up to ∼20 compared to the unoptimized case.
Citations: 0
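One of the techniques named above, matrix tiling, processes the multiplication block by block so each pair of tiles is reused while it is cache-resident. A minimal NumPy sketch of the idea (the tile size 64 is an arbitrary example here, and NumPy's `@` stands in for the vendor kernel an HPC code would call per tile):

```python
import numpy as np

def tiled_matmul(A, B, tile=64):
    """Cache-blocked matrix multiply: accumulate C tile by tile so each
    pair of input tiles is reused while it stays resident in cache.
    Works for the complex-valued blocks of a Hamiltonian as well."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=np.result_type(A, B))
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                C[i0:i0 + tile, j0:j0 + tile] += (
                    A[i0:i0 + tile, p0:p0 + tile] @ B[p0:p0 + tile, j0:j0 + tile]
                )
    return C
```

In the recursive NEGF kernel the same blocked structure lets each tile product be scheduled across threads or offloaded, which is where the thread-scheduling and offload techniques from the abstract attach.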
A Comparison of Parallel Profiling Tools for Programs utilizing the FFT
B. Leu, S. Aseeri, B. Muite
DOI: https://doi.org/10.1145/3440722.3440881
Abstract: Performance monitoring is an important component of code optimization. It is also important for the beginning user, but can be difficult to configure appropriately. The overheads of the performance-monitoring tools CrayPat, FPMP, mpiP, Scalasca, and TAU are measured using the default configurations a novice user is likely to choose, and are shown to be small when profiling Fast Fourier Transform based solvers for the Klein-Gordon equation built on 2decomp&FFT and on FFTE. The performance measurements help explain why, despite FFTE having a more efficient parallel algorithm, it is not always faster than 2decomp&FFT: its compiled single-core FFT is not as fast as the FFTW routine used in 2decomp&FFT.
Citations: 0
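For readers unfamiliar with the workflow the paper evaluates, even Python's built-in cProfile illustrates the basic loop: run an FFT-heavy kernel under a profiler and inspect where cumulative time goes. The Klein-Gordon solvers in the paper are compiled MPI codes profiled with the HPC tools listed above; this toy uses a repeated NumPy FFT round trip purely as a stand-in workload:

```python
import cProfile
import io
import pstats

import numpy as np

def fft_step(u, steps=50):
    """Stand-in spectral kernel: repeated forward/inverse 2-D FFT round
    trips, the kind of operation that dominates FFT-based PDE solvers."""
    for _ in range(steps):
        u = np.fft.ifft2(np.fft.fft2(u)).real
    return u

u0 = np.random.default_rng(3).standard_normal((256, 256))

prof = cProfile.Profile()
prof.enable()
u1 = fft_step(u0)
prof.disable()

stream = io.StringIO()
pstats.Stats(prof, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()   # top-5 functions by cumulative time
```

The profiler overhead question the paper studies is exactly the gap between the timed run above and the same kernel run without `prof.enable()`.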
Efficient Parallel Multigrid Method on Intel Xeon Phi Clusters
K. Nakajima, Balazs Gerofi, Y. Ishikawa, Masashi Horikoshi
DOI: https://doi.org/10.1145/3440722.3440882
Abstract: The parallel multigrid method is expected to play an important role in scientific computing on exascale supercomputer systems for solving large-scale linear systems with sparse matrices. Because solving sparse linear systems is a very memory-bound process, an efficient storage scheme for the coefficient matrices is a crucial issue. In previous work, the authors implemented the sliced ELL format in parallel conjugate gradient solvers with multigrid preconditioning (MGCG) for an application simulating 3D groundwater flow through heterogeneous porous media (pGW3D-FVM), and obtained excellent performance on large-scale multicore/manycore clusters. In the present work, the authors introduce SELL-C-σ into the MGCG solver and evaluate the performance of the solver with various OpenMP/MPI hybrid parallel programming models on the Oakforest-PACS (OFP) system at JCAHPC, using up to 1,024 Intel Xeon Phi nodes. Because SELL-C-σ is well suited to wide-SIMD architectures such as Xeon Phi, the improvement over sliced ELL was more than 20%. This is one of the first examples of SELL-C-σ applied to the forward/backward substitutions in the ILU-type smoother of a multigrid solver. Furthermore, the effects of IHK/McKernel have been investigated; it achieved an 11% improvement on 1,024 nodes.
Citations: 3
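SELL-C-σ packs rows into slices of C consecutive rows padded to a common length, so SIMD lanes can process a slice in lockstep; σ is a sorting window that groups rows of similar length to reduce padding. A pure-Python reference sketch of the packing and the sparse matrix-vector product, with the σ-sorting step omitted for brevity; the slice height C=4 is an arbitrary example, whereas real implementations match C to the SIMD width:

```python
import numpy as np

def to_sell_c(dense, C=4):
    """Pack a matrix into SELL-C: rows are grouped into slices of C
    consecutive rows, and each slice is padded to its own longest row.
    (Full SELL-C-sigma additionally sorts rows by length inside windows
    of sigma rows to shrink the padding; that step is omitted here.)"""
    n = dense.shape[0]
    slices = []
    for s0 in range(0, n, C):
        rows = [np.nonzero(dense[i])[0] for i in range(s0, min(s0 + C, n))]
        width = max((len(r) for r in rows), default=0)
        cols = np.zeros((len(rows), width), dtype=np.int64)
        vals = np.zeros((len(rows), width))
        for i, r in enumerate(rows):
            cols[i, :len(r)] = r
            vals[i, :len(r)] = dense[s0 + i, r]   # padded entries stay 0
        slices.append((s0, cols, vals))
    return slices

def sell_spmv(slices, x, n):
    """y = M @ x computed slice by slice; the per-slice product is the
    SIMD-friendly inner kernel, since all C rows advance together."""
    y = np.zeros(n)
    for s0, cols, vals in slices:
        y[s0:s0 + vals.shape[0]] = (vals * x[cols]).sum(axis=1)
    return y
```

The same lockstep-per-slice access pattern is what the paper exploits in the forward/backward substitutions of the ILU-type smoother, where sliced ELL previously left SIMD lanes idle.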
An efficient halo approach for Euler-Lagrange simulations based on MPI-3 shared memory
Patrick Kopper, M. Pfeiffer, S. Copplestone, A. Beck
DOI: https://doi.org/10.1145/3440722.3440904
Abstract: Euler-Lagrange methods are a common approach for simulating dispersed particle-laden flow, e.g. in turbomachinery. In this approach, the fluid is treated as a continuous phase with an Eulerian field solver, whereas the Lagrangian movement of the dispersed phase is described through the equations of motion of each individual particle. In high-performance computing, the load of the fluid phase depends only on the degrees of freedom, and load-balancing steps can be taken a priori, ensuring optimal scaling. The discrete phase, however, introduces local load imbalances that cannot easily be predicted, since in general neither the spatial particle distribution nor the computational cost of advancing particles relative to the fluid integration is known a priori. Runtime load balancing alleviates this problem by adjusting the local load on each processor according to information gathered during the simulation [4]. Since the load-balancing step becomes part of the simulation time, its performance and appropriate scaling on modern HPC systems are of crucial importance. In this talk, we first present the FLEXI framework for the Euler-Lagrange system, then introduce the previous approach and highlight its difficulties. FLEXI is a high-order accurate, massively parallel CFD framework based on the Discontinuous Galerkin Spectral Element Method (DGSEM). It has shown excellent scaling properties for the fluid phase and was recently extended with particle-tracking capabilities [1], developed together with the PICLas framework [2]. In FLEXI, the mesh is saved in the HDF5 format, allowing parallel access, with the elements presorted along a space-filling curve (SFC). This approach has shown its suitability for fluid simulations, as each processor requires and accesses only the local mesh information, thereby reducing I/O on the underlying file system [3]. However, the particle phase needs additional information around the fluid domain to retain high computational efficiency, since particles can cross the local domain boundary at any point during a time step. In previous implementations, this "halo region" information was communicated between each pair of individual processors, causing significant CPU and network load for an extended period during initialization and each load-balancing step. We therefore propose a method developed from scratch that utilizes modern MPI calls and overcomes most of the challenges of the previous approach. The reworked method utilizes MPI-3 shared memory to make mesh information available to all processors on a compute node. We perform a two-step, communication-free identification of all mesh elements relevant to a compute node. Furthermore, by making the mesh information accessible to all processors sharing local memory, we eliminate redundant calculations and reduce data duplication. We conclude by presenting examples of large-scale computations of particle-laden flows in complex turbomachinery system
Citations: 1
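The mesh elements in FLEXI are presorted along a space-filling curve so that spatially close elements are also close on disk and in memory. A Morton (Z-order) curve is one common choice (the abstract does not say which curve FLEXI uses); a minimal sketch of the bit-interleaving key and the resulting element ordering:

```python
def morton2d(x, y, bits=16):
    """Interleave the bits of integer cell coordinates (x, y) into a
    Z-order (Morton) key; sorting by this key keeps spatially close
    cells close in the sorted order, which is the locality property
    space-filling-curve mesh orderings rely on."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)      # x bits -> even positions
        code |= ((y >> b) & 1) << (2 * b + 1)  # y bits -> odd positions
    return code

# Order the cells of a 4x4 grid along the Z-curve.
cells = [(x, y) for y in range(4) for x in range(4)]
z_ordered = sorted(cells, key=lambda c: morton2d(*c))
```

Cutting the sorted list into contiguous chunks then gives each rank a spatially compact partition, so each processor touches only a local slab of the HDF5 mesh file.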