Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region: Latest Articles, Page 6

Multiplicative Schwartz-Type Block Multi-Color Gauss-Seidel Smoother for Algebraic Multigrid Methods

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2020-01-15 DOI: 10.1145/3368474.3368481

Masatoshi Kawai, Akihiro Ida, Hiroya Matsuba, K. Nakajima, M. Bolten

Citations: 0

Energy Efficient Runahead Execution on a Tightly Coupled Heterogeneous Core

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2020-01-15 DOI: 10.1145/3368474.3368496

Susumu Mashimo, Ryota Shioya, Koji Inoue

Abstract: Out-of-order (OoO) processors generally offer significant performance gains over simpler in-order (InO) processors. However, recent studies have revealed that OoO processors provide little performance benefit in many program phases, and that these phases are distributed at a fine granularity. Leveraging these fine-grained phases, tightly coupled heterogeneous cores (TCHCs) have been proposed to improve energy efficiency. A TCHC is a processor core that consists of multiple back-ends, each with different performance and energy-consumption characteristics (e.g., a power-efficient InO back-end and a high-performance OoO back-end). It improves energy efficiency by switching execution to the most energy-efficient back-end with a very small switching penalty. We propose a novel technique to further improve the energy efficiency of a TCHC. The proposed technique is based on runahead execution (RAE), a prefetch technique that executes instructions ahead of long-latency cache misses and issues independent cache misses earlier. Leveraging the characteristics of TCHCs and RAE, the proposed technique increases the utilization of energy-efficient back-ends, thereby significantly improving energy efficiency. Our evaluation results show that the proposed method achieves a 13% improvement in energy-delay product (EDP) over a state-of-the-art TCHC using oracle switching decision logic.

Citations: 0

Accuracy Improvement of Memory System Simulation for Modern Shared Memory Processor

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2020-01-15 DOI: 10.1145/3368474.3368483

Yuetsu Kodama, Tetsuya Odajima, A. Asato, M. Sato

Abstract: To support early application development for the supercomputer Fugaku, RIKEN has developed a processor simulator based on the general-purpose processor simulator gem5. It does not simulate the actual hardware of a Fugaku processor. However, we believe that sufficient simulation accuracy can be obtained, since it simulates the out-of-order instruction pipeline with cycle-level accuracy and uses detailed parameter tuning of the out-of-order resources. To estimate the execution time of a program accurately, it is necessary to simulate not only the instruction execution time but also the access time of the cache memory hierarchy. Therefore, in the RIKEN simulator, we extended gem5 to match the performance of the cache memory hierarchy to that of a Fugaku processor. With this simulator, we aim to estimate the execution cycles of a single-node application on a Fugaku processor with an accuracy that enables relative evaluation and application tuning. In this paper, we describe the implementation of the simulator and verify its accuracy against a Fugaku processor test chip. In an evaluation of 46 kernel benchmarks, the difference was 13% or less for 85% of the kernels. In the multithreaded execution of the Stream Triad benchmark, performance scaled with the number of threads, and the simulator achieved over 80% of memory throughput with sufficient accuracy.

Citations: 6
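
The abstract above reports results for the Stream Triad benchmark. As a rough illustration only, and not the simulator or the benchmark code used by the authors, the Triad kernel computes `a[i] = b[i] + scalar * c[i]`, and STREAM conventionally counts two reads and one write per element when reporting bandwidth. The function names below are hypothetical, and interpreted Python will not come close to hardware memory throughput.

```python
from array import array
import time


def stream_triad(a, b, c, scalar):
    """STREAM Triad kernel: a[i] = b[i] + scalar * c[i] for every index i."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes_moved(n, word_bytes=8):
    """Bytes counted for Triad: two reads (b, c) and one write (a) per element."""
    return 3 * word_bytes * n


if __name__ == "__main__":
    n = 100_000
    a = array("d", [0.0]) * n  # 8-byte doubles, as in the usual STREAM setup
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    stream_triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    # Illustrative figure only: dominated by interpreter overhead, not memory.
    print(f"{triad_bytes_moved(n) / elapsed / 1e9:.3f} GB/s")
```

A simulator-accuracy study would compare such a bandwidth figure, measured in simulation, against the same kernel on real hardware.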

Dual-Plane Isomorphic Hypercube Network

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2020-01-15 DOI: 10.1145/3368474.3368493

T. Hosomi, Ryota Yasudo, M. Koibuchi, S. Shimojo

Citations: 1

Exploiting Spark for HPC Simulation Data: Taming the Ephemeral Data Explosion

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2020-01-15 DOI: 10.1145/3368474.3368482

M. Jiang, Brian Gallagher, Albert Chu, G. Abdulla, Timothy Bender

Citations: 2

Diamond matrix powers kernels

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2020-01-15 DOI: 10.1145/3368474.3368494

Emil Vatai, U. Singhal, R. Suda

Citations: 1

Effect of Mixed Precision Computing on H-Matrix Vector Multiplication in BEM Analysis

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2019-10-30 DOI: 10.1145/3368474.3368479

R. Ooi, T. Iwashita, Takeshi Fukaya, Akihiro Ida, Rio Yokota

Citations: 4

Towards Real Time Multi-robot Routing using Quantum Computing Technologies

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2019-01-14 DOI: 10.1145/3293320.3293333

James Clark, Tristan West, Joseph Zammit, X. Guo, Luke Mason, Duncan Russell

Citations: 15

Acceleration of Symmetric Sparse Matrix-Vector Product using Improved Hierarchical Diagonal Blocking Format

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2019-01-14 DOI: 10.1145/3293320.3293332

Ryo Muro, A. Fujii, Teruo Tanaka

Abstract: In a previous study, Guy et al. proposed accelerating the sparse matrix-vector product (SpMV) with the Hierarchical Diagonal Blocking (HDB) format, which recursively repeats partitioning, reordering, and blocking on a symmetric sparse matrix. The HDB format stores a sparse matrix hierarchically in a tree structure, and each node of the tree stores a small sparse matrix in CSR format. In the present study, we examine two problems with the HDB format and provide a solution for each. First, SpMV using the HDB format has partial dependencies among hierarchy levels, so the parallelism of the computation decreases as nodes get closer to the root. We propose cutting these dependencies by using work vectors. Second, each node of the conventional HDB format is stored in Compressed Sparse Row (CSR) format, whereas Block Compressed Sparse Row (BSR) format is often faster than CSR format for SpMV. We therefore also evaluate the effectiveness of the proposed work-vector method for a BSR-HDB format. In addition, we compare the performance of the general formats (CSR and BSR) using the Intel Math Kernel Library (MKL), the conventional HDB format, and the expanded HDB format on 22 sparse matrices from various fields. The results show that our expanded HDB format achieved the highest SpMV performance for 19 of the matrices and was 1.99 times faster than the CSR format.

Citations: 3
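
For readers unfamiliar with the CSR baseline that the abstract above compares against, the following is a minimal sketch of SpMV over CSR arrays, plus a variant that stores only the upper triangle of a symmetric matrix. It is not the HDB format or the authors' work-vector method; the function and array names (`row_ptr`, `col_idx`, `values`) are conventional but chosen here for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_symmetric_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x when only the upper triangle (j >= i) of symmetric A is stored."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            y[i] += values[k] * x[j]
            if j != i:
                # The mirrored entry A[j][i] also contributes, so this row
                # writes into another row's output element.
                y[j] += values[k] * x[i]
    return y


if __name__ == "__main__":
    # A = [[4, 1, 0], [1, 3, 2], [0, 2, 5]], x = [1, 2, 3]
    x = [1.0, 2.0, 3.0]
    full = csr_spmv([0, 2, 5, 7], [0, 1, 0, 1, 2, 1, 2],
                    [4.0, 1.0, 1.0, 3.0, 2.0, 2.0, 5.0], x)
    upper = csr_symmetric_spmv([0, 2, 4, 5], [0, 1, 1, 2, 2],
                               [4.0, 1.0, 3.0, 2.0, 5.0], x)
    print(full, upper)  # both print [6.0, 13.0, 19.0]
```

In the symmetric variant, each stored off-diagonal entry updates two output elements, which is one reason symmetric SpMV is harder to parallelize than the plain CSR loop.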

Comparative benchmarking of HPC systems for GSS applications: GSS applications in the HPC ecosystem

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2019-01-14 DOI: 10.1145/3293320.3293326

D. Kaliszan, S. Fürst, M. Gienger, Sergiy Gogolenko, N. Meyer, S. Petruczynik

Abstract: The work presented in this paper was carried out in the Centre of Excellence for Global Systems Science (CoeGSS), an interdisciplinary project funded by the European Commission. The project provides decision support in the face of global challenges and brings together HPC and global systems science (GSS). This paper proposes a GSS benchmark with the aim of finding the most suitable HPC architecture and the best HPC system for running GSS applications effectively. GSS provides evidence about global systems challenges, e.g. the network structure of the world economy, energy, water and food supply systems, the global financial system or the global city system, and the scientific community. The outcome of the analysis is the definition of a benchmark that best represents the GSS environment. Three exemplary challenges were defined as pilot applications: Health Habits, Green Growth and Global Urbanisation. These were extended with additional applications from the GSS ecosystem: iterative proportional fitting (IPF); data rastering, a preprocessing step that converts all vector representations of georeferenced data into raster files to be used later as simulation input; the Weather Research and Forecasting (WRF) model; CMAQ/CCTM (Community Multiscale Air Quality Modelling System / the CMAQ Chemistry-Transport Model); CM1 (Cloud Modelling); ABMS (Agent-based Modelling and Simulation); and OpenSWPC (an open-source seismic wave propagation code). This list is fairly rich and reflects the real GSS world as closely as possible, bearing in mind, for example, the availability of real-world applications. Additionally, the authors tested new HPC platforms based on Intel® Xeon® Gold 6140, AMD Epyc™, ARM Hi1616 and IBM Power8+. Due to hardware availability, the testbed consisted of a limited number of nodes, which restricted the ability to run full scalability tests for the given applications. However, even this small number of computational units (cores) can provide valuable results, including an architecture comparison for different applications based on execution times, TDPs and TCO. These are the basic metrics used to rank HPC architectures. Finally, this document is intended to be valuable information for the GSS community for future analysis, to determine their specific demands and, more generally, to help develop a mature final benchmark set reflecting the requirements and specifics of the GSS environment. As none of the existing benchmarks is dedicated to the GSS community, the authors decided to create one, called the GSS benchmark, to serve and help GSS users in their future work.

Citations: 1