2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS): Latest Publications

Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches
Taylor L. Groves, Benjamin Brock, Yuxin Chen, K. Ibrahim, Lenny Oliker, N. Wright, Samuel Williams, K. Yelick
DOI: https://doi.org/10.1109/PMBS51919.2020.00016 | Published: 2020-11-01
Abstract: Network communication on GPU-based systems is a significant roadblock for many applications with small but frequent messaging requirements. A common question from application developers is: "How can we reduce the overheads and achieve the best communication performance on GPUs?" This work examines device-initiated versus host-initiated inter-node GPU communication using NVSHMEM. We derive basic communication model parameters for single-message and batched communication before validating our model against distributed GEMM benchmarks. We use our model to estimate performance benefits for applications transitioning from CPUs to GPUs for fixed-size and scaled workloads, and provide general guidelines for reducing communication overheads. Our findings show that the host-initiated approach generally outperforms the device-initiated approach for the system evaluated.
Citations: 3

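The abstract describes deriving basic communication model parameters for single-message and batched transfers. Below is a minimal latency-bandwidth (alpha-beta) cost sketch of that kind of model; the function name and all parameter values are illustrative assumptions, not figures from the paper, and the device-initiated path is given a higher per-message overhead only to loosely mirror the paper's qualitative finding.

```cpp
#include <cstdio>

// Hypothetical latency-bandwidth model: time to send n_msgs messages of
// msg_bytes each, with a per-message initiation cost alpha (seconds) and an
// inverse bandwidth beta (seconds per byte).
double transfer_time(double alpha, double beta, int n_msgs, double msg_bytes) {
    return n_msgs * (alpha + msg_bytes * beta);
}

int main() {
    // Illustrative, made-up parameters (not measured values from the paper).
    const double alpha_host = 2.0e-6;        // per-message overhead, host-initiated
    const double alpha_dev  = 5.0e-6;        // per-message overhead, device-initiated
    const double beta       = 1.0 / 12.0e9;  // ~12 GB/s inter-node bandwidth

    for (double bytes : {8.0, 1024.0, 1048576.0}) {
        std::printf("%9.0f B x 1000 msgs  host: %.3e s  device: %.3e s\n", bytes,
                    transfer_time(alpha_host, beta, 1000, bytes),
                    transfer_time(alpha_dev, beta, 1000, bytes));
    }
    return 0;
}
```

In such a model the per-message overhead term dominates for small messages, which is why batching and reducing initiation cost matter most for small, frequent messaging.
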
The Performance and Energy Efficiency Potential of FPGAs in Scientific Computing
T. Nguyen, Samuel Williams, Marco Siracusa, Colin MacLean, D. Doerfler, N. Wright
DOI: https://doi.org/10.1109/PMBS51919.2020.00007 | Published: 2020-11-01
Abstract: Hardware specialization is a promising direction for the future of digital computing. Reconfigurable technologies enable hardware specialization with modest non-recurring engineering cost. In this paper, we use FPGAs to evaluate the benefits of building specialized hardware for numerical kernels found in scientific applications. In order to properly evaluate performance, we not only compare Intel Arria 10 and Xilinx U280 performance against Intel Xeon, Intel Xeon Phi, and NVIDIA V100 GPUs, but also extend the Empirical Roofline Toolkit (ERT) to FPGAs in order to assess our results in terms of the Roofline model. Although FPGA performance is known to be far less than that of a GPU, we also benchmark the energy efficiency of each platform for the scientific kernels, comparing it to microbenchmark and technological limits. Results show that while FPGAs struggle to compete in absolute terms with GPUs on memory- and compute-intensive kernels, they require far less power and can deliver nearly the same energy efficiency.
Citations: 12

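The results are assessed in terms of the Roofline model, which bounds attainable throughput by the minimum of peak compute and arithmetic intensity times peak memory bandwidth. The C++ sketch below evaluates that bound and a derived energy-efficiency figure; the platform names and numbers are made up for illustration and are not measurements from the paper.

```cpp
#include <algorithm>
#include <cstdio>

// Classic Roofline bound: attainable GFLOP/s is limited either by peak compute
// or by arithmetic intensity (flops per byte) times peak memory bandwidth.
double roofline_gflops(double peak_gflops, double peak_gbs, double ai) {
    return std::min(peak_gflops, ai * peak_gbs);
}

int main() {
    struct Platform { const char* name; double gflops, gbs, watts; };
    // Made-up platform numbers, for illustration only.
    const Platform platforms[] = { {"fpga-like", 600.0, 70.0, 80.0},
                                   {"gpu-like", 7000.0, 900.0, 300.0} };

    const double ai = 0.25;  // e.g., a memory-bound streaming kernel
    for (const Platform& p : platforms) {
        double perf = roofline_gflops(p.gflops, p.gbs, ai);
        std::printf("%-10s attainable %8.1f GFLOP/s, %6.3f GFLOP/s per watt\n",
                    p.name, perf, perf / p.watts);
    }
    return 0;
}
```

Dividing the attainable performance by board power is one simple way to compare energy efficiency across platforms, which is the kind of comparison the abstract draws between FPGAs and GPUs.
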
[Copyright notice]
DOI: https://doi.org/10.1109/pmbs51919.2020.00002 | Published: 2020-11-01
Citations: 0

Evaluating the Performance of NVIDIA’s A100 Ampere GPU for Sparse and Batched Computations
H. Anzt, Y. M. Tsai, A. Abdelfattah, T. Cojean, J. Dongarra
DOI: https://doi.org/10.1109/PMBS51919.2020.00009 | Published: 2020-11-01
Abstract: GPU accelerators have become an important backbone for scientific high-performance computing, and the performance advances obtained from adopting new GPU hardware are significant. In this paper we take a first look at NVIDIA’s newest server-line GPU, the A100 architecture, part of the Ampere generation. Specifically, we assess its performance for sparse and batched computations, as these routines are relied upon in many scientific applications, and compare it to the performance achieved on NVIDIA’s previous server-line GPU.
Citations: 9

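Sparse matrix-vector products are among the routines evaluated on the A100. As a point of reference for what is being benchmarked, here is a plain CSR SpMV in C++; this is a host-side sketch only, and the paper's GPU kernels are not reproduced here.

```cpp
#include <cstdio>
#include <vector>

// y = A*x with A stored in CSR (compressed sparse row) format.
void spmv_csr(const std::vector<int>& rowptr, const std::vector<int>& col,
              const std::vector<double>& val, const std::vector<double>& x,
              std::vector<double>& y) {
    for (size_t i = 0; i + 1 < rowptr.size(); ++i) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
            sum += val[k] * x[col[k]];
        y[i] = sum;
    }
}

int main() {
    // 3x3 example matrix: [2 0 1; 0 3 0; 4 0 5]
    std::vector<int> rowptr = {0, 2, 3, 5};
    std::vector<int> col    = {0, 2, 1, 0, 2};
    std::vector<double> val = {2, 1, 3, 4, 5};
    std::vector<double> x = {1, 1, 1}, y(3);
    spmv_csr(rowptr, col, val, x, y);
    std::printf("y = [%g, %g, %g]\n", y[0], y[1], y[2]);  // expected [3, 3, 9]
    return 0;
}
```
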
Benchmarking Julia’s Communication Performance: Is Julia HPC ready or Full HPC?
S. Hunold, Sebastian Steiner
DOI: https://doi.org/10.1109/PMBS51919.2020.00008 | Published: 2020-11-01
Abstract: Julia has quickly become one of the main programming languages for computational sciences, mainly due to its speed and flexibility. The speed and efficiency of Julia are the main reasons why researchers in the field of High Performance Computing have started porting their applications to Julia. Since Julia has a very small binding overhead to C, many efficient computational kernels can be integrated into Julia without any noticeable performance drop. For that reason, highly tuned libraries, such as the Intel MKL or OpenBLAS, will allow Julia applications to achieve similar computational performance as their C counterparts. Yet, two questions remain: 1) How fast is Julia for memory-bound applications? 2) How efficiently can MPI functions be called from a Julia application? In this paper, we assess the performance of Julia with respect to HPC. To that end, we examine the raw throughput achievable with Julia using a new Julia port of the well-known STREAM benchmark. We also compare the running times of the most commonly used MPI collective operations (e.g., MPI_Allreduce) with their C counterparts. Our analysis shows that the HPC performance of Julia is on par with C in the majority of cases.
Citations: 7

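The paper compares the running times of MPI collective operations called from Julia with their C counterparts. A minimal C++/MPI timing loop of the kind such a comparison relies on is sketched below; the message size and repetition count are arbitrary choices, not the paper's benchmark configuration.

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1024, reps = 100;   // arbitrary benchmark sizes
    std::vector<double> send(count, 1.0), recv(count);

    MPI_Barrier(MPI_COMM_WORLD);          // align ranks before timing
    double t0 = MPI_Wtime();
    for (int r = 0; r < reps; ++r)
        MPI_Allreduce(send.data(), recv.data(), count, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
    double per_call = (MPI_Wtime() - t0) / reps;

    if (rank == 0)
        std::printf("MPI_Allreduce(%d doubles): %.3e s per call\n", count, per_call);
    MPI_Finalize();
    return 0;
}
```

A Julia port of the same measurement (via MPI.jl) can be timed identically, which is how an apples-to-apples comparison between the two languages is usually set up.
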
Lightweight Measurement and Analysis of HPC Performance Variability
Jered Dominguez-Trujillo, Keira Haskins, S. J. Khouzani, Chris Leap, Sahba Tashakkori, Quincy Wofford, Trilce Estrada, P. Bridges, Patrick M. Widener
DOI: https://doi.org/10.1109/PMBS51919.2020.00011 | Published: 2020-11-01
Abstract: Performance variation deriving from hardware and software sources is common in modern scientific and data-intensive computing systems, and synchronization in parallel and distributed programs often exacerbates their impacts at scale. The decentralized and emergent effects of such variation are, unfortunately, also difficult to systematically measure, analyze, and predict; modeling assumptions which are stringent enough to make analysis tractable frequently cannot be guaranteed at meaningful application scales, and longitudinal methods at such scales can require the capture and manipulation of impractically large amounts of data. This paper describes a new, scalable, and statistically robust approach for effective modeling, measurement, and analysis of large-scale performance variation in HPC systems. Our approach avoids the need to reason about complex distributions of runtimes among large numbers of individual application processes by focusing instead on the maximum length of distributed workload intervals. We describe this approach and its implementation in MPI which makes it applicable to a diverse set of HPC workloads. We also present evaluations of these techniques for quantifying and predicting performance variation carried out on large-scale computing systems, and discuss the strengths and limitations of the underlying modeling assumptions.
Citations: 0

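The approach avoids reasoning about full runtime distributions by focusing on the maximum length of distributed workload intervals. A minimal MPI sketch of collecting that statistic is shown below; the timed workload is a placeholder, and the reduction structure is an assumption about how such a measurement could be implemented rather than the paper's actual instrumentation.

```cpp
#include <mpi.h>
#include <cmath>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Placeholder per-rank workload interval (any timed region of interest).
    double t0 = MPI_Wtime();
    double s = 0.0;
    for (int i = 1; i < 1000000; ++i) s += std::sqrt(static_cast<double>(i));
    double interval = MPI_Wtime() - t0;

    // The statistic of interest: the slowest rank's interval length.
    double max_interval = 0.0;
    MPI_Reduce(&interval, &max_interval, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("max workload interval: %.3e s (checksum %g)\n", max_interval, s);
    MPI_Finalize();
    return 0;
}
```

Because synchronized programs proceed at the pace of their slowest process, tracking only this maximum keeps the measurement lightweight while still capturing the variation that determines time to solution.
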
Warwick Data Store: A Data Structure Abstraction Library
Richard O. Kirk, M. Nolten, R. Kevis, T. Law, S. Maheswaran, Steven A. Wright, S. Powell, G. Mudalige, S. Jarvis
DOI: https://doi.org/10.1109/PMBS51919.2020.00013 | Published: 2020-11-01
Abstract: With the increasing complexity of memory architectures and scientific applications, developing data structures that are performant, portable, and scalable while supporting developer productivity is a challenging task. In this paper, we present Warwick Data Store (WDS), a lightweight and extensible C++ template library designed to manage these complexities and allow rapid prototyping. WDS is designed to abstract details of the underlying data structures away from the user, thus easing application development and optimisation. We show that using WDS does not significantly impact achieved performance across a variety of different scientific benchmarks and proxy-applications, compilers, and architectures. The overheads are largely below 30% for smaller problems, decreasing to below 10% for larger problems. This shows that the library does not significantly impact performance, while providing additional functionality and the ability to optimise data structures without changing the application code.
Citations: 2

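WDS hides the underlying data structure behind a uniform interface so that layouts can be optimised without changing application code. The library's own API is not reproduced here; the following generic C++ sketch, with all names invented, only illustrates that idea by exposing an array-of-structures and a structure-of-arrays layout through the same accessor.

```cpp
#include <cstdio>
#include <vector>

// Two storage policies exposing the same set/sum interface, so application
// code is written once and the layout can be swapped without touching it.
struct SoAField {                       // structure-of-arrays layout
    std::vector<double> x, y;
    explicit SoAField(size_t n) : x(n), y(n) {}
    void set(size_t i, double vx, double vy) { x[i] = vx; y[i] = vy; }
    double sum_x() const { double s = 0; for (double v : x) s += v; return s; }
};

struct AoSField {                       // array-of-structures layout
    struct Elem { double x, y; };
    std::vector<Elem> data;
    explicit AoSField(size_t n) : data(n) {}
    void set(size_t i, double vx, double vy) { data[i] = {vx, vy}; }
    double sum_x() const { double s = 0; for (const Elem& e : data) s += e.x; return s; }
};

// Application code is layout-agnostic: the container type is a template parameter.
template <class Field>
double fill_and_reduce(size_t n) {
    Field f(n);
    for (size_t i = 0; i < n; ++i) f.set(i, static_cast<double>(i), 2.0 * i);
    return f.sum_x();
}

int main() {
    std::printf("SoA: %g  AoS: %g\n",
                fill_and_reduce<SoAField>(1000), fill_and_reduce<AoSField>(1000));
    return 0;
}
```
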
Developing Models for the Runtime of Programs With Exponential Runtime Behavior
Michael Burger, Giang Nam Nguyen, C. Bischof
DOI: https://doi.org/10.1109/PMBS51919.2020.00015 | Published: 2020-11-01
Abstract: In this paper, we present a new approach to generate runtime models for programs whose runtime grows exponentially with the value of one input parameter. Such programs are of high interest, e.g., in cryptanalysis, for analyzing the practical security of traditional and post-quantum secure schemes. The model generation approach, based on profiled training runs, builds on ideas realized in the open-source tool Extra-P, extended with a new class of model functions and a shared-memory-parallel simulated annealing approach to heuristically determine coefficients for the model functions. Our approach is implemented in the open-source software SimAnMo (Simulated Annealing Modeler). We demonstrate on various theoretical, synthetic, and practical test cases that our approach delivers very accurate models and reliable predictions compared to standard approaches on x86 and ARM architectures. SimAnMo is also employed to generate models of four codes used to solve the so-called shortest vector problem, an important problem from the field of lattice-based cryptography. We demonstrate the quality of our models with measurements for higher lattice dimensions, as far as feasible. Additionally, we highlight inherent problems with models for algorithms with exponential runtime.
Citations: 1

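The modeling approach fits exponential model functions to profiled runtimes using simulated annealing. The toy C++ sketch below fits t(n) = a * b^n to synthetic data with a naive annealing schedule; it illustrates the general fitting idea only and is not SimAnMo's actual algorithm, model class, or parameterisation.

```cpp
#include <cmath>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

// Sum of squared log-errors of the model t(n) = a * b^n against samples (n, t).
double cost(double a, double b, const std::vector<std::pair<int, double>>& d) {
    double c = 0.0;
    for (const auto& [n, t] : d) {
        double e = std::log(t) - std::log(a * std::pow(b, n));
        c += e * e;
    }
    return c;
}

int main() {
    // Synthetic "measurements" of a program with runtime 0.5 * 1.8^n seconds.
    std::vector<std::pair<int, double>> data;
    for (int n = 10; n <= 20; ++n) data.push_back({n, 0.5 * std::pow(1.8, n)});

    std::mt19937 rng(42);
    std::normal_distribution<double> step(0.0, 0.05);
    std::uniform_real_distribution<double> u(0.0, 1.0);

    double a = 1.0, b = 1.5;                              // crude starting guess
    double cur = cost(a, b, data);
    for (double T = 1.0; T > 1e-4; T *= 0.999) {          // geometric cooling
        double na = a + step(rng), nb = b + step(rng);    // random neighbour
        if (na <= 0.0 || nb <= 1.0) continue;             // keep model exponential
        double c = cost(na, nb, data);
        if (c < cur || u(rng) < std::exp((cur - c) / T)) {  // Metropolis acceptance
            a = na; b = nb; cur = c;
        }
    }
    std::printf("fitted model: %.3f * %.3f^n  (cost %.2e)\n", a, b, cur);
    return 0;
}
```

Fitting in log space keeps the huge spread of exponential runtimes from letting the largest samples dominate the objective, which is one of the practical difficulties the paper highlights for exponential models.
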
Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX
C. Alappat, Jan Laukemann, T. Gruber, G. Hager, G. Wellein, N. Meyer, T. Wettig
DOI: https://doi.org/10.1109/PMBS51919.2020.00006 | Published: 2020-09-29
Abstract: The A64FX CPU powers the current #1 supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival accelerator devices. Generating efficient code for such a new architecture requires a good understanding of its performance features. Using these features, we construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops. We also identify architectural peculiarities and derive optimization hints. Applying the ECM model to sparse matrix-vector multiplication (SpMV), we motivate why the CRS matrix storage format is inappropriate and how the SELL-C-σ format with suitable code optimizations can achieve bandwidth saturation for SpMV.
Citations: 14

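The abstract motivates the SELL-C-σ storage format over CRS for SpMV. The C++ sketch below shows an SpMV kernel over a SELL-C-style layout with C = 2, hand-built data, and the σ row-sorting step omitted; it is meant only to illustrate the chunked, column-major storage that makes the inner loop unit-stride and SIMD-friendly, not the paper's optimized A64FX code.

```cpp
#include <cstdio>
#include <vector>

// SpMV for a matrix in a SELL-C-sigma-style layout: rows are grouped into
// chunks of C rows, each chunk is padded to its longest row, and values are
// stored column-major within the chunk so consecutive values belong to
// consecutive rows.
void spmv_sell_c(int C, const std::vector<int>& chunk_ptr,
                 const std::vector<int>& col, const std::vector<double>& val,
                 const std::vector<double>& x, std::vector<double>& y) {
    int nchunks = static_cast<int>(chunk_ptr.size()) - 1;
    for (int c = 0; c < nchunks; ++c) {
        int base = chunk_ptr[c];
        int len = (chunk_ptr[c + 1] - base) / C;   // padded row length of chunk
        for (int j = 0; j < len; ++j)
            for (int r = 0; r < C; ++r) {          // unit-stride over the chunk
                int idx = base + j * C + r;
                y[c * C + r] += val[idx] * x[col[idx]];
            }
    }
}

int main() {
    // 4x4 toy matrix: rows {2@0, 1@2}, {3@1}, {4@0, 5@2, 1@3}, {6@3}; C = 2.
    std::vector<int> chunk_ptr = {0, 4, 10};
    std::vector<int> col       = {0, 1, 2, 0,   0, 3, 2, 0, 3, 0};   // 0 marks padding
    std::vector<double> val    = {2, 3, 1, 0,   4, 6, 5, 0, 1, 0};   // 0 marks padding
    std::vector<double> x = {1, 1, 1, 1}, y(4, 0.0);
    spmv_sell_c(2, chunk_ptr, col, val, x, y);
    std::printf("y = [%g, %g, %g, %g]\n", y[0], y[1], y[2], y[3]);   // expected [3, 3, 10, 6]
    return 0;
}
```
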
Message from the Workshop Chairs
K. Ong, K. Smith‐Miles, Vincent C. S. Lee, W. Ng
DOI: https://doi.org/10.1109/AIDM.2006.11 | Published: 2019-11-01
Citations: 0