2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)最新文献

筛选
英文 中文
CHAMPVis: Comparative Hierarchical Analysis of Microarchitectural Performance CHAMPVis:微架构性能的比较层次分析
Lillian Pentecost, Udit Gupta, Elisa Ngan, J. Beyer, Gu-Yeon Wei, D. Brooks, M. Behrisch
{"title":"CHAMPVis: Comparative Hierarchical Analysis of Microarchitectural Performance","authors":"Lillian Pentecost, Udit Gupta, Elisa Ngan, J. Beyer, Gu-Yeon Wei, D. Brooks, M. Behrisch","doi":"10.1109/ProTools49597.2019.00013","DOIUrl":"https://doi.org/10.1109/ProTools49597.2019.00013","url":null,"abstract":"Performance analysis and optimization are essential tasks for hardware and software engineers. In the age of datacenter-scale computing, it is particularly important to conduct comparative performance analysis to understand discrepancies and limitations among different hardware systems and applications. However, there is a distinct lack of productive visualization tools for these comparisons. We present CHAMPVis, a web-based, interactive visualization tool that leverages the hierarchical organization of hardware systems to enable productive performance analysis. With CHAMPVis, users can make definitive performance comparisons across applications or hardware platforms. In addition, CHAMPVis provides methods to rank and cluster based on performance metrics to identify common optimization opportunities. Our thorough task analysis reveals three types of datacenter-scale performance analysis tasks: summarization, detailed comparative analysis, and interactive performance bottleneck identification. We propose techniques for each class of tasks including (1) 1-D feature space projection for similarity analysis; (2) Hierarchical parallel co-ordinates for comparative analysis; and (3) User interactions for rapid diagnostic queries to identify optimization targets. We evaluate CHAMPVis by analyzing standard datacenter applications and machine learning benchmarks in two different case studies.","PeriodicalId":418029,"journal":{"name":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115181967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
In Situ Visualization of Performance Metrics in Multiple Domains 多领域性能指标的现场可视化
Allen R. Sanderson, John A. Schmidt, A. Humphrey, M. Papka, R. Sisneros
{"title":"In Situ Visualization of Performance Metrics in Multiple Domains","authors":"Allen R. Sanderson, John A. Schmidt, A. Humphrey, M. Papka, R. Sisneros","doi":"10.1109/ProTools49597.2019.00014","DOIUrl":"https://doi.org/10.1109/ProTools49597.2019.00014","url":null,"abstract":"As application scientists develop and deploy simula- tion codes on to leadership-class computing resources, there is a need to instrument these codes to better understand performance to efficiently utilize these resources. This instrumentation may come from independent third-party tools that generate and store performance metrics or from custom instrumentation tools built directly into the application. The metrics collected are then available for visual analysis, typically in the domain in which there were collected. In this paper, we introduce an approach to visualize and analyze the performance metrics in situ in the context of the machine, application, and communication domains (MAC model) using a single visualization tool. This visualization model provides a holistic view of the application performance in the context of the resources where it is executing.","PeriodicalId":418029,"journal":{"name":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123395528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
[Copyright notice] (版权)
{"title":"[Copyright notice]","authors":"","doi":"10.1109/protools49597.2019.00002","DOIUrl":"https://doi.org/10.1109/protools49597.2019.00002","url":null,"abstract":"","PeriodicalId":418029,"journal":{"name":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125149698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
[Title page] (标题页)
{"title":"[Title page]","authors":"","doi":"10.1109/protools49597.2019.00001","DOIUrl":"https://doi.org/10.1109/protools49597.2019.00001","url":null,"abstract":"","PeriodicalId":418029,"journal":{"name":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122997009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-Level Performance Instrumentation for Kokkos Applications Using TAU 使用TAU的Kokkos应用程序的多级性能仪器
S. Shende, Nicholas Chaimov, A. Malony, N. Imam
{"title":"Multi-Level Performance Instrumentation for Kokkos Applications Using TAU","authors":"S. Shende, Nicholas Chaimov, A. Malony, N. Imam","doi":"10.1109/ProTools49597.2019.00012","DOIUrl":"https://doi.org/10.1109/ProTools49597.2019.00012","url":null,"abstract":"The TAU Performance System® provides a multi-level instrumentation strategy for instrumentation of Kokkos applications. Kokkos provides a performance portable API for expressing parallelism at the node level. TAU uses the Kokkos profiling system to expose performance factors using user-specified parallel kernel names for lambda functions or C++ functors. It can also use instrumentation at the OpenMP, CUDA, pthread, or other runtime levels to expose the implementation details giving a dual focus of higher-level abstractions as well as low-level execution dynamics. This multi-level instrumentation strategy adopted by TAU can highlight performance problems across multiple layers of the runtime system without modifying the application binary.","PeriodicalId":418029,"journal":{"name":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117028952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Automatic Instrumentation Refinement for Empirical Performance Modeling 经验性能建模的自动仪表改进
Jan-Patrick Lehr, A. Calotoiu, C. Bischof, F. Wolf
{"title":"Automatic Instrumentation Refinement for Empirical Performance Modeling","authors":"Jan-Patrick Lehr, A. Calotoiu, C. Bischof, F. Wolf","doi":"10.1109/ProTools49597.2019.00011","DOIUrl":"https://doi.org/10.1109/ProTools49597.2019.00011","url":null,"abstract":"The analysis of runtime performance is important during the development and throughout the life cycle of HPC applications. One important objective in performance analysis is to identify regions in the code that show significant runtime increase with larger problem sizes or more processes. One approach to identify such regions is to use empirical performance modeling, i.e., building performance models based on measurements. While the modeling itself has already been streamlined and automated, the generation of the required measurements is time consuming and tedious. In this paper, we propose an approach to automatically adjust the instrumentation to reduce overhead and focus the measurements to relevant regions, i.e.,such that show increasing runtime with larger input parameters or increasing number of MPI ranks. Our approach employs Extra-P to generate performance models, which it then uses to extrapolate runtime and, finally, decide which functions should be kept for measurement. Also, the analysis expands the instrumentation, by heuristically adding functions based on static source-code features. We evaluate our approach using benchmarks from SPEC CPU 2006, SU2, and parallel MILC. The evaluation shows that our approach can filter functions of little interest and generate profiles that contain mostly relevant regions. For example, the overhead for SU2 can be improved automatically from 200% to 11% compared to filtered Score-P measurements.","PeriodicalId":418029,"journal":{"name":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124424296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools 使用PaRSEC仪器工具的Tile低秩Cholesky分解性能分析
Qinglei Cao, Yu Pei, T. Hérault, Kadir Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, J. Dongarra
{"title":"Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools","authors":"Qinglei Cao, Yu Pei, T. Hérault, Kadir Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, J. Dongarra","doi":"10.1109/ProTools49597.2019.00009","DOIUrl":"https://doi.org/10.1109/ProTools49597.2019.00009","url":null,"abstract":"This paper highlights the necessary development of new instrumentation tools within the PaRSE task-based runtime system to leverage the performance of low-rank matrix computations. In particular, the tile low-rank (TLR) Cholesky factorization represents one of the most critical matrix operations toward solving challenging large-scale scientific applications. The challenge resides in the heterogeneous arithmetic intensity of the various computational kernels, which stresses PaRSE's dynamic engine when orchestrating the task executions at runtime. Such irregular workload imposes the deployment of new scheduling heuristics to privilege the critical path, while exposing task parallelism to maximize hardware occupancy. To measure the effectiveness of PaRSE's engine and its various scheduling strategies for tackling such workloads, it becomes paramount to implement adequate performance analysis and profiling tools tailored to fine-grained and heterogeneous task execution. This permits us not only to provide insights from PaRSE, but also to identify potential applications' performance bottlenecks. These instrumentation tools may actually foster synergism between applications and PaRSE developers for productivity as well as high-performance computing purposes. We demonstrate the benefits of these amenable tools, while assessing the performance of TLR Cholesky factorization from data distribution, communication-reducing and synchronization-reducing perspectives. This tool-assisted performance analysis results in three major contributions: a new hybrid data distribution, a new hierarchical TLR Cholesky algorithm, and a new performance model for tuning the tile size. The new TLR Cholesky factorization achieves an 8X performance speedup over existing implementations on massively parallel supercomputers, toward solving large-scale 3D climate and weather prediction applications.","PeriodicalId":418029,"journal":{"name":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122264437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
The Case for a Common Instrumentation Interface for HPC Codes HPC代码通用仪表接口的案例
David Boehme, K. Huck, Jonathan Madsen, J. Weidendorfer
{"title":"The Case for a Common Instrumentation Interface for HPC Codes","authors":"David Boehme, K. Huck, Jonathan Madsen, J. Weidendorfer","doi":"10.1109/ProTools49597.2019.00010","DOIUrl":"https://doi.org/10.1109/ProTools49597.2019.00010","url":null,"abstract":"Lightweight timekeeping functionality for basic performance logging, regression testing, and anomaly detection is essential in HPC codes. We present the Caliper, TiMemory, and PerfStubs libraries that have recently been developed as common solutions for these tasks. Lightweight, always-on profiling solutions are typically built around user-defined instrumentation points, which can benefit a variety of use cases beyond application timekeeping. We argue for the creation of a tool-agnostic adapter layer to make these instrumentation points available to third-party tools, runtime systems, and system software.","PeriodicalId":418029,"journal":{"name":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131721087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Asvie: A Timing-Agnostic SVE Optimization Methodology 一种时间不可知的SVE优化方法
M. T. Cruz, Daniel Ruiz, Roxana Rusitoru
{"title":"Asvie: A Timing-Agnostic SVE Optimization Methodology","authors":"M. T. Cruz, Daniel Ruiz, Roxana Rusitoru","doi":"10.1109/ProTools49597.2019.00007","DOIUrl":"https://doi.org/10.1109/ProTools49597.2019.00007","url":null,"abstract":"As we are quickly approaching exascale and moving onwards towards the next challenge, we are exploring a wider range of technologies and architectures. The further out the timeframes considered, the less likely prototype hardware is available. A popular method of exploring new architectural extensions is to emulate them on existing platforms. The Arm Instruction Emulator (ArmIE) is such a tool, which we use on existing Armv8 platforms to run Arm's latest vector architecture, the Scalable Vector Extension (SVE). To aid with porting applications towards SVE, we developed an application optimization methodology based on ArmIE that uses timing-agnostic metrics to assess application quality. We show how we have successfully optimized the High Performance Conjugate Gradient (HPCG) High Performance Computing benchmark to SVE by using our methodology, resulting in a hand-optimized intrinsics-based version.","PeriodicalId":418029,"journal":{"name":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","volume":"934 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123780368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Designing Efficient Parallel Software via Compositional Performance Modeling 基于组合性能建模的高效并行软件设计
A. Calotoiu, Thomas Höhl, H. Mantel, Toni Nguyen, F. Wolf
{"title":"Designing Efficient Parallel Software via Compositional Performance Modeling","authors":"A. Calotoiu, Thomas Höhl, H. Mantel, Toni Nguyen, F. Wolf","doi":"10.1109/ProTools49597.2019.00008","DOIUrl":"https://doi.org/10.1109/ProTools49597.2019.00008","url":null,"abstract":"Performance models are powerful instruments for understanding the performance of parallel systems and uncovering their bottlenecks. Already during system design, performance models can help ponder alternatives. However, creating a performance model - whether theoretically or empirically - for an entire application that does not exist yet is challenging unless the interactions between all system components are well understood, which is often not the case during design. In this paper, we propose to generate performance models of full programs from performance models of their components using formal composition operators derived from parallel design patterns such as pipeline or task pool. As long as the design of the overall system follows such a pattern, its performance model can be predicted with reasonable accuracy without an actual implementation.","PeriodicalId":418029,"journal":{"name":"2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131786663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信