Latest articles from the 2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools)

CHAMPVis: Comparative Hierarchical Analysis of Microarchitectural Performance

2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools) Pub Date : 2019-11-01 DOI: 10.1109/ProTools49597.2019.00013

Lillian Pentecost, Udit Gupta, Elisa Ngan, J. Beyer, Gu-Yeon Wei, D. Brooks, M. Behrisch

Abstract: Performance analysis and optimization are essential tasks for hardware and software engineers. In the age of datacenter-scale computing, it is particularly important to conduct comparative performance analysis to understand discrepancies and limitations among different hardware systems and applications. However, there is a distinct lack of productive visualization tools for these comparisons. We present CHAMPVis, a web-based, interactive visualization tool that leverages the hierarchical organization of hardware systems to enable productive performance analysis. With CHAMPVis, users can make definitive performance comparisons across applications or hardware platforms. In addition, CHAMPVis provides methods to rank and cluster based on performance metrics to identify common optimization opportunities. Our thorough task analysis reveals three types of datacenter-scale performance analysis tasks: summarization, detailed comparative analysis, and interactive performance bottleneck identification. We propose techniques for each class of tasks including (1) 1-D feature space projection for similarity analysis; (2) hierarchical parallel coordinates for comparative analysis; and (3) user interactions for rapid diagnostic queries to identify optimization targets. We evaluate CHAMPVis by analyzing standard datacenter applications and machine learning benchmarks in two different case studies.
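The idea of a 1-D feature space projection for similarity analysis, mentioned in the CHAMPVis abstract, can be illustrated with a minimal sketch. The abstract does not say which projection CHAMPVis uses, so this example assumes a projection onto the first principal component, computed by power iteration. The application names, metric columns, and values are hypothetical and are not results from the paper.

```python
import math

def project_1d(vectors, iterations=200):
    """Project metric vectors onto their first principal component.

    Assumption: PCA via power iteration stands in for whatever 1-D
    projection the tool actually uses.
    """
    n, d = len(vectors), len(vectors[0])
    # Center each metric column on its mean.
    means = [sum(row[j] for row in vectors) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in vectors]
    # Covariance matrix of the centered metrics (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # One coordinate per input vector along that direction.
    return [sum(r[j] * v[j] for j in range(d)) for r in centered]

# Hypothetical per-application metrics: [IPC, cache miss rate, branch miss rate].
apps = {
    "web_search": [1.8, 0.02, 0.01],
    "key_value": [1.7, 0.03, 0.01],
    "ml_training": [0.6, 0.20, 0.03],
    "ml_inference": [0.7, 0.18, 0.03],
}
coords = dict(zip(apps, project_1d(list(apps.values()))))
# Applications with similar metrics end up close together on the 1-D axis.
ranking = sorted(coords, key=coords.get)
```

In this sketch the metrics are used unscaled, so the column with the largest variance dominates the projection. Standardizing each column first would be a reasonable refinement when metrics have very different ranges.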

Citations: 1

In Situ Visualization of Performance Metrics in Multiple Domains

2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools) Pub Date : 2019-11-01 DOI: 10.1109/ProTools49597.2019.00014

Allen R. Sanderson, John A. Schmidt, A. Humphrey, M. Papka, R. Sisneros

Citations: 1

[Copyright notice]

2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools) Pub Date : 2019-11-01 DOI: 10.1109/protools49597.2019.00002

Citations: 0

[Title page]

2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools) Pub Date : 2019-11-01 DOI: 10.1109/protools49597.2019.00001

Citations: 0

Multi-Level Performance Instrumentation for Kokkos Applications Using TAU

2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools) Pub Date : 2019-11-01 DOI: 10.1109/ProTools49597.2019.00012

S. Shende, Nicholas Chaimov, A. Malony, N. Imam

Citations: 5

Automatic Instrumentation Refinement for Empirical Performance Modeling

2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools) Pub Date : 2019-11-01 DOI: 10.1109/ProTools49597.2019.00011

Jan-Patrick Lehr, A. Calotoiu, C. Bischof, F. Wolf

Abstract: The analysis of runtime performance is important during the development and throughout the life cycle of HPC applications. One important objective in performance analysis is to identify regions in the code that show significant runtime increase with larger problem sizes or more processes. One approach to identify such regions is to use empirical performance modeling, i.e., building performance models based on measurements. While the modeling itself has already been streamlined and automated, the generation of the required measurements is time-consuming and tedious. In this paper, we propose an approach to automatically adjust the instrumentation to reduce overhead and focus the measurements on relevant regions, i.e., those that show increasing runtime with larger input parameters or an increasing number of MPI ranks. Our approach employs Extra-P to generate performance models, which it then uses to extrapolate runtime and, finally, decide which functions should be kept for measurement. The analysis also expands the instrumentation by heuristically adding functions based on static source-code features. We evaluate our approach using benchmarks from SPEC CPU 2006, SU2, and parallel MILC. The evaluation shows that our approach can filter functions of little interest and generate profiles that contain mostly relevant regions. For example, the overhead for SU2 can be reduced automatically from 200% to 11% compared to filtered Score-P measurements.
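The model-then-filter loop described in this abstract (fit a performance model per function, extrapolate runtime, keep only functions whose runtime grows) can be sketched roughly as follows. This is not Extra-P and does not use its API. As a stand-in, it assumes a simple power-law model t = c * p^a fitted by least squares in log-log space, and a hypothetical selection rule that keeps a function when its extrapolated runtime is at least `growth_factor` times its largest measured runtime. Function names and timings are made up for illustration.

```python
import math

def fit_power_law(points):
    """Least-squares fit of t = c * p**a in log-log space.

    Assumption: a single-term power law replaces the richer model
    search a real empirical modeling tool would perform.
    """
    xs = [math.log(p) for p, _ in points]
    ys = [math.log(t) for _, t in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    c = math.exp(mean_y - a * mean_x)
    return c, a

def select_functions(measurements, target_p, growth_factor=2.0):
    """Return the functions worth keeping in the instrumentation."""
    kept = []
    for name, points in measurements.items():
        c, a = fit_power_law(points)
        predicted = c * target_p ** a          # extrapolated runtime
        largest_measured = max(t for _, t in points)
        # Hypothetical rule: keep only functions that grow noticeably.
        if predicted >= growth_factor * largest_measured:
            kept.append(name)
    return sorted(kept)

# Hypothetical (process count, runtime in seconds) measurements.
measurements = {
    "solve": [(2, 4.0), (4, 8.0), (8, 16.0)],           # grows with p
    "io": [(2, 1.0), (4, 1.0), (8, 1.0)],               # constant
    "tiny_helper": [(2, 0.01), (4, 0.01), (8, 0.01)],   # constant, negligible
}
kept = select_functions(measurements, target_p=64)
```

Here only `solve` survives the filter, because its fitted exponent is close to 1 and its extrapolated runtime at 64 processes far exceeds anything measured. The constant-time functions are dropped from the instrumentation.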

Citations: 4

Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools

2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools) Pub Date : 2019-11-01 DOI: 10.1109/ProTools49597.2019.00009

Qinglei Cao, Yu Pei, T. Hérault, Kadir Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, J. Dongarra

Abstract: This paper highlights the necessary development of new instrumentation tools within the PaRSEC task-based runtime system to leverage the performance of low-rank matrix computations. In particular, the tile low-rank (TLR) Cholesky factorization represents one of the most critical matrix operations toward solving challenging large-scale scientific applications. The challenge resides in the heterogeneous arithmetic intensity of the various computational kernels, which stresses PaRSEC's dynamic engine when orchestrating the task executions at runtime. Such an irregular workload calls for new scheduling heuristics that prioritize the critical path while exposing task parallelism to maximize hardware occupancy. To measure the effectiveness of PaRSEC's engine and its various scheduling strategies for tackling such workloads, it becomes paramount to implement adequate performance analysis and profiling tools tailored to fine-grained and heterogeneous task execution. This permits us not only to provide insights from PaRSEC, but also to identify potential performance bottlenecks in applications. These instrumentation tools may foster synergy between application and PaRSEC developers for productivity as well as high-performance computing purposes. We demonstrate the benefits of these tools while assessing the performance of TLR Cholesky factorization from data distribution, communication-reducing, and synchronization-reducing perspectives. This tool-assisted performance analysis results in three major contributions: a new hybrid data distribution, a new hierarchical TLR Cholesky algorithm, and a new performance model for tuning the tile size. The new TLR Cholesky factorization achieves an 8X performance speedup over existing implementations on massively parallel supercomputers, toward solving large-scale 3D climate and weather prediction applications.

Citations: 13

The Case for a Common Instrumentation Interface for HPC Codes

2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools) Pub Date : 2019-11-01 DOI: 10.1109/ProTools49597.2019.00010

David Boehme, K. Huck, Jonathan Madsen, J. Weidendorfer

Citations: 8

Asvie: A Timing-Agnostic SVE Optimization Methodology

2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools) Pub Date : 2019-11-01 DOI: 10.1109/ProTools49597.2019.00007

M. T. Cruz, Daniel Ruiz, Roxana Rusitoru

Citations: 4

Designing Efficient Parallel Software via Compositional Performance Modeling

2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools) Pub Date : 2019-11-01 DOI: 10.1109/ProTools49597.2019.00008

A. Calotoiu, Thomas Höhl, H. Mantel, Toni Nguyen, F. Wolf

Citations: 1