Preventing the explosion of exascale profile data with smart thread-level aggregation

ESPT '15 | Pub Date: 2015-11-15 | DOI: 10.1145/2832106.2832107
Daniel Lorenz, Sergei Shudler, F. Wolf
Citations: 1

Abstract

State-of-the-art performance analysis tools, such as Score-P, record performance profiles on a per-thread basis. For exascale systems, however, the number of threads is expected to be on the order of a billion, which would result in extremely large performance profiles. In most cases the user never inspects the individual per-thread data. In this paper, we propose to aggregate per-thread performance data in each process to reduce its amount to a reasonable size. Our goal is to aggregate the threads such that thread-level performance issues remain visible and analyzable. To this end, we implemented four aggregation strategies in Score-P: (i) SUM: aggregates all threads of a process into a process profile; (ii) SET: calculates statistical key data as well as the sum; (iii) KEY: identifies three threads (i.e., key threads) of particular interest for performance analysis and aggregates the rest of the threads; (iv) CALLTREE: clusters threads that have the same call-tree structure. For each of these strategies we evaluate the compression ratio and how well it preserves information about thread-level performance behavior. The aggregation does not incur any additional performance overhead at application run-time.
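The four strategies named in the abstract can be illustrated with a minimal Python sketch. This is not Score-P's implementation or data model. The `Profile` type, the function names, the choice of min/max/mean as the "statistical key data", and the choice of key threads (thread 0 plus the threads with the largest and smallest total time) are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-thread profile: call path (tuple of region names) -> time.
# This is an illustrative stand-in, not Score-P's real profile structure.
Profile = dict[tuple[str, ...], float]


def aggregate_sum(threads: list[Profile]) -> Profile:
    """SUM: fold all threads of a process into one process profile."""
    total: dict[tuple[str, ...], float] = defaultdict(float)
    for profile in threads:
        for path, value in profile.items():
            total[path] += value
    return dict(total)


def aggregate_set(threads: list[Profile]) -> dict[tuple[str, ...], dict[str, float]]:
    """SET: keep the sum plus statistics per call path.

    Assumption: min, max and mean stand in for the 'statistical key data'.
    """
    values: dict[tuple[str, ...], list[float]] = defaultdict(list)
    for profile in threads:
        for path, value in profile.items():
            values[path].append(value)
    return {
        path: {"sum": sum(v), "min": min(v), "max": max(v), "mean": mean(v)}
        for path, v in values.items()
    }


def aggregate_key(threads: list[Profile]) -> dict:
    """KEY: keep up to three key threads and sum the rest.

    Assumption: the key threads are thread 0 and the threads with the
    largest and smallest total time; the abstract does not name them.
    """
    totals = [sum(p.values()) for p in threads]
    key_ids = {0, totals.index(max(totals)), totals.index(min(totals))}
    rest = [p for i, p in enumerate(threads) if i not in key_ids]
    return {
        "key": {i: threads[i] for i in sorted(key_ids)},
        "rest": aggregate_sum(rest),
    }


def aggregate_calltree(threads: list[Profile]) -> list[Profile]:
    """CALLTREE: cluster threads with identical call-path sets, sum each cluster."""
    clusters: dict[frozenset, list[Profile]] = defaultdict(list)
    for profile in threads:
        clusters[frozenset(profile)].append(profile)
    return [aggregate_sum(group) for group in clusters.values()]
```

In this sketch, SUM always yields one profile per process, while CALLTREE yields one profile per distinct call-tree shape, so its output size depends on how uniformly the threads behave rather than on the thread count.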