Gauge:用于HPC应用程序I/O性能分析的交互式数据驱动可视化工具

Eliakin del Rosario, Mikaela Currier, Mihailo Isakov, S. Madireddy, Prasanna Balaprakash, P. Carns, R. Ross, K. Harms, S. Snyder, M. Kinsy
{"title":"Gauge:用于HPC应用程序I/O性能分析的交互式数据驱动可视化工具","authors":"Eliakin del Rosario, Mikaela Currier, Mihailo Isakov, S. Madireddy, Prasanna Balaprakash, P. Carns, R. Ross, K. Harms, S. Snyder, M. Kinsy","doi":"10.1109/PDSW51947.2020.00008","DOIUrl":null,"url":null,"abstract":"Understanding and alleviating I/O bottlenecks in HPC system workloads is difficult due to the complex, multilayered nature of HPC I/O subsystems. Even with full visibility into the jobs executed on the system, the lack of tooling makes debugging I/O problems difficult. In this work, we introduce Gauge, an interactive, data-driven, web-based visualization tool for HPC I/O performance analysis. Gauge aids in the process of visualizing and analyzing, in an interactive fashion, large sets of HPC application execution logs. It performs a number of functions met to significantly reduce the cognitive load of navigating these sets - some worth many years of HPC logs. For instance, as its first step in many processing chains, it arranges unordered sets of collected HPC logs into a hierarchy of clusters for later analysis. This clustering step allows application developers to quickly navigate logs, find how their jobs compare to those of their peers in terms of I/O utilization, as well as how to improve their future runs. Similarly, facility operators can use Gauge to ‘get a pulse’ on the workloads running on their HPC systems, find clusters of under performing applications, and diagnose the reason for poor I/O throughput. In this work, we describe how Gauge arrives at the HPC jobs clustering, how it presents data about the jobs, and how it can be used to further narrow down and understand behavior of sets of jobs. We also provide a case study on using Gauge from the perspective of a facility operator.","PeriodicalId":142923,"journal":{"name":"2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW)","volume":"35 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Gauge: An Interactive Data-Driven Visualization Tool for HPC Application I/O Performance Analysis\",\"authors\":\"Eliakin del Rosario, Mikaela Currier, Mihailo Isakov, S. Madireddy, Prasanna Balaprakash, P. Carns, R. Ross, K. Harms, S. Snyder, M. Kinsy\",\"doi\":\"10.1109/PDSW51947.2020.00008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Understanding and alleviating I/O bottlenecks in HPC system workloads is difficult due to the complex, multilayered nature of HPC I/O subsystems. Even with full visibility into the jobs executed on the system, the lack of tooling makes debugging I/O problems difficult. In this work, we introduce Gauge, an interactive, data-driven, web-based visualization tool for HPC I/O performance analysis. Gauge aids in the process of visualizing and analyzing, in an interactive fashion, large sets of HPC application execution logs. It performs a number of functions met to significantly reduce the cognitive load of navigating these sets - some worth many years of HPC logs. For instance, as its first step in many processing chains, it arranges unordered sets of collected HPC logs into a hierarchy of clusters for later analysis. This clustering step allows application developers to quickly navigate logs, find how their jobs compare to those of their peers in terms of I/O utilization, as well as how to improve their future runs. Similarly, facility operators can use Gauge to ‘get a pulse’ on the workloads running on their HPC systems, find clusters of under performing applications, and diagnose the reason for poor I/O throughput. In this work, we describe how Gauge arrives at the HPC jobs clustering, how it presents data about the jobs, and how it can be used to further narrow down and understand behavior of sets of jobs. We also provide a case study on using Gauge from the perspective of a facility operator.\",\"PeriodicalId\":142923,\"journal\":{\"name\":\"2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW)\",\"volume\":\"35 2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PDSW51947.2020.00008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDSW51947.2020.00008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

摘要

由于HPC I/O子系统的复杂性和多层性,理解和缓解HPC系统工作负载中的I/O瓶颈是很困难的。即使完全了解系统上执行的作业,工具的缺乏也会使调试I/O问题变得困难。在这项工作中,我们介绍了Gauge,一个用于HPC I/O性能分析的交互式、数据驱动的、基于web的可视化工具。Gauge有助于以交互方式可视化和分析大量HPC应用程序执行日志。它执行了许多功能,以显著减少导航这些集合的认知负荷,其中一些功能值得多年的HPC日志。例如,作为许多处理链中的第一步,它将收集到的HPC日志的无序集安排到集群的层次结构中,以供以后分析。这个集群步骤允许应用程序开发人员快速浏览日志,查找他们的作业在I/O利用率方面与同类作业的比较,以及如何改进未来的运行。类似地,设施运营商可以使用Gauge来“了解”运行在HPC系统上的工作负载,找到性能不佳的应用程序集群,并诊断I/O吞吐量差的原因。在这项工作中,我们描述了Gauge如何到达HPC作业集群,它如何呈现有关作业的数据,以及如何使用它进一步缩小和理解作业集的行为。我们还提供了一个从设施运营商的角度使用Gauge的案例研究。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Gauge: An Interactive Data-Driven Visualization Tool for HPC Application I/O Performance Analysis
Understanding and alleviating I/O bottlenecks in HPC system workloads is difficult due to the complex, multilayered nature of HPC I/O subsystems. Even with full visibility into the jobs executed on the system, the lack of tooling makes debugging I/O problems difficult. In this work, we introduce Gauge, an interactive, data-driven, web-based visualization tool for HPC I/O performance analysis. Gauge aids in the process of visualizing and analyzing, in an interactive fashion, large sets of HPC application execution logs. It performs a number of functions met to significantly reduce the cognitive load of navigating these sets - some worth many years of HPC logs. For instance, as its first step in many processing chains, it arranges unordered sets of collected HPC logs into a hierarchy of clusters for later analysis. This clustering step allows application developers to quickly navigate logs, find how their jobs compare to those of their peers in terms of I/O utilization, as well as how to improve their future runs. Similarly, facility operators can use Gauge to ‘get a pulse’ on the workloads running on their HPC systems, find clusters of under performing applications, and diagnose the reason for poor I/O throughput. In this work, we describe how Gauge arrives at the HPC jobs clustering, how it presents data about the jobs, and how it can be used to further narrow down and understand behavior of sets of jobs. We also provide a case study on using Gauge from the perspective of a facility operator.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信