A Quantitative Study of the Spatiotemporal I/O Burstiness of HPC Application

Wenxiang Yang, Xiangke Liao, Dezun Dong, Jie Yu
Venue: 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Publication date: 2022-05-01
DOI: https://doi.org/10.1109/ipdps53621.2022.00133
Citations: 3

Abstract

Understanding the I/O characteristics of applications on supercomputers is crucial for application optimization and system resource allocation. We collect and analyze I/O traces of applications on a production supercomputer and reconfirm that I/O bursts exist in most applications. Moreover, we find that the I/O bursts not only occur within short periods of time but also originate from a minority of adjacent compute nodes allocated to the applications, a phenomenon we call spatiotemporal I/O burstiness. This concentration of I/O traffic in both the time and space dimensions degrades application I/O performance and makes the storage system less efficient. Although some solutions, such as burst buffers, can help alleviate this inefficiency, no existing work measures, analyzes, and predicts application I/O characteristics in terms of spatiotemporal burstiness, which we consider vital for application-aware optimizations, including but not limited to burst buffer allocation and job scheduling. In this paper, we first propose a mathematical model to measure spatiotemporal I/O burstiness. We then present a thorough analysis of the spatiotemporal I/O characteristics of all applications on the system. We further use each job's submission path to explore the similarity of I/O characteristics among jobs, and on that basis propose a machine learning classification algorithm that accurately predicts a job's spatiotemporal I/O burstiness in advance. With accurate job I/O characteristics at hand, we put forward suggestions to hedge against the impact of spatiotemporal I/O burstiness.
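
The abstract does not state the paper's mathematical model, so the following is only an illustrative sketch of how concentration in time and in space might be quantified from a per-node, per-window I/O trace. The metric (the smallest fraction of bins that carries a given share of the traffic), the function names `concentration` and `spatiotemporal_concentration`, the 90% threshold, and the trace layout are all assumptions made for this example, not the authors' method. It also ignores node adjacency, which the abstract mentions but does not define.

```python
from typing import Sequence


def concentration(volumes: Sequence[float], share: float = 0.9) -> float:
    """Smallest fraction of bins that together carry `share` of the total volume.

    Values near 0 mean traffic is concentrated in a few bins (bursty);
    values near `share` mean it is spread roughly evenly.
    Hypothetical metric, not the model proposed in the paper.
    """
    total = sum(volumes)
    if total <= 0:
        raise ValueError("need at least one bin with positive I/O volume")
    running = 0.0
    # Take the busiest bins first until the requested share is covered.
    for used, volume in enumerate(sorted(volumes, reverse=True), start=1):
        running += volume
        if running >= share * total:
            return used / len(volumes)
    return 1.0


def spatiotemporal_concentration(trace, share=0.9):
    """Return (temporal, spatial) concentration for one job.

    Assumed layout: trace[node][window] is the number of bytes moved
    by that compute node during that time window.
    """
    per_node = [sum(row) for row in trace]            # collapse time
    per_window = [sum(col) for col in zip(*trace)]    # collapse nodes
    return concentration(per_window, share), concentration(per_node, share)


# Toy job: 4 nodes, 10 time windows, all I/O issued by node 0 in window 3.
trace = [[0] * 10 for _ in range(4)]
trace[0][3] = 1000
temporal, spatial = spatiotemporal_concentration(trace)
print(temporal, spatial)  # 0.1 0.25
```

In this toy trace, one of ten windows and one of four nodes account for all traffic, so both values are small. A job whose I/O is spread evenly over time and nodes would score close to the chosen share on both axes.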