Probabilistic scheduling of dynamic I/O requests via application clustering for burst-buffers equipped high-performance computing
Benbo Zha, Hong Shen
Concurrency and Computation: Practice and Experience, vol. 36, no. 19. Published 2024-06-27.
DOI: 10.1002/cpe.8142 (https://onlinelibrary.wiley.com/doi/10.1002/cpe.8142)
Burst buffering is a promising storage solution that introduces an intermediate high-throughput buffer layer to mitigate the I/O bottleneck on current high-performance computing (HPC) platforms. Existing Markov-chain-based probabilistic I/O scheduling uses the load state of the burst buffers and the periodic characteristics of applications to reduce the I/O congestion caused by limited burst-buffer capacity. However, this probabilistic approach needs applications with consistent I/O characteristics, including similar I/O durations and long application lengths, to estimate the I/O load accurately. These consistency conditions often do not hold in realistic settings. In this paper, we propose a generic framework for dynamic probabilistic I/O scheduling based on application clustering (DPSAC) so that applications meet the consistency requirements. Our scheme first applies a one-dimensional K-means algorithm to group the applications into clusters according to the I/O phase length of each application. Next, it calculates the expected workload of each cluster from the probabilistic model of the applications and partitions the burst buffers proportionally. Then, to handle dynamic changes (applications joining and exiting), it updates the clusters using a heuristic strategy. Finally, it applies probabilistic I/O scheduling, based on the distribution of application workload and the state of the burst buffers, to schedule I/O for all concurrent applications and mitigate I/O congestion. Simulation results on synthetic data show that DPSAC is effective and efficient.
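The first two steps named in the abstract (one-dimensional K-means on I/O phase length, then a proportional split of the burst buffers) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the `(io_len, compute_len, io_rate)` application tuple, and the load estimate `io_rate * io_len / (io_len + compute_len)` are all assumptions made here for illustration; in particular, the load estimate is a simple stand-in for the paper's probabilistic model, which the abstract does not spell out.

```python
from statistics import mean


def kmeans_1d(values, k, max_iters=100):
    """Plain Lloyd's algorithm on scalars. Returns (centroids, labels)."""
    distinct = sorted(set(values))
    n = len(distinct)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of distinct values")
    # Deterministic initialisation (an assumption of this sketch): spread the
    # starting centroids evenly over the sorted distinct values.
    centroids = [float(distinct[(i * (n - 1)) // max(k - 1, 1)]) for i in range(k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return centroids, labels


def expected_cluster_loads(apps, labels, k):
    """Sum a per-application load estimate within each cluster.

    apps holds hypothetical (io_len, compute_len, io_rate) tuples. The
    estimate below (I/O rate weighted by the fraction of time spent in the
    I/O phase) is an assumed placeholder, not the paper's model.
    """
    loads = [0.0] * k
    for (io_len, compute_len, io_rate), lab in zip(apps, labels):
        loads[lab] += io_rate * io_len / (io_len + compute_len)
    return loads


def partition_buffer(capacity, loads):
    """Split the burst-buffer capacity in proportion to the cluster loads."""
    total = sum(loads)
    if total <= 0:
        raise ValueError("total expected load must be positive")
    return [capacity * load / total for load in loads]
```

For example, clustering the I/O phase lengths `[2.0, 3.0, 2.5, 20.0, 22.0, 19.0]` with `k=2` separates the three short-I/O applications from the three long-I/O ones, and `partition_buffer` then gives the long-I/O cluster the larger share of the capacity. The heuristic cluster update for joining and exiting applications and the scheduling step itself are not sketched here, since the abstract gives no detail on them.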
About the journal:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.