LDPP: A Learned Directory Placement Policy in Distributed File Systems

Proceedings of the 51st International Conference on Parallel Processing. Pub Date: 2022-08-29. DOI: 10.1145/3545008.3545057

Yuanzhang Wang, Fengkui Yang, Ji Zhang, Chun-hua Li, Ke Zhou, Chong Liu, Zhuo Cheng, Wei Fang, Jinhu Liu


Abstract

Load balancing is a critical problem in distributed file systems (DFS). Previous work focuses on distributing data evenly across nodes or storage devices at the file level, but fails to exploit directory locality and the long duration of directory hotness, which can reduce the degree of balance and degrade performance. To overcome this shortcoming, we propose a learning-based directory placement policy, called LDPP, which determines the data layout by predicting the load. We first establish a relationship between a directory's request characteristics and its state information in order to predict that state (storage capacity, bandwidth, and IOPS). Each new directory is then placed across the nodes in a multi-dimensional manner, using the Manhattan distance over the predicted multi-dimensional state. We also consider the trade-off between directories of the same category, as classified by the load prediction module, and peer directories, and explore their influence on balance. Extensive experiments demonstrate that LDPP not only alleviates load imbalance and increases resource utilization but also improves DFS performance in practice: it reduces service latency by up to 36% and increases IOPS and bandwidth by 8% and 9%, respectively.
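
The abstract does not spell out the placement rule, so the following is only a minimal sketch of how a Manhattan-distance criterion over the three predicted dimensions could work. It is not the authors' implementation. It assumes that each node's load and each directory's predicted load are vectors (capacity, bandwidth, IOPS) normalized to a common scale, and that a good placement minimizes the summed Manhattan distance of all nodes from the cluster mean. The names `manhattan`, `imbalance`, and `place_directory` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: (capacity, bandwidth, IOPS), each normalized to [0, 1].
Load = Tuple[float, float, float]


def manhattan(a: Load, b: Load) -> float:
    """Manhattan (L1) distance between two load vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def imbalance(loads: List[Load]) -> float:
    """Sum of Manhattan distances of every node's load from the cluster mean."""
    n = len(loads)
    mean = tuple(sum(load[d] for load in loads) / n for d in range(3))
    return sum(manhattan(load, mean) for load in loads)


def place_directory(nodes: Dict[str, Load], predicted: Load) -> str:
    """Pick the node that leaves the cluster least imbalanced after placement.

    `predicted` stands in for the output of a load prediction model,
    which this sketch does not implement.
    """
    best_node, best_score = None, float("inf")
    for candidate in sorted(nodes):  # sorted for a deterministic tie-break
        # Build the hypothetical cluster state with the directory on `candidate`.
        trial = [
            tuple(x + p for x, p in zip(load, predicted)) if name == candidate else load
            for name, load in nodes.items()
        ]
        score = imbalance(trial)
        if score < best_score:
            best_node, best_score = candidate, score
    return best_node
```

Under these assumptions, a directory predicted to add 0.2 in every dimension goes to a node loaded at 0.1 rather than one loaded at 0.5, because that placement brings both nodes closer to the cluster mean. Treating all three dimensions in one distance is what lets a capacity-heavy directory land on a node that is busy in IOPS but has spare space.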
