Analyzing the stability, efficiency, and cost of a dynamic data replica balancing architecture for HDFS

Impact factor 2.2 · CAS journal ranking Zone 4 (Computer Science) · JCR Q3 (Telecommunications)
Rhauani Weber Aita Fazul, Odorico Machado Mendizabal, Patrícia Pitthan Barcelos
Annals of Telecommunications, vol. 80, no. 9-10, pp. 867-883. Published 2025-04-23. DOI: 10.1007/s12243-025-01093-1 (https://link.springer.com/article/10.1007/s12243-025-01093-1)
Citations: 0

Abstract

Hadoop Distributed File System (HDFS) is known for its specialized strategies and policies for replica placement. This capability is critical for efficient and reliable access to data replicas, particularly because HDFS performs best when data are evenly distributed across the cluster. In this paper, we build on earlier practical evaluations and conduct a thorough analysis of the replica balancing process in HDFS, focusing on two critical performance metrics: stability and efficiency. We evaluated these metrics, together with the operational cost of balancing, by comparing conventional HDFS solutions with a novel dynamic architecture for data replica balancing. We also examine the data locality improvements brought about by effective replica balancing and their benefits for data-intensive applications, including improved read performance. Our findings reveal the extent to which data imbalance directly affects the file system and highlight how the default replica placement policy struggles to keep the cluster balanced. We examined the effectiveness of on-demand balancing, which is real but intricate and temporary, underscoring the importance of regular and adaptable balancing interventions. This reaffirms the significance of context-aware replica balancing, as provided by the proposed dynamic architecture, not only for maintaining data equilibrium but also for ensuring efficient system performance.
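The abstract relies on the notion of a "balanced" HDFS cluster. As a point of reference, the default HDFS Balancer treats a cluster as balanced when every DataNode's utilization (used space divided by capacity) lies within a configurable threshold (10 percentage points by default, set with `hdfs balancer -threshold`) of the cluster-wide utilization. The following is a minimal sketch of that criterion, not the paper's dynamic architecture. The `DataNode` class and the helper functions are hypothetical names chosen for illustration. The over-/under-utilized labels follow the Balancer's general design, simplified here.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    # Hypothetical stand-in for a DataNode's storage report.
    name: str
    used_bytes: int
    capacity_bytes: int

    @property
    def utilization(self) -> float:
        # Percentage of this node's capacity currently in use.
        return 100.0 * self.used_bytes / self.capacity_bytes


def cluster_utilization(nodes: list[DataNode]) -> float:
    # Cluster-wide utilization: total used space over total capacity (percent).
    used = sum(n.used_bytes for n in nodes)
    capacity = sum(n.capacity_bytes for n in nodes)
    return 100.0 * used / capacity


def classify(nodes: list[DataNode], threshold: float = 10.0) -> dict[str, str]:
    """Label each node relative to the cluster average, Balancer-style."""
    avg = cluster_utilization(nodes)
    labels = {}
    for n in nodes:
        diff = n.utilization - avg
        if diff > threshold:
            labels[n.name] = "over-utilized"        # candidate source of block moves
        elif diff < -threshold:
            labels[n.name] = "under-utilized"       # candidate target of block moves
        elif diff >= 0:
            labels[n.name] = "at-or-above-average"  # within threshold
        else:
            labels[n.name] = "below-average"        # within threshold
    return labels


def is_balanced(nodes: list[DataNode], threshold: float = 10.0) -> bool:
    # Balanced when no node deviates from the cluster average by more than threshold.
    avg = cluster_utilization(nodes)
    return all(abs(n.utilization - avg) <= threshold for n in nodes)


if __name__ == "__main__":
    nodes = [
        DataNode("dn1", 800, 1000),
        DataNode("dn2", 500, 1000),
        DataNode("dn3", 200, 1000),
    ]
    print(classify(nodes))
    print(is_balanced(nodes))
```

With the three sample nodes the cluster average is 50%, so `dn1` (80%) is over-utilized, `dn3` (20%) is under-utilized, and the cluster is not balanced. A static threshold like this is exactly the kind of one-size-fits-all setting that context-aware balancing aims to adapt.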

Source journal
Annals of Telecommunications (Engineering and Technology: Telecommunications)
CiteScore: 5.20
Self-citation rate: 5.30%
Articles published: 37
Review time: 4.5 months
About the journal: Annals of Telecommunications is an international journal publishing original peer-reviewed papers in the field of telecommunications. It covers all the essential branches of modern telecommunications, from digital communications to communication networks and the internet, software, protocols and services, and their uses and economics. This broad range of topics reflects the rapid convergence, through telecommunications, of the underlying technologies in computing, communications, and content management toward the emergence of the information and knowledge society. The journal therefore provides a forum for exchanging research results and technological achievements of the European and international scientific community, from both academia and industry.