Rhauani Weber Aita Fazul, Odorico Machado Mendizabal, Patrícia Pitthan Barcelos
Title: Analyzing the stability, efficiency, and cost of a dynamic data replica balancing architecture for HDFS
Journal: Annals of Telecommunications, vol. 80, no. 9-10, pp. 867-883
Published: 2025-04-23
DOI: 10.1007/s12243-025-01093-1
URL: https://link.springer.com/article/10.1007/s12243-025-01093-1
Impact factor: 2.2 (JCR Q3, Telecommunications)
Citations: 0
Abstract
The Hadoop Distributed File System (HDFS) is known for its specialized strategies and policies for replica placement. These are critical for ensuring efficient and reliable access to data replicas, particularly as HDFS operates best when data are evenly distributed within the cluster. In this paper, we build upon earlier practical evaluations and conduct a thorough analysis of the replica balancing process in HDFS, focusing on two critical performance metrics: stability and efficiency. We evaluate these metrics, together with the operational cost of balancing, by comparing conventional HDFS solutions with a novel dynamic architecture for data replica balancing. In addition, we examine the data locality improvements brought about by effective replica balancing and their benefits for data-intensive applications, including enhanced read performance. Our findings reveal the extent to which data imbalance in HDFS directly affects the file system and highlight the difficulty the default replica placement policy has in maintaining cluster balance. We also show that on-demand balancing is effective but intricate and only temporary, which underscores the importance of regular and adaptable balancing interventions. This reaffirms the significance of context-aware replica balancing, as provided by the proposed dynamic architecture, not only for maintaining data equilibrium but also for ensuring efficient system performance.
About the journal:
Annals of Telecommunications is an international journal publishing original peer-reviewed papers in the field of telecommunications. It covers all the essential branches of modern telecommunications, ranging from digital communications to communication networks and the internet, to software, protocols and services, uses and economics. This broad spectrum of topics reflects the rapid convergence, through telecommunications, of the underlying technologies in computers, communications, and content management towards the emergence of the information and knowledge society. The journal thus provides a medium for exchanging research results and technological achievements from the European and international scientific community, in both academia and industry.