DARB: A Dynamic Architecture for Data Replica Balancing

Impact factor: 1.5 · Zone 4 (Computer Science) · JCR Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING)
Rhauani Weber Aita Fazul, Odorico Machado Mendizabal, Patrícia Pitthan Barcelos
Journal: Concurrency and Computation: Practice and Experience, vol. 37, no. 9-11 (journal article)
Published: 2025-04-08
DOI: 10.1002/cpe.70050
URL: https://onlinelibrary.wiley.com/doi/10.1002/cpe.70050
Citations: 0

Abstract

Distributed file systems, such as HDFS, are designed to support applications that handle large volumes of data. Data replication, which is at the core of the HDFS storage model, is essential for fault tolerance and performance. As new data are loaded into the system, the distribution of replicated data blocks among the nodes may become uneven, which affects replica balancing and data locality. The HDFS Balancer is the official solution for redistributing the data already stored in the cluster. However, it overlooks the specific needs of the applications during data rearrangement and requires manual intervention by system administrators, a dependency that is often inadequate and inefficient. To address these limitations, this work presents DARB, a Dynamic Architecture for Replica Balancing that combines reactive and proactive strategies. The reactive strategy uses the Prioritized Replica Balancing Policy to customize replica balancing through configurable priorities. The proactive strategy is event-driven and makes the overall balancing process in HDFS transparent. DARB comprises modular components and a metrics observation model that determines when corrective actions should be taken. It also automatically triggers the HDFS Balancer based on standardized trigger events. The evaluation results indicate that the proposed solution removes the need for manual configuration and execution while actively working to keep the cluster balanced, taking into account performance, reliability, and data availability. Thus, DARB offers a specialized balancing solution that makes the balancing process seamless and flexible, introducing the concept of context-aware replica balancing to HDFS.
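
To make the idea of a metrics-driven trigger concrete, the following is a minimal sketch of one way such a check could look. It is not DARB's actual observation model, which the abstract does not specify. It assumes the same notion of balance the HDFS Balancer uses for its `-threshold` option (a node counts as balanced when its utilization is within a given number of percentage points of the overall cluster utilization, 10 by default). The names `NodeUsage`, `unbalanced_nodes`, and `should_trigger_balancer` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Storage figures for one DataNode (hypothetical structure)."""
    name: str
    used_bytes: int
    capacity_bytes: int

    @property
    def utilization(self) -> float:
        # Percentage of this node's capacity currently in use.
        return 100.0 * self.used_bytes / self.capacity_bytes


def cluster_utilization(nodes: list[NodeUsage]) -> float:
    """Overall utilization in percent: total used space over total capacity."""
    total_used = sum(n.used_bytes for n in nodes)
    total_capacity = sum(n.capacity_bytes for n in nodes)
    return 100.0 * total_used / total_capacity


def unbalanced_nodes(nodes: list[NodeUsage], threshold: float = 10.0) -> list[str]:
    """Names of nodes whose utilization deviates from the cluster
    utilization by more than `threshold` percentage points."""
    avg = cluster_utilization(nodes)
    return [n.name for n in nodes if abs(n.utilization - avg) > threshold]


def should_trigger_balancer(nodes: list[NodeUsage], threshold: float = 10.0) -> bool:
    """Assumed trigger rule: rebalance as soon as any node is out of range."""
    return bool(unbalanced_nodes(nodes, threshold))


if __name__ == "__main__":
    sample = [
        NodeUsage("dn1", used_bytes=80, capacity_bytes=100),
        NodeUsage("dn2", used_bytes=20, capacity_bytes=100),
        NodeUsage("dn3", used_bytes=50, capacity_bytes=100),
    ]
    print(unbalanced_nodes(sample))         # ['dn1', 'dn2']
    print(should_trigger_balancer(sample))  # True
```

In an event-driven setup, a check like this would run whenever a monitored metric changes, and a positive result would start the balancer without administrator involvement. How DARB defines its metrics and trigger events is described in the paper itself, not here.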

Source journal
Concurrency and Computation: Practice and Experience (Engineering and Technology; Computer Science: Theory and Methods)
CiteScore: 5.00
Self-citation rate: 10.00%
Articles published: 664
Review time: 9.6 months
About the journal: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.