Improving big data analytics data processing speed through map reduce scheduling and replica placement with HDFS using genetic optimization techniques

M.R. Sundara Kumar, H.S. Mohan
{"title":"Improving big data analytics data processing speed through map reduce scheduling and replica placement with HDFS using genetic optimization techniques","authors":"M.R. Sundara Kumar, H.S. Mohan","doi":"10.3233/jifs-240069","DOIUrl":null,"url":null,"abstract":"Big Data Analytics (BDA) is an unavoidable technique in today’s digital world for dealing with massive amounts of digital data generated by online and internet sources. It is kept in repositories for data processing via cluster nodes that are distributed throughout the wider network. Because of its magnitude and real-time creation, big data processing faces challenges with latency and throughput. Modern systems such as Hadoop and SPARK manage large amounts of data with their HDFS, Map Reduce, and In-Memory analytics approaches, but the migration cost is higher than usual. With Genetic Algorithm-based Optimization (GABO), Map Reduce Scheduling (MRS) and Data Replication have provided answers to this challenge. With multi objective solutions provided by Genetic Algorithm, resource utilization and node availability improve processing performance in large data environments. This work develops a novel creative strategy for enhancing data processing performance in big data analytics called Map Reduce Scheduling Based Non-Dominated Sorting Genetic Algorithm (MRSNSGA). The Hadoop-Map Reduce paradigm handles the placement of data in distributed blocks as a chunk and their scheduling among the cluster nodes in a wider network. Best fit solutions with high latency and low accessing time are extracted from the findings of various objective solutions. Experiments were carried out as a simulation with several inputs of varied location node data and cluster racks. Finally, the results show that the speed of data processing in big data analytics was enhanced by 30–35% over previous methodologies. Optimization approaches developed to locate the best solutions from multi-objective solutions at a rate of 24–30% among cluster nodes.","PeriodicalId":509313,"journal":{"name":"Journal of Intelligent & Fuzzy Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Intelligent & Fuzzy Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/jifs-240069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Big Data Analytics (BDA) is an unavoidable technique in today’s digital world for dealing with massive amounts of digital data generated by online and internet sources. It is kept in repositories for data processing via cluster nodes that are distributed throughout the wider network. Because of its magnitude and real-time creation, big data processing faces challenges with latency and throughput. Modern systems such as Hadoop and SPARK manage large amounts of data with their HDFS, Map Reduce, and In-Memory analytics approaches, but the migration cost is higher than usual. With Genetic Algorithm-based Optimization (GABO), Map Reduce Scheduling (MRS) and Data Replication have provided answers to this challenge. With multi objective solutions provided by Genetic Algorithm, resource utilization and node availability improve processing performance in large data environments. This work develops a novel creative strategy for enhancing data processing performance in big data analytics called Map Reduce Scheduling Based Non-Dominated Sorting Genetic Algorithm (MRSNSGA). The Hadoop-Map Reduce paradigm handles the placement of data in distributed blocks as a chunk and their scheduling among the cluster nodes in a wider network. Best fit solutions with high latency and low accessing time are extracted from the findings of various objective solutions. Experiments were carried out as a simulation with several inputs of varied location node data and cluster racks. Finally, the results show that the speed of data processing in big data analytics was enhanced by 30–35% over previous methodologies. Optimization approaches developed to locate the best solutions from multi-objective solutions at a rate of 24–30% among cluster nodes.
利用遗传优化技术,通过 HDFS 的映射还原调度和副本放置提高大数据分析的数据处理速度
大数据分析(BDA)是当今数字世界不可避免的一项技术,用于处理由在线和互联网来源产生的海量数字数据。这些数据被保存在存储库中,通过分布在更广泛网络中的集群节点进行数据处理。由于数据量巨大且实时生成,大数据处理面临着延迟和吞吐量方面的挑战。Hadoop 和 SPARK 等现代系统通过 HDFS、Map Reduce 和 In-Memory 分析方法来管理大量数据,但迁移成本比通常要高。基于遗传算法的优化(GABO)、地图缩减调度(MRS)和数据复制为解决这一难题提供了答案。通过遗传算法提供的多目标解决方案,资源利用率和节点可用性提高了大型数据环境的处理性能。这项工作开发了一种新颖的创造性策略,用于提高大数据分析中的数据处理性能,该策略被称为基于非支配排序遗传算法的地图缩减调度(MRSNSGA)。Hadoop-Map Reduce 范式将数据作为一个大块放置在分布式块中,并在更广泛的网络中的集群节点之间进行调度。从各种目标解决方案的结果中提取出具有高延迟和低访问时间的最佳解决方案。实验以模拟的方式进行,输入了多个不同位置的节点数据和集群机架。最后,结果表明,大数据分析中的数据处理速度比以前的方法提高了 30-35%。在集群节点中,从多目标解决方案中找到最佳解决方案的优化方法开发速度为 24-30%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信