Improving big data analytics data processing speed through map reduce scheduling and replica placement with HDFS using genetic optimization techniques

Journal of Intelligent & Fuzzy Systems. Pub Date: 2024-03-11. DOI: 10.3233/jifs-240069

M.R. Sundara Kumar, H.S. Mohan



Abstract

Big Data Analytics (BDA) has become indispensable for handling the massive volumes of digital data generated by online sources. The data is held in repositories and processed by cluster nodes distributed across a wide network. Because of its volume and real-time generation, big data processing faces latency and throughput challenges. Modern systems such as Hadoop and Spark manage large data sets with HDFS, MapReduce, and in-memory analytics, but at a higher-than-usual migration cost. Genetic Algorithm-based Optimization (GABO), applied to MapReduce Scheduling (MRS) and data replication, offers an answer to this challenge: the multi-objective solutions produced by a genetic algorithm improve resource utilization and node availability, and with them processing performance in large data environments. This work develops a new strategy for improving data processing performance in big data analytics, called Map Reduce Scheduling Based Non-Dominated Sorting Genetic Algorithm (MRSNSGA). The Hadoop MapReduce paradigm handles the placement of data as chunks in distributed blocks and the scheduling of those blocks among the cluster nodes. Best-fit solutions are extracted from the resulting multi-objective solutions according to latency and access time. Experiments were carried out in simulation with several inputs that varied node location data and cluster racks. The results show that data processing speed in big data analytics improved by 30–35% over previous methodologies, and that the optimization approach located the best solutions among the multi-objective solutions at a rate of 24–30% across cluster nodes.
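The non-dominated sorting step that gives MRSNSGA its name can be illustrated with a small sketch. The code below is not the authors' implementation: it is the plain repeated-filtering form of non-dominated sorting, not the faster bookkeeping variant used in NSGA-II, and it assumes two objectives that are both minimised. The candidate schedules, their objective values, and the names `dominates` and `non_dominated_sort` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# One objective vector per candidate schedule; every objective is minimised.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points: Sequence[Objectives]) -> List[List[int]]:
    """Group the indices of `points` into Pareto fronts; front 0 is the best."""
    remaining = set(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        # A point belongs to the current front if no other remaining point dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical candidates: (job latency in seconds, mean block access time in ms).
candidates = [(120.0, 40.0), (100.0, 55.0), (130.0, 60.0), (100.0, 40.0), (90.0, 70.0)]
fronts = non_dominated_sort(candidates)
print(fronts)  # [[3, 4], [0, 1], [2]]
```

Front 0 holds the schedules that no other candidate beats on both objectives at once. A genetic algorithm of this family would prefer those candidates when selecting parents for the next generation, and a final best-fit solution would be picked from that front.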
