Handling Data and Model Drift for World Application using Big Data

2022 International Conference on Knowledge Engineering and Communication Systems (ICKES). Published: 2022-12-28. DOI: 10.1109/ICKECS56523.2022.10060693

Rajesh Singh, A. Gehlot, Finney Daniel Shadrach, S. Prabu, R. Nirmalan, V. Sunil Kumar

Citations: 1

Abstract

Effectively extracting the information hidden in massive amounts of data remains an open problem. One of the difficulties is “concept drift” in data streams. Concept drift is a common issue in data analytics in which the statistical properties of the features, and of the classes they are meant to predict, change over time, reducing the accuracy of a trained model. Numerous approaches have been proposed for batch data mining. Stream mining, a newer generation of data mining methods, updates the model in a single pass whenever new data arrives. Because of its inherent adaptability, this one-pass mechanism may cope with concept drift in data streams better than its predecessors. In this study, we evaluate a group of decision tree algorithms for mining data streams. A benefit of decision tree learning is the set of rules that can be derived from the induced model; these rules, expressed as predicate logic, can then be applied in a variety of decision-support applications. Even in the presence of concept drift, the induced decision tree must remain accurate and compact. We evaluate how well three common incremental decision tree methods (random forest, isolation forest, and forest tree) perform on concept-drift data. The experiments use drift data from both synthetic and real-world environments. The results show that an optimization method combined with a big data technique, namely MapReduce, yields better outcomes.
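The abstract's core argument is that a model trained once loses accuracy when the concept drifts, whereas a one-pass stream learner that keeps updating can adapt. The Python sketch below illustrates that contrast under stated assumptions only; it is not the authors' method and does not implement random forest, isolation forest, "forest tree", or MapReduce. The names `DecayedStump`, `drifting_stream`, `prequential` and `window_accuracy`, the 20-bin histogram, the decay factor 0.99 and the synthetic stream (whose class boundary flips at example 2000) are all hypothetical choices. The learner is a depth-one decision tree (a decision stump) that keeps exponentially decayed class counts per feature bin, and both models are scored prequentially (test-then-train).

```python
import random


class DecayedStump:
    """Hypothetical one-pass decision stump (depth-1 tree) on one feature in [0, 1).

    Class counts are kept per histogram bin and multiplied by `decay` before each
    update, so older examples fade out and the split can follow concept drift.
    With decay=1.0 nothing is forgotten.
    """

    def __init__(self, n_bins=20, decay=0.99):
        self.n_bins = n_bins
        self.decay = decay
        self.counts = [[0.0, 0.0] for _ in range(n_bins)]  # counts[bin][class]
        self.threshold_bin = n_bins // 2  # bins below this get left_label
        self.left_label, self.right_label = 0, 1

    def _bin(self, x):
        return min(int(x * self.n_bins), self.n_bins - 1)

    def predict(self, x):
        return self.left_label if self._bin(x) < self.threshold_bin else self.right_label

    def learn_one(self, x, y):
        # Single pass: each example is seen once; only its decayed counts remain.
        for row in self.counts:
            row[0] *= self.decay
            row[1] *= self.decay
        self.counts[self._bin(x)][y] += 1.0
        self._choose_split()

    def _choose_split(self):
        # Pick the bin boundary whose majority labels best fit the decayed counts.
        total = [sum(row[c] for row in self.counts) for c in (0, 1)]
        left = [0.0, 0.0]
        best = -1.0
        for t in range(1, self.n_bins):
            left[0] += self.counts[t - 1][0]
            left[1] += self.counts[t - 1][1]
            right = [total[0] - left[0], total[1] - left[1]]
            correct = max(left) + max(right)
            if correct > best:
                best = correct
                self.threshold_bin = t
                self.left_label = 0 if left[0] >= left[1] else 1
                self.right_label = 0 if right[0] >= right[1] else 1


def drifting_stream(n, drift_at, seed=0):
    """Hypothetical synthetic stream with one abrupt concept drift at index drift_at."""
    rng = random.Random(seed)
    for i in range(n):
        x = rng.random()
        # Concept A: class 1 iff x > 0.5. Concept B (after the drift): class 1 iff x < 0.5.
        y = int(x > 0.5) if i < drift_at else int(x < 0.5)
        yield i, x, y


def prequential(n=4000, drift_at=2000, static_train_until=1000):
    """Test-then-train: each model predicts an example before (possibly) learning it."""
    static = DecayedStump(decay=1.0)     # trained on an initial batch, then frozen
    adaptive = DecayedStump(decay=0.99)  # keeps learning and forgetting
    hits = {"static": [], "adaptive": []}
    for i, x, y in drifting_stream(n, drift_at):
        hits["static"].append(static.predict(x) == y)
        hits["adaptive"].append(adaptive.predict(x) == y)
        if i < static_train_until:
            static.learn_one(x, y)
        adaptive.learn_one(x, y)
    return hits


def window_accuracy(flags, start, end):
    window = flags[start:end]
    return sum(window) / len(window)


if __name__ == "__main__":
    hits = prequential()
    for name, flags in hits.items():
        print(f"{name}: before drift {window_accuracy(flags, 1000, 2000):.3f}, "
              f"after drift {window_accuracy(flags, 3000, 4000):.3f}")
```

Under these assumptions, the frozen stump should score well before the drift and fail after it, while the decayed stump recovers within a few hundred examples. The decay factor sets the trade-off: values closer to 1 give more stable splits but slower recovery after a drift.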
