Handling Data and Model Drift for World Application using Big Data
Rajesh Singh, A. Gehlot, Finney Daniel Shadrach, S. Prabu, R. Nirmalan, V. Sunil Kumar
2022 International Conference on Knowledge Engineering and Communication Systems (ICKES), published 2022-12-28
DOI: 10.1109/ICKECS56523.2022.10060693
Citations: 1
Abstract
How to effectively extract the information hidden in massive amounts of data remains an open question. One of the difficulties is concept drift in data streams. Concept drift is a common issue in data analytics: the statistical properties of the features, and of the classes they are meant to predict, change over time, which decreases the accuracy of a trained model. Numerous approaches have been proposed for batch data mining. Stream mining, a newer generation of data mining methods, updates the model in a single pass whenever new data arrives. Because of its inherent adaptability, this one-pass mechanism may cope with concept drift in data streams better than its predecessors. In this study, we assess a group of decision-tree-based algorithms for mining data streams. A benefit of decision tree learning is the set of rules that can be derived from the induced model; these rules, expressed in predicate logic, can then be applied in a variety of decision-support applications. Even in the face of concept drift, the induced decision tree must remain accurate and compact. We evaluate how well three common incremental decision tree methods (random forest, isolation forest, and forest tree) perform on concept-drift data. The experiments use drift data from both synthetic and real-world environments. The results show that an optimization method combined with a big data technique, namely MapReduce, yields better outcomes.
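The central claim above, that a model updated in a single pass copes with concept drift better than a model trained once, can be illustrated with a minimal sketch. It assumes a synthetic one-dimensional stream whose labeling rule flips halfway through, and it uses a hypothetical `FadingStump` (a depth-one decision tree whose split scores decay by a fading factor) in place of the three methods the authors evaluate. `THRESHOLDS`, `stream`, and `run` are likewise illustrative names, not part of the study. Accuracy is measured prequentially (test-then-train) on the post-drift segment only.

```python
import random

# Hypothetical candidate split points for a one-feature stump (assumption).
THRESHOLDS = [i / 10 for i in range(1, 10)]


class FadingStump:
    """Hypothetical one-pass decision stump (a depth-1 decision tree).

    For every candidate rule "predict 1 iff x > t" (and its negation), it keeps
    an exponentially decayed count of correct predictions. Because old
    evidence fades, the stump can switch to a new split after concept drift.
    """

    def __init__(self, fading=0.99):
        self.fading = fading
        self.score = {(t, flip): 0.0 for t in THRESHOLDS for flip in (False, True)}

    @staticmethod
    def _rule(x, t, flip):
        pred = 1 if x > t else 0
        return 1 - pred if flip else pred

    def predict(self, x):
        # Use the rule with the highest decayed score.
        t, flip = max(self.score, key=self.score.get)
        return self._rule(x, t, flip)

    def learn_one(self, x, y):
        # Single pass: each example updates the scores once and is discarded.
        for t, flip in self.score:
            hit = 1.0 if self._rule(x, t, flip) == y else 0.0
            self.score[(t, flip)] = self.fading * self.score[(t, flip)] + hit


def stream(n, drift_at, seed=0):
    """Synthetic stream: label is 1 iff x > 0.5, inverted from drift_at onward."""
    rng = random.Random(seed)
    for i in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if i >= drift_at:  # concept drift: the labeling rule flips
            y = 1 - y
        yield x, y


def run(n=2000, drift_at=1000):
    """Return post-drift accuracy of a frozen stump and a one-pass stump."""
    static, online = FadingStump(), FadingStump()
    hits_static = hits_online = 0
    for i, (x, y) in enumerate(stream(n, drift_at)):
        if i < drift_at:
            static.learn_one(x, y)  # trained only before the drift, then frozen
        else:
            # Test-then-train: predict first, learn afterwards.
            hits_static += int(static.predict(x) == y)
            hits_online += int(online.predict(x) == y)
        online.learn_one(x, y)  # the one-pass model keeps learning
    post = n - drift_at
    return hits_static / post, hits_online / post


if __name__ == "__main__":
    static_acc, online_acc = run()
    print(f"frozen model: {static_acc:.2f}, one-pass model: {online_acc:.2f}")
```

Under these assumptions the frozen stump scores 0.0 after the drift, because the flipped labels make every one of its predictions wrong, while the fading stump recovers once the decayed evidence for the new split outweighs the old one. The fading factor trades adaptation speed against stability; its value here is an illustrative choice, not a parameter from the study.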