Incremental Learning From Unbalanced Data with Concept Class, Concept Drift and Missing Features: A Review

International Journal of Data Mining & Knowledge Management Process. Pub Date: 2014-11-30. DOI: 10.5121/IJDKP.2014.4602

P. Kulkarni, Roshani Ade


Citations: 32

Abstract

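Stream data mining has recently drawn considerable attention from several research communities. Stream data is a continuous form of data distinguished by its online nature. Traditionally, the machine learning field has developed learning algorithms that make certain assumptions about the underlying data distribution, for example that the data follow a predetermined distribution. Such constraints on the problem domain pave the way for learning algorithms whose performance is theoretically verifiable. Real-world situations differ from this restricted model. Applications often suffer from problems such as unbalanced data distribution. In addition, data drawn from non-stationary environments are common in real-world applications, giving rise to the “concept drift” associated with data streams. Researchers have addressed these issues separately, but the joint problem of class imbalance and concept drift has received relatively little attention. If the ultimate goal of intelligent machine learning techniques is to address a broad spectrum of real-world applications, then the need for a universal framework for learning from, and adapting to, environments where concepts may drift and the data distribution is unbalanced can hardly be overstated. In this paper, we first present an overview of the issues observed in stream data mining scenarios, followed by a comprehensive review of recent research on each issue.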

Source journal

International Journal of Data Mining & Knowledge Management Process

Self-citation rate

0.00%

Publication count