Data streams classification with ensemble model based on decision-feedback

JCR quartile: Q4 (Computer Science)
Jing LIU, Guo-sheng XU, Shi-hui ZHENG, Da XIAO, Li-ze GU
Journal: Journal of China Universities of Posts and Telecommunications, vol. 21, no. 1, pp. 79-85
Publication type: Journal Article
Published: 2014-02-01
DOI: 10.1016/S1005-8885(14)60272-7
URL: https://www.sciencedirect.com/science/article/pii/S1005888514602727
Citation count: 12

Abstract

Data stream classification faces four main challenges: infinite length, concept drift, the arrival of novel classes, and a lack of labeled instances. Most existing techniques address only some of these and ignore the others. This paper presents an ensemble classification model based on decision-feedback (ECM-BDF) to address all four. First, the data stream is divided into sequential chunks, and a classification model is trained on each labeled chunk. To handle infinite length and concept drift, a fixed number of such models constitute an ensemble model E, and subsequent labeled chunks are used to update E. To handle novel classes and limited labeled instances, the model incorporates a novel class detection mechanism that detects the arrival of a novel class without training E on labeled instances of that class. Meanwhile, unsupervised models are trained on unlabeled instances to provide useful constraints for E. Using these constraints as feedback information yields an extended ensemble model E_x, and unlabeled instances can then be classified more accurately by satisfying the maximum consensus of E_x. Experimental results demonstrate that the proposed ECM-BDF outperforms traditional techniques in classifying data streams with limited labeled data.
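
The chunk-based ensemble described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the base learner, the rule for updating E, or the novel class detection mechanism, so everything below is an assumption made for illustration: `CentroidModel` is a toy nearest-centroid classifier, `ChunkEnsemble.update` keeps the models that score best on the newest labeled chunk, and `is_novel_candidate` uses a simple distance-based outlier test. The constraint feedback from unsupervised models and the extended ensemble E_x are not modeled here.

```python
from collections import Counter
from math import dist


class CentroidModel:
    """Toy base classifier (assumed, not the paper's learner): one centroid per class."""

    def __init__(self, chunk):
        # chunk is a list of (features, label) pairs; features are tuples of floats.
        grouped = {}
        for x, y in chunk:
            grouped.setdefault(y, []).append(x)
        self.centroids = {
            y: tuple(sum(col) / len(xs) for col in zip(*xs))
            for y, xs in grouped.items()
        }
        # Radius of each class: farthest training point from its centroid.
        self.radii = {
            y: max(dist(x, self.centroids[y]) for x in xs)
            for y, xs in grouped.items()
        }

    def predict(self, x):
        # Label of the nearest class centroid.
        return min(self.centroids, key=lambda y: dist(x, self.centroids[y]))

    def accuracy(self, chunk):
        return sum(self.predict(x) == y for x, y in chunk) / len(chunk)

    def covers(self, x):
        # True if x lies within the radius of at least one known class.
        return any(dist(x, c) <= self.radii[y] for y, c in self.centroids.items())


class ChunkEnsemble:
    """Fixed-size ensemble E, updated with each new labeled chunk."""

    def __init__(self, max_models=3):
        self.max_models = max_models
        self.models = []

    def update(self, labeled_chunk):
        # Hypothetical update policy: train a model on the newest chunk, then
        # keep the max_models candidates that are most accurate on that chunk.
        candidates = self.models + [CentroidModel(labeled_chunk)]
        candidates.sort(key=lambda m: m.accuracy(labeled_chunk), reverse=True)
        self.models = candidates[: self.max_models]

    def predict(self, x):
        # Plain majority vote over the current models.
        votes = Counter(m.predict(x) for m in self.models)
        return votes.most_common(1)[0][0]

    def is_novel_candidate(self, x):
        # Assumed heuristic: x may belong to a novel class if no model covers it.
        return not any(m.covers(x) for m in self.models)
```

Calling `update` once per labeled chunk and `predict` on unlabeled instances mimics the stream setting: because the ensemble size is bounded, memory stays constant on an unbounded stream, and models trained on outdated concepts are gradually displaced as they score poorly on recent chunks.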

Source journal metrics: CiteScore 0.50; self-citation rate 0.00%; articles published: 1878