Jing LIU , Guo-sheng XU , Shi-hui ZHENG , Da XIAO , Li-ze GU
{"title":"Data streams classification with ensemble model based on decision-feedback","authors":"Jing LIU , Guo-sheng XU , Shi-hui ZHENG , Da XIAO , Li-ze GU","doi":"10.1016/S1005-8885(14)60272-7","DOIUrl":null,"url":null,"abstract":"<div><p>The main challenges of data streams classification include infinite length, concept-drifting, arrival of novel classes and lack of labeled instances. Most existing techniques address only some of them and ignore others. So an ensemble classification model based on decision-feedback (ECM-BDF) is presented in this paper to address all these challenges. Firstly, a data stream is divided into sequential chunks and a classification model is trained from each labeled data chunk. To address the infinite length and concept-drifting problem, a fixed number of such models constitute an ensemble model <em>E</em> and subsequent labeled chunks are used to update <em>E</em>. To deal with the appearance of novel classes and limited labeled instances problem, the model incorporates a novel class detection mechanism to detect the arrival of a novel class without training <em>E</em> with labeled instances of that class. Meanwhile, unsupervised models are trained from unlabeled instances to provide useful constraints for <em>E</em>. An extended ensemble model <em>E<sub>x</sub></em> can be acquired with the constraints as feedback information, and then unlabeled instances can be classified more accurately by satisfying the maximum consensus of <em>E<sub>x</sub></em>. Experimental results demonstrate that the proposed ECM-BDF outperforms traditional techniques in classifying data streams with limited labeled data.</p></div>","PeriodicalId":35359,"journal":{"name":"Journal of China Universities of Posts and Telecommunications","volume":"21 1","pages":"Pages 79-85"},"PeriodicalIF":0.0000,"publicationDate":"2014-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/S1005-8885(14)60272-7","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of China Universities of Posts and Telecommunications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1005888514602727","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 12
Abstract
The main challenges of data streams classification include infinite length, concept-drifting, arrival of novel classes and lack of labeled instances. Most existing techniques address only some of them and ignore others. So an ensemble classification model based on decision-feedback (ECM-BDF) is presented in this paper to address all these challenges. Firstly, a data stream is divided into sequential chunks and a classification model is trained from each labeled data chunk. To address the infinite length and concept-drifting problem, a fixed number of such models constitute an ensemble model E and subsequent labeled chunks are used to update E. To deal with the appearance of novel classes and limited labeled instances problem, the model incorporates a novel class detection mechanism to detect the arrival of a novel class without training E with labeled instances of that class. Meanwhile, unsupervised models are trained from unlabeled instances to provide useful constraints for E. An extended ensemble model Ex can be acquired with the constraints as feedback information, and then unlabeled instances can be classified more accurately by satisfying the maximum consensus of Ex. Experimental results demonstrate that the proposed ECM-BDF outperforms traditional techniques in classifying data streams with limited labeled data.