Zhuoran Xu, Cuiqin Hou, Yingju Xia, Jun Sun, Hiroya Inakoshi, N. Yugami
2016 IEEE Region 10 Conference (TENCON), published 2016-11-01. DOI: 10.1109/TENCON.2016.7847953
Pyramid stack data stream mining for handling concept-drifting
Data stream mining has attracted growing attention recently. Concept drift is a particular problem in data stream mining: the distribution of the data may change over time. Most current methods either estimate the current distribution or reconstruct it from a mixture of old distributions, and they suffer from estimation error and reconstruction error, respectively. In this paper, we find that a classifier fitting the current distribution can be obtained more directly by ensembling classifiers trained on increasing amounts of recent data. This strategy guarantees that, no matter when and how concept drift happens, there is always a classifier that suits the current data distribution. Our method therefore only needs to select the current-distribution classifier from among all the classifiers it holds, which is much easier than estimation or reconstruction. We test our method on four real-world data sets. Compared with other methods, ours achieves the best average accuracy.
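The abstract does not give the window sizes, the base learner, or the selection rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes windows that double in size, a toy nearest-centroid classifier, and selection by accuracy on a small held-out batch of the newest labelled samples. All function names (`train_centroids`, `build_stack`, `select_current`, `make_samples`) are hypothetical.

```python
from collections import defaultdict


def train_centroids(samples):
    """Fit a toy nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, defaultdict(int)
    for x, y in samples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )


def accuracy(centroids, samples):
    """Fraction of labelled samples that the classifier predicts correctly."""
    return sum(predict(centroids, x) == y for x, y in samples) / len(samples)


def build_stack(history, base=4):
    """Train one classifier per window of the most recent base, 2*base, 4*base, ...
    samples (assumed doubling schedule); the last window covers the whole history."""
    stack, size = [], base
    while True:
        window = history[-size:]
        stack.append((len(window), train_centroids(window)))
        if size >= len(history):
            break
        size *= 2
    return stack


def select_current(stack, recent):
    """Assumed selection rule: highest accuracy on the held-out recent batch,
    with ties broken in favour of the larger training window."""
    return max(stack, key=lambda item: (accuracy(item[1], recent), item[0]))


def make_samples(n, concept):
    """Synthetic 1-D stream: features cycle through four values, labels follow `concept`."""
    xs = [-2.0, -1.0, 1.0, 2.0]
    return [((xs[i % 4],), concept(xs[i % 4])) for i in range(n)]


# Synthetic drift: the labelling rule flips after 32 samples.
old = make_samples(32, lambda x: int(x > 0))
new = make_samples(12, lambda x: int(x < 0))
history, recent = old + new[:8], new[8:]  # the newest 4 samples are held out

stack = build_stack(history)
window_size, model = select_current(stack, recent)
print("window sizes:", [n for n, _ in stack])
print("selected window:", window_size)
```

In this toy stream, every window that reaches back past the drift point mixes the two labelling rules, so only the small windows contain post-drift data alone. That is the property the abstract relies on: whenever the drift happened, some window in the stack starts after it, and the remaining task is to pick that window's classifier.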