Drift detection in data stream classification without fully labelled instances

E. Lughofer, Eva Weigl, Wolfgang Heidl, C. Eitzinger, Thomas Radauer
2015 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS), December 2015. DOI: 10.1109/EAIS.2015.7368802
Citation count: 4

Abstract

Drift detection is an important issue in classification-based stream mining, because it allows operators to be informed about unintended changes in the system. Current detection approaches usually assume that fully supervised (labelled) streams are available, which is often an unrealistic scenario in on-line real-world applications. We propose two ways to improve the economy and applicability of drift detection: 1.) a semi-supervised approach that employs single-pass active learning filters to select the most interesting samples for supervising the performance of classifiers, and 2.) a fully unsupervised approach based on the degree of overlap between the classifier's output certainty distributions. Both variants rely on a modified version of the Page-Hinkley test in which a fading factor is introduced to down-weight older samples, making the test more flexible in detecting successive drifts in a stream. The approaches are compared with the fully supervised variant (the state of the art, SoA) on two real-world on-line applications: the semi-supervised approach detects the three drifts that actually occur in these streams with a delay that is lower than (about 200 versus 300 samples) or equal to (about 70 samples) that of the supervised variant, while requiring only 20% of the samples to be labelled.
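Both variants feed a monitored signal into the modified Page-Hinkley test. The Python sketch below illustrates one way such a test can work; it is not the authors' implementation. The class name `FadingPageHinkley`, the parameters `delta`, `threshold` and `alpha`, and the placement of the fading factor on the cumulative deviation sum are assumptions made for illustration, as is the synthetic 0/1 error-indicator stream. The sketch watches for an upward shift in the mean, which fits, for example, a classifier's error rate measured on the labelled samples.

```python
from dataclasses import dataclass, field


@dataclass
class FadingPageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a monitored signal.

    ASSUMPTION: the fading factor `alpha` is applied to the cumulative sum,
    m_T = alpha * m_{T-1} + (x_T - mean_T - delta); the paper's exact
    modification may differ. alpha = 1.0 recovers the classic test.
    """
    delta: float = 0.005      # magnitude of change tolerated without alarm
    threshold: float = 50.0   # alarm threshold (lambda)
    alpha: float = 0.999      # fading factor in (0, 1]
    n: int = field(default=0, init=False)
    mean: float = field(default=0.0, init=False)
    cum: float = field(default=0.0, init=False)
    cum_min: float = field(default=0.0, init=False)

    def update(self, x: float) -> bool:
        """Feed one observation; return True if a drift alarm is raised."""
        self.n += 1
        self.mean += (x - self.mean) / self.n           # incremental running mean
        # Faded cumulative deviation: older deviations lose weight over time.
        self.cum = self.alpha * self.cum + (x - self.mean - self.delta)
        self.cum_min = min(self.cum_min, self.cum)      # smallest cumulative sum so far
        if self.cum - self.cum_min > self.threshold:
            self.reset()                                # restart to catch the next drift
            return True
        return False

    def reset(self) -> None:
        self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0


def detect(stream, detector):
    """Return the indices at which the detector raised an alarm."""
    return [t for t, x in enumerate(stream) if detector.update(x)]


# Hypothetical 0/1 error indicators: error rate 0.1 for 500 samples, then 0.5.
errors = [1.0 if t % 10 == 0 else 0.0 for t in range(500)]
errors += [1.0 if t % 2 == 0 else 0.0 for t in range(500)]
alarms = detect(errors, FadingPageHinkley())
print(alarms)
```

Setting `alpha=1.0` gives the classic Page-Hinkley test, in which old deviations are never forgotten; values slightly below 1 let the cumulative sum recover after a drift, which is the kind of flexibility for successive drifts that the abstract describes. Resetting the statistics after each alarm is likewise a design choice of this sketch.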