Combining instance selection and self-training to improve data stream quantification

André G. Maletzke, Denis M. dos Reis, Gustavo E. A. P. A. Batista
{"title":"Combining instance selection and self-training to improve data stream quantification","authors":"André G. Maletzke, Denis M. dos Reis, Gustavo E. A. P. A. Batista","doi":"10.1186/s13173-018-0076-0","DOIUrl":null,"url":null,"abstract":"In the last years, learning from data streams has attracted the attention of researchers and practitioners due to its large number of applications. These applications have motivated the research community to propose a significant amount of methods to solve problems in diverse tasks, more prominently in classification, clustering, and anomaly detection. However, a relevant task known as quantification has remained mostly unexplored. The quantification goal is to provide an estimate of the class prevalence in an unlabeled set. Recently, we proposed the SQSI algorithm to quantify data streams with concept drifts. SQSI uses a statistical test to identify concept drifts and retrain the classifiers. However, the retraining involves requiring the labels for all newly arrived instances. In this paper, we extend SQSI algorithm by exploring instance selection techniques allied to semi-supervised learning. The idea is to request the classes of a smaller subset of recent examples. Our experiments demonstrate that SQSI’s extension significantly reduces the dependency on actual labels while maintaining or improving the quantification accuracy.","PeriodicalId":39760,"journal":{"name":"Journal of the Brazilian Computer Society","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Brazilian Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s13173-018-0076-0","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

In the last years, learning from data streams has attracted the attention of researchers and practitioners due to its large number of applications. These applications have motivated the research community to propose a significant amount of methods to solve problems in diverse tasks, more prominently in classification, clustering, and anomaly detection. However, a relevant task known as quantification has remained mostly unexplored. The quantification goal is to provide an estimate of the class prevalence in an unlabeled set. Recently, we proposed the SQSI algorithm to quantify data streams with concept drifts. SQSI uses a statistical test to identify concept drifts and retrain the classifiers. However, the retraining involves requiring the labels for all newly arrived instances. In this paper, we extend SQSI algorithm by exploring instance selection techniques allied to semi-supervised learning. The idea is to request the classes of a smaller subset of recent examples. Our experiments demonstrate that SQSI’s extension significantly reduces the dependency on actual labels while maintaining or improving the quantification accuracy.
结合实例选择和自训练,提高数据流量化
近年来,数据流学习因其大量的应用而引起了研究人员和实践者的关注。这些应用促使研究界提出了大量的方法来解决不同任务中的问题,尤其是在分类、聚类和异常检测方面。然而,一项被称为量化的相关任务大部分仍未被探索。量化目标是在未标记的集合中提供类流行率的估计。最近,我们提出了SQSI算法来量化带有概念漂移的数据流。SQSI使用统计测试来识别概念漂移并重新训练分类器。然而,再培训涉及到要求所有新到达实例的标签。在本文中,我们通过探索与半监督学习相关的实例选择技术来扩展SQSI算法。其思想是请求最近示例的一个较小子集的类。我们的实验表明,SQSI的扩展显著降低了对实际标签的依赖,同时保持或提高了量化准确性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of the Brazilian Computer Society
Journal of the Brazilian Computer Society Computer Science-Computer Science (all)
CiteScore
2.40
自引率
0.00%
发文量
2
期刊介绍: JBCS is a formal quarterly publication of the Brazilian Computer Society. It is a peer-reviewed international journal which aims to serve as a forum to disseminate innovative research in all fields of computer science and related subjects. Theoretical, practical and experimental papers reporting original research contributions are welcome, as well as high quality survey papers. The journal is open to contributions in all computer science topics, computer systems development or in formal and theoretical aspects of computing, as the list of topics below is not exhaustive. Contributions will be considered for publication in JBCS if they have not been published previously and are not under consideration for publication elsewhere.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信