On active learning for data acquisition

Zhiqiang Zheng, B. Padmanabhan
{"title":"On active learning for data acquisition","authors":"Zhiqiang Zheng, B. Padmanabhan","doi":"10.1109/ICDM.2002.1184002","DOIUrl":null,"url":null,"abstract":"Many applications are characterized by having naturally incomplete data on customers - where data on only some fixed set of local variables is gathered However, having a more complete picture can help build better models. The naive solution to this problem - acquiring complete data for all customers s often impractical due to the costs of doing so. A possible alternative is to acquire complete data for \"some\" customers and to use this to improve the models built. The data acquisition problem is determining how many, and which, customers to acquire additional data from. In this paper we suggest using active learning based approaches for the data acquisition problem. In particular, we present initial methods for data acquisition and evaluate these methods experimentally on web usage data and UCI datasets. Results show that the methods perform well and indicate that active learning based methods for data acquisition can be a promising area for data mining research.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"74","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2002.1184002","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 74

Abstract

Many applications are characterized by having naturally incomplete data on customers - where data on only some fixed set of local variables is gathered However, having a more complete picture can help build better models. The naive solution to this problem - acquiring complete data for all customers s often impractical due to the costs of doing so. A possible alternative is to acquire complete data for "some" customers and to use this to improve the models built. The data acquisition problem is determining how many, and which, customers to acquire additional data from. In this paper we suggest using active learning based approaches for the data acquisition problem. In particular, we present initial methods for data acquisition and evaluate these methods experimentally on web usage data and UCI datasets. Results show that the methods perform well and indicate that active learning based methods for data acquisition can be a promising area for data mining research.
关于数据获取的主动学习
许多应用程序的特点是具有自然不完整的客户数据——其中只收集了一些固定的局部变量集的数据。然而,拥有更完整的图像可以帮助构建更好的模型。对于这个问题,幼稚的解决方案——为所有客户获取完整的数据——由于这样做的成本,通常是不切实际的。一种可能的替代方法是获取“某些”客户的完整数据,并使用这些数据来改进所构建的模型。数据获取问题是确定从多少客户以及从哪些客户获取额外数据。在本文中,我们建议使用基于主动学习的方法来解决数据获取问题。特别地,我们提出了数据采集的初始方法,并在web使用数据和UCI数据集上对这些方法进行了实验评估。结果表明,基于主动学习的数据采集方法可以成为数据挖掘研究的一个有前途的领域。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信