Efficient Active Learning Based on Uncertain Clusters

Juihsi Fu, Singling Lee, Wangping Wu
Published in: 2012 Conference on Technologies and Applications of Artificial Intelligence, 2012-11-16
DOI: 10.1109/TAAI.2012.70 (https://doi.org/10.1109/TAAI.2012.70)
Citations: 1

Abstract

Active learning aims to query as few raw samples as possible while still learning an accurate classifier. However, when samples are selected without considering their content, the queried set may have low diversity, and a classifier trained on many similar queried samples learns inefficiently. This paper proposes ALUC, an approach that increases the diversity of the queried uncertain samples. Before any query is made, raw samples are clustered according to both the prior data distribution and sample uncertainty. First, cluster seeds are found from the underlying data distribution, without fixing the number of clusters in advance. The distance metric is then designed to generate small clusters where they contain uncertain samples. Consequently, the representative samples of the clusters are both diverse in content and informative to query. Experimental results on a synthetic dataset and real-world datasets show that the proposed distance metric effectively groups raw samples that are similar in both content and uncertainty, and that ALUC queries informative and diverse samples that yield an accurate classifier.
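
The abstract does not give ALUC's seed-finding rule or its distance metric, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a binary classifier that outputs a positive-class probability per raw sample, and it swaps in two stand-ins: a hypothetical metric (`weighted_distance`, Euclidean distance stretched by a factor `alpha` near uncertain samples) and greedy leader clustering with a fixed `radius`, which leaves the number of clusters open. All function and parameter names here are illustrative assumptions.

```python
import numpy as np


def uncertainty(p_pos):
    """Binary uncertainty in [0, 1]: 1 at p = 0.5, 0 at p = 0 or p = 1."""
    return 1.0 - np.abs(2.0 * np.asarray(p_pos, dtype=float) - 1.0)


def weighted_distance(x_i, x_j, u_i, u_j, alpha=2.0):
    """Hypothetical metric (not from the paper): Euclidean distance,
    stretched when either sample is uncertain."""
    return np.linalg.norm(x_i - x_j) * (1.0 + alpha * max(u_i, u_j))


def leader_clusters(X, u, radius=1.0, alpha=2.0):
    """Greedy leader clustering (a stand-in for ALUC's seed finding).
    A sample joins its nearest seed if within `radius`, else becomes a seed,
    so the number of clusters is not fixed in advance."""
    seeds = []
    labels = np.empty(len(X), dtype=int)
    for i in range(len(X)):
        dists = [weighted_distance(X[i], X[s], u[i], u[s], alpha) for s in seeds]
        if dists and min(dists) <= radius:
            labels[i] = int(np.argmin(dists))
        else:
            seeds.append(i)
            labels[i] = len(seeds) - 1
    return labels


def select_queries(X, p_pos, budget, radius=1.0, alpha=2.0):
    """Return at most `budget` indices: the most uncertain sample of each
    cluster, ordered from most to least uncertain."""
    u = uncertainty(p_pos)
    labels = leader_clusters(np.asarray(X, dtype=float), u, radius, alpha)
    reps = [max(np.flatnonzero(labels == c), key=lambda i: u[i])
            for c in np.unique(labels)]
    reps.sort(key=lambda i: -u[i])
    return [int(i) for i in reps[:budget]]
```

As a usage example under these assumptions, take `X = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [5.0, 5.0], [5.1, 5.0]]` with probabilities `p = [0.5, 0.52, 0.9, 0.45, 0.99]`. Plain uncertainty sampling with a budget of 2 would pick samples 0 and 1, which sit almost on top of each other, whereas `select_queries(X, p, budget=2)` returns `[0, 3]`, one sample from each region. Because the stretch factor inflates distances around uncertain samples, the same `radius` also splits uncertain regions into smaller clusters than confident ones.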