Hierarchical exploration based active learning with support vector machine

Yanping Yang, E. Song, Guangzhi Ma
{"title":"Hierarchical exploration based active learning with support vector machine","authors":"Yanping Yang, E. Song, Guangzhi Ma","doi":"10.1109/CINC.2010.5643833","DOIUrl":null,"url":null,"abstract":"The goal of active learning is to minimize the amount of labeled data required for machine learning. Some methods have focused on exploiting the samples with high uncertainty, but those methods fail in getting a representative set of the data samples. Other methods try to explore the representative samples by utilizing the prior distribution of the dataset. However, they are often computationally expensive and need a large amount of labeled data for initialization. In this paper we develop a hierarchical exploration based active learning algorithm that takes into account both the distribution of the dataset and the decision boundary of the current hypothesis. Our method uses the support vector machine (SVM) as the classifier. The hierarchical clustering algorithm is used to discover the dataset's structure step by step in a top-down manner. In each step of hierarchical structure discovery, the representative samples will be queried for labels to check the relative cluster's purity. The cluster with low purity will be divided further. After the draft SVM model is built with those representative samples, the uncertain samples near decision boundary will be further labeled if it can help reduce the entropy of the classifier. To show the effectiveness of the proposed method, our proposed method is compared with five state-of-art algorithms on six datasets from UCI. Our method shows the best performance through the comparison.","PeriodicalId":227004,"journal":{"name":"2010 Second International Conference on Computational Intelligence and Natural Computing","volume":"135 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 Second International Conference on Computational Intelligence and Natural Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CINC.2010.5643833","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The goal of active learning is to minimize the amount of labeled data required for machine learning. Some methods have focused on exploiting the samples with high uncertainty, but those methods fail in getting a representative set of the data samples. Other methods try to explore the representative samples by utilizing the prior distribution of the dataset. However, they are often computationally expensive and need a large amount of labeled data for initialization. In this paper we develop a hierarchical exploration based active learning algorithm that takes into account both the distribution of the dataset and the decision boundary of the current hypothesis. Our method uses the support vector machine (SVM) as the classifier. The hierarchical clustering algorithm is used to discover the dataset's structure step by step in a top-down manner. In each step of hierarchical structure discovery, the representative samples will be queried for labels to check the relative cluster's purity. The cluster with low purity will be divided further. After the draft SVM model is built with those representative samples, the uncertain samples near decision boundary will be further labeled if it can help reduce the entropy of the classifier. To show the effectiveness of the proposed method, our proposed method is compared with five state-of-art algorithms on six datasets from UCI. Our method shows the best performance through the comparison.
基于层次探索的支持向量机主动学习
主动学习的目标是最小化机器学习所需的标记数据量。一些方法侧重于开发具有高不确定性的样本,但这些方法无法获得具有代表性的数据样本集。其他方法试图通过利用数据集的先验分布来探索代表性样本。然而,它们通常在计算上很昂贵,并且需要大量的标记数据进行初始化。在本文中,我们开发了一种基于分层探索的主动学习算法,该算法同时考虑了数据集的分布和当前假设的决策边界。我们的方法使用支持向量机(SVM)作为分类器。采用分层聚类算法,自上而下逐步发现数据集的结构。在层次结构发现的每一步中,都会查询代表性样本的标签,以检查相对聚类的纯度。纯度低的团簇将进一步划分。在用这些代表性样本构建SVM模型草稿后,对决策边界附近的不确定样本进行进一步标记,以帮助降低分类器的熵。为了证明该方法的有效性,我们将该方法与来自UCI的六个数据集上的五种最新算法进行了比较。通过比较,我们的方法表现出最好的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信