A Convex Optimization Framework for Active Learning

Ehsan Elhamifar, Guillermo Sapiro, Allen Yang, S. Shankar Sastry
{"title":"A Convex Optimization Framework for Active Learning","authors":"Ehsan Elhamifar, G. Sapiro, A. Yang, S. Shankar Sasrty","doi":"10.1109/ICCV.2013.33","DOIUrl":null,"url":null,"abstract":"In many image/video/web classification problems, we have access to a large number of unlabeled samples. However, it is typically expensive and time consuming to obtain labels for the samples. Active learning is the problem of progressively selecting and annotating the most informative unlabeled samples, in order to obtain a high classification performance. Most existing active learning algorithms select only one sample at a time prior to retraining the classifier. Hence, they are computationally expensive and cannot take advantage of parallel labeling systems such as Mechanical Turk. On the other hand, algorithms that allow the selection of multiple samples prior to retraining the classifier, may select samples that have significant information overlap or they involve solving a non-convex optimization. More importantly, the majority of active learning algorithms are developed for a certain classifier type such as SVM. In this paper, we develop an efficient active learning framework based on convex programming, which can select multiple samples at a time for annotation. Unlike the state of the art, our algorithm can be used in conjunction with any type of classifiers, including those of the family of the recently proposed Sparse Representation-based Classification (SRC). We use the two principles of classifier uncertainty and sample diversity in order to guide the optimization program towards selecting the most informative unlabeled samples, which have the least information overlap. Our method can incorporate the data distribution in the selection process by using the appropriate dissimilarity between pairs of samples. We show the effectiveness of our framework in person detection, scene categorization and face recognition on real-world datasets.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"3 1","pages":"209-216"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"116","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV.2013.33","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 116

Abstract

In many image/video/web classification problems, we have access to a large number of unlabeled samples. However, it is typically expensive and time-consuming to obtain labels for the samples. Active learning is the problem of progressively selecting and annotating the most informative unlabeled samples in order to obtain high classification performance. Most existing active learning algorithms select only one sample at a time prior to retraining the classifier. Hence, they are computationally expensive and cannot take advantage of parallel labeling systems such as Mechanical Turk. On the other hand, algorithms that allow the selection of multiple samples prior to retraining the classifier may select samples with significant information overlap, or they involve solving a non-convex optimization. More importantly, the majority of active learning algorithms are developed for a certain classifier type, such as SVM. In this paper, we develop an efficient active learning framework based on convex programming, which can select multiple samples at a time for annotation. Unlike the state of the art, our algorithm can be used in conjunction with any type of classifier, including those in the family of the recently proposed Sparse Representation-based Classification (SRC). We use the two principles of classifier uncertainty and sample diversity to guide the optimization program towards selecting the most informative unlabeled samples, which have the least information overlap. Our method can incorporate the data distribution in the selection process by using an appropriate dissimilarity between pairs of samples. We show the effectiveness of our framework in person detection, scene categorization and face recognition on real-world datasets.
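
The abstract does not spell out the optimization itself, but the core idea of combining classifier uncertainty with sample diversity in a single convex program can be illustrated with a small sketch. The code below is a rough illustration, not the paper's exact formulation: it selects a batch of candidates from a toy unlabeled pool by solving a row-sparse representation problem with the cvxpy modeling library (an assumed dependency), where the dissimilarity matrix D, the confidence scores, and the trade-off weight lam are all made-up placeholders.

```python
# Minimal sketch of convex batch selection for active learning.
# Assumptions (not from the paper): cvxpy as the solver interface, Euclidean
# dissimilarities, random per-sample confidence scores, lam chosen arbitrarily.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)

# Toy unlabeled pool: M candidate samples in a 2-D feature space.
M = 30
X = rng.normal(size=(M, 2))

# Pairwise dissimilarities between candidates (Euclidean here; any
# application-appropriate dissimilarity could be substituted).
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

# Hypothetical confidence scores from the current classifier:
# low confidence = high uncertainty = more informative to label.
confidence = rng.uniform(0.1, 1.0, size=M)

lam = 2.0  # trade-off between representation cost and selection cost

# Z[i, j] ~ how much candidate i "represents" candidate j.
Z = cp.Variable((M, M), nonneg=True)

# Row-sparsity term: only a few rows stay nonzero, so only a few samples are
# selected. Weighting each row by that sample's confidence makes confidently
# classified samples expensive to select, biasing the batch toward uncertain ones.
row_cost = cp.sum(cp.multiply(confidence, cp.max(Z, axis=1)))

# Encoding cost: the selected samples should represent the whole pool well,
# which discourages picking near-duplicates (diversity).
encoding_cost = cp.sum(cp.multiply(D, Z))

constraints = [cp.sum(Z, axis=0) == 1]  # every candidate must be fully represented
prob = cp.Problem(cp.Minimize(encoding_cost + lam * row_cost), constraints)
prob.solve()

selected = np.where(Z.value.max(axis=1) > 1e-3)[0]
print("samples to send for annotation:", selected)
```

Rows of Z that remain nonzero at the optimum indicate the samples to annotate. The batch size is controlled indirectly by lam rather than fixed in advance, which is one way a convex relaxation can sidestep the combinatorial subset-selection problem while still selecting multiple, non-redundant samples per round.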