Hierarchical Active Learning with Overlapping Regions.

Zhipeng Luo, Milos Hauskrecht
Proceedings of the ... ACM International Conference on Information & Knowledge Management, vol. 2020, pp. 1045-1054. Published 2020-10-01.
DOI: 10.1145/3340531.3412022 (https://doi.org/10.1145/3340531.3412022)
Citations: 1

Abstract

Learning classification models from real-world data often requires substantial human effort devoted to instance annotation. Because this process can be very time-consuming and costly, finding effective ways to reduce the annotation cost becomes critical for building such models. To address this problem, we explore a new type of human feedback: region-based feedback. Briefly, a region is defined as a hypercubic subspace of the input data space and represents a subpopulation of data instances; the region's label is a human assessment of the class proportion of that subpopulation. By using learning-from-label-proportions algorithms, one can learn instance-based classifiers from such labeled regions. The key challenge is that infinitely many regions can be defined and queried in a given data space. To minimize the number and complexity of region-based queries, we propose and develop a hierarchical active learning solution that incrementally builds a concise hierarchy of regions. Furthermore, to avoid building a possibly class-irrelevant region hierarchy, we propose growing multiple different hierarchies in parallel and expanding the more informative ones. Through experiments on numerous data sets, we demonstrate that methods using region-based feedback can learn very good classifiers from very few simple queries, and hence are highly effective in reducing the human annotation effort needed to build classification models.
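
To make the notions of a region and its proportion label concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes an axis-aligned box as the hypercubic region, simulates the human annotator by computing the positive-class fraction from known instance labels, and splits a region at a hand-picked threshold. The names `Region`, `split`, and `proportion_label` are hypothetical, and the split rule is for illustration only; it is not the query strategy proposed in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


@dataclass(frozen=True)
class Region:
    """Axis-aligned box: lower[d] <= x[d] < upper[d] in every dimension d.

    Assumption: this stands in for the paper's "hypercubic subspace".
    """
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def contains(self, x: Point) -> bool:
        return all(lo <= v < hi for lo, v, hi in zip(self.lower, x, self.upper))

    def split(self, dim: int, threshold: float) -> Tuple["Region", "Region"]:
        """Split into two child regions along one dimension (one hierarchy step)."""
        left_upper = self.upper[:dim] + (threshold,) + self.upper[dim + 1:]
        right_lower = self.lower[:dim] + (threshold,) + self.lower[dim + 1:]
        return Region(self.lower, left_upper), Region(right_lower, self.upper)


def proportion_label(region: Region, points: List[Point],
                     labels: List[int]) -> Optional[float]:
    """Simulated annotator: fraction of positive (label 1) instances in the region.

    Assumption: a real annotator would estimate this proportion; here it is
    computed from ground-truth labels. Returns None for an empty region.
    """
    inside = [y for x, y in zip(points, labels) if region.contains(x)]
    if not inside:
        return None
    return sum(inside) / len(inside)


# Toy data: four 2-D instances with binary labels.
points = [(0.1, 0.2), (0.3, 0.8), (0.6, 0.4), (0.9, 0.9)]
labels = [0, 0, 1, 1]

root = Region((0.0, 0.0), (1.0, 1.0))
left, right = root.split(dim=0, threshold=0.5)

root_label = proportion_label(root, points, labels)
left_label = proportion_label(left, points, labels)
right_label = proportion_label(right, points, labels)
print(root_label, left_label, right_label)
```

In this toy case the root region is mixed, while each child region is pure, which illustrates why refining a region into children can yield more informative proportion labels for a learning-from-label-proportions algorithm.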
