Region-Based Active Learning with Hierarchical and Adaptive Region Construction.

Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining Pub Date : 2019-05-01 DOI:10.1137/1.9781611975673.50

Zhipeng Luo, Milos Hauskrecht

Volume 2019, pages 441-449.

Citations: 3

Abstract

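Learning classification models in practice often relies on human annotation, in which humans assign class labels to data instances. Because this process can be very time-consuming and costly, finding effective ways to reduce the annotation cost is critical for building such models. To address this problem, instead of soliciting instance-based annotation we explore region-based annotation as the human feedback. A region is defined as a hyper-cubic subspace of the input space X, and it covers the subpopulation of data instances that fall into it. Each region is labeled with a number in [0,1] (in the binary classification setting) representing a human estimate of the positive (or negative) class proportion in the subpopulation. To quickly discover pure regions (in terms of class proportion) in the data, we have developed a novel active learning framework that constructs regions in a hierarchical and adaptive way. Hierarchical means that regions are incrementally built into a hierarchical tree by repeatedly splitting the input space. Adaptive means that the framework chooses the best heuristic for each region split. Through experiments on numerous datasets we demonstrate that our framework can identify pure regions in very few region queries. Our approach is thus shown to be effective in learning classification models from very limited human feedback.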
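The region tree described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Region`, `build_tree`, `make_oracle`, and `widest_dim_midpoint`, the purity tolerance `tol`, and the rule of always splitting the least pure leaf are all assumptions made for illustration. The abstract does not say how the framework picks the best heuristic for each split, so the sketch only exposes a pluggable `choose_split` argument and uses one fixed, hypothetical heuristic. The human annotator is simulated from hidden labels.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Region:
    """Axis-aligned box (low[d] <= x[d] < high[d]); one node of the region tree."""
    low: Point
    high: Point
    proportion: Optional[float] = None  # annotator's estimate of the positive-class share
    children: List["Region"] = field(default_factory=list)

    def contains(self, x: Point) -> bool:
        return all(l <= v < h for l, v, h in zip(self.low, x, self.high))

    def split(self, dim: int, threshold: float) -> List["Region"]:
        """Cut this box in two along one dimension and attach the halves as children."""
        left_high = self.high[:dim] + (threshold,) + self.high[dim + 1:]
        right_low = self.low[:dim] + (threshold,) + self.low[dim + 1:]
        self.children = [Region(self.low, left_high), Region(right_low, self.high)]
        return self.children


def impurity(p: float) -> float:
    """0 for a pure region (p = 0 or 1), 0.5 for an evenly mixed one."""
    return min(p, 1.0 - p)


def widest_dim_midpoint(region: Region, points: Sequence[Point]) -> Tuple[int, float]:
    """Hypothetical split heuristic: halve the widest side of the box."""
    dim = max(range(len(region.low)), key=lambda d: region.high[d] - region.low[d])
    return dim, (region.low[dim] + region.high[dim]) / 2.0


def make_oracle(points: Sequence[Point], labels: Sequence[int]) -> Callable[[Region], float]:
    """Simulated annotator: returns the true positive proportion inside a region."""
    def oracle(region: Region) -> float:
        inside = [y for x, y in zip(points, labels) if region.contains(x)]
        return sum(inside) / len(inside) if inside else 0.0  # empty box treated as pure
    return oracle


def build_tree(root, points, oracle, budget, choose_split=widest_dim_midpoint, tol=0.05):
    """Split the least pure leaf until all leaves are pure or the query budget runs out."""
    root.proportion = oracle(root)
    queries, leaves = 1, [root]
    while queries + 2 <= budget:  # each split costs two region queries
        impure = [r for r in leaves if impurity(r.proportion) > tol]
        if not impure:
            break
        target = max(impure, key=lambda r: impurity(r.proportion))
        dim, threshold = choose_split(target, points)
        leaves.remove(target)
        for child in target.split(dim, threshold):
            child.proportion = oracle(child)
            queries += 1
            leaves.append(child)
    return leaves, queries


# Toy 1-D data: negatives below 0.5, positives above.
points = [(0.1,), (0.2,), (0.3,), (0.6,), (0.7,), (0.9,)]
labels = [0, 0, 0, 1, 1, 1]
leaves, queries = build_tree(Region((0.0,), (1.0,)), points, make_oracle(points, labels), budget=10)
print(queries, sorted(r.proportion for r in leaves))
```

On this toy data a single split separates the two classes, so the pure leaves are found with three region queries, whereas instance-based annotation would need a label per point. Swapping in a different `choose_split` function per split is where an adaptive selection rule would plug in.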



