Adversarial Kernel Sampling on Class-imbalanced Data Streams

Proceedings of the 30th ACM International Conference on Information & Knowledge Management. Pub Date: 2021-10-26. DOI: 10.1145/3459637.3482227

Peng Yang, Ping Li


Citations: 2

Abstract

This paper investigates online active learning on class-imbalanced data streams, where labels may be queried only under a limited budget. In this setting, conventional learning is biased towards the majority classes, which harms performance. To address this issue, imbalance learning techniques adopt both asymmetric losses and asymmetric queries. Although this approach is effective, it may not guarantee performance in an adversarial setting, where the actual labels are unknown and may be chosen by an adversary. To learn a promising hypothesis in a class-imbalanced and adversarial environment, we propose an asymmetric min-max optimization framework for online classification. The derived algorithm can track the imbalance and bound the choices of an adversary simultaneously. Despite the promising result, this algorithm assumes that a label is provided for every input, whereas labels are scarce and labeling is expensive in real-world applications. To this end, we design a confidence-based sampling strategy that queries informative labels within a budget. We theoretically analyze this algorithm in terms of a mistake bound and two asymmetric measures. Empirically, we evaluate the algorithms on multiple real-world imbalanced tasks. Promising results are achieved across various application domains.
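To make the idea of confidence-based label querying under a budget more concrete, here is a minimal sketch. It is not the authors' algorithm: it swaps in a plain linear perceptron with the classic margin-based Bernoulli query rule (query with probability b / (b + |margin|)), and it omits the kernel and min-max components entirely. The names `query_probability`, `run_stream`, `b_pos`, `b_neg`, `cost_pos`, and `cost_neg` are hypothetical and chosen only for illustration.

```python
import random


def query_probability(margin, b):
    """Bernoulli query probability b / (b + |margin|).

    A small |margin| means low confidence, so the label is queried more often.
    """
    return b / (b + abs(margin))


def run_stream(stream, dim, budget, b_pos=1.0, b_neg=0.1,
               cost_pos=2.0, cost_neg=1.0, seed=0):
    """Online perceptron with asymmetric queries and asymmetric update costs.

    Assumptions (not from the paper): labels are in {+1, -1}, +1 is the
    minority class, and the learner stops querying once the budget is spent.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    queries = 0
    mistakes = 0
    for x, y in stream:
        margin = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if margin >= 0 else -1
        if y_hat != y:
            mistakes += 1  # bookkeeping only; y is hidden unless queried
        # Asymmetric queries: a different smoothing parameter per predicted class.
        b = b_pos if y_hat == 1 else b_neg
        if queries < budget and rng.random() < query_probability(margin, b):
            queries += 1
            if y * margin <= 0:
                # Asymmetric loss: larger step when the true label is the minority class.
                step = cost_pos if y == 1 else cost_neg
                w = [wi + step * y * xi for wi, xi in zip(w, x)]
    return w, queries, mistakes


if __name__ == "__main__":
    toy_stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1)] * 10
    print(run_stream(toy_stream, dim=2, budget=3))
```

In this sketch, the larger `b_pos` makes the learner query more often when it predicts the (assumed) minority class, and the larger `cost_pos` makes errors on that class move the weights further. Both choices are illustrative defaults, not values from the paper.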

