利用成对置信度比较实现分类的稳健 AUC 最大化

IF 4.6 3区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Frontiers of Computer Science Pub Date : 2023-12-16 DOI:10.1007/s11704-023-2709-5

Haochen Shi, Mingkun Xie, Shengjun Huang

{"title":"利用成对置信度比较实现分类的稳健 AUC 最大化","authors":"Haochen Shi, Mingkun Xie, Shengjun Huang","doi":"10.1007/s11704-023-2709-5","DOIUrl":null,"url":null,"abstract":"<p>Supervised learning often requires a large number of labeled examples, which has become a critical bottleneck in the case that manual annotating the class labels is costly. To mitigate this issue, a new framework called pairwise comparison (Pcomp) classification is proposed to allow training examples only weakly annotated with pairwise comparison, i.e., which one of two examples is more likely to be positive. The previous study solves Pcomp problems by minimizing the classification error, which may lead to less robust model due to its sensitivity to class distribution. In this paper, we propose a robust learning framework for Pcomp data along with a pairwise surrogate loss called Pcomp-AUC. It provides an unbiased estimator to equivalently maximize AUC without accessing the precise class labels. Theoretically, we prove the consistency with respect to AUC and further provide the estimation error bound for the proposed method. Empirical studies on multiple datasets validate the effectiveness of the proposed method.</p>","PeriodicalId":12640,"journal":{"name":"Frontiers of Computer Science","volume":"6 1","pages":""},"PeriodicalIF":4.6000,"publicationDate":"2023-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Robust AUC maximization for classification with pairwise confidence comparisons\",\"authors\":\"Haochen Shi, Mingkun Xie, Shengjun Huang\",\"doi\":\"10.1007/s11704-023-2709-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Supervised learning often requires a large number of labeled examples, which has become a critical bottleneck in the case that manual annotating the class labels is costly. To mitigate this issue, a new framework called pairwise comparison (Pcomp) classification is proposed to allow training examples only weakly annotated with pairwise comparison, i.e., which one of two examples is more likely to be positive. The previous study solves Pcomp problems by minimizing the classification error, which may lead to less robust model due to its sensitivity to class distribution. In this paper, we propose a robust learning framework for Pcomp data along with a pairwise surrogate loss called Pcomp-AUC. It provides an unbiased estimator to equivalently maximize AUC without accessing the precise class labels. Theoretically, we prove the consistency with respect to AUC and further provide the estimation error bound for the proposed method. Empirical studies on multiple datasets validate the effectiveness of the proposed method.</p>\",\"PeriodicalId\":12640,\"journal\":{\"name\":\"Frontiers of Computer Science\",\"volume\":\"6 1\",\"pages\":\"\"},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2023-12-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers of Computer Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11704-023-2709-5\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers of Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11704-023-2709-5","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

监督学习通常需要大量的标注示例，在人工标注类标签成本高昂的情况下，这已成为一个关键瓶颈。为了缓解这一问题，我们提出了一种称为成对比较（Pcomp）分类的新框架，允许只对训练示例进行弱注释的成对比较，即两个示例中哪一个更有可能是正面的。以往的研究通过最小化分类误差来解决 Pcomp 问题，但由于其对类别分布的敏感性，可能会导致模型的鲁棒性较差。在本文中，我们提出了一种针对 Pcomp 数据的稳健学习框架，以及一种名为 Pcomp-AUC 的成对替代损失。它提供了一种无偏估计器，可以在不获取精确类别标签的情况下等效地最大化 AUC。从理论上讲，我们证明了 AUC 的一致性，并进一步提供了所提方法的估计误差边界。对多个数据集的实证研究验证了所提方法的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Robust AUC maximization for classification with pairwise confidence comparisons

Supervised learning often requires a large number of labeled examples, which has become a critical bottleneck in the case that manual annotating the class labels is costly. To mitigate this issue, a new framework called pairwise comparison (Pcomp) classification is proposed to allow training examples only weakly annotated with pairwise comparison, i.e., which one of two examples is more likely to be positive. The previous study solves Pcomp problems by minimizing the classification error, which may lead to less robust model due to its sensitivity to class distribution. In this paper, we propose a robust learning framework for Pcomp data along with a pairwise surrogate loss called Pcomp-AUC. It provides an unbiased estimator to equivalently maximize AUC without accessing the precise class labels. Theoretically, we prove the consistency with respect to AUC and further provide the estimation error bound for the proposed method. Empirical studies on multiple datasets validate the effectiveness of the proposed method.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Frontiers of Computer Science COMPUTER SCIENCE, INFORMATION SYSTEMS-COMPUTER SCIENCE, SOFTWARE ENGINEERING

CiteScore

8.60

自引率

2.40%

发文量

799

审稿时长

6-12 weeks

期刊介绍： Frontiers of Computer Science aims to provide a forum for the publication of peer-reviewed papers to promote rapid communication and exchange between computer scientists. The journal publishes research papers and review articles in a wide range of topics, including: architecture, software, artificial intelligence, theoretical computer science, networks and communication, information systems, multimedia and graphics, information security, interdisciplinary, etc. The journal especially encourages papers from new emerging and multidisciplinary areas, as well as papers reflecting the international trends of research and development and on special topics reporting progress made by Chinese computer scientists.