Controlling false positives in multiple instance learning: The “c-rule” approach

Impact Factor 3; Zone 3 (Computer Science); JCR Q2 (Computer Science, Artificial Intelligence)

International Journal of Approximate Reasoning Pub Date : 2025-01-22 DOI:10.1016/j.ijar.2025.109367

Rosario Delgado

Volume 179, Article 109367. Full text: https://www.sciencedirect.com/science/article/pii/S0888613X25000088

Citations: 0

Abstract

This paper introduces a novel strategy for labeling bags in binary Multiple Instance Learning (MIL) under the standard MI assumption. The proposed approach addresses errors in instance labeling by classifying a bag as positive if it contains at least c positively labeled instances. This strategy seeks to balance the trade-off between controlling the false positive rate (mislabeling a negative bag as positive) and the false negative rate (mislabeling a positive bag as negative) while reducing labeling efforts.
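The decision rule described above can be sketched in a few lines. This is only an illustrative sketch, not the paper's implementation: the function name `c_rule_label` and the encoding of instance labels as 0/1 integers are assumptions made here.

```python
from typing import Sequence


def c_rule_label(instance_labels: Sequence[int], c: int = 1) -> int:
    """Label a bag with the c-rule.

    instance_labels: (possibly noisy) instance labels, 1 = positive, 0 = negative.
    c: minimum number of positively labeled instances needed to call the bag
       positive. c = 1 gives the standard rule.
    Returns 1 for a positive bag and 0 for a negative bag.
    """
    if c < 1:
        raise ValueError("c must be a positive integer")
    n_positive = sum(1 for y in instance_labels if y == 1)
    return int(n_positive >= c)


if __name__ == "__main__":
    bag = [0, 1, 0, 0, 1]
    print(c_rule_label(bag, c=1))  # standard rule
    print(c_rule_label(bag, c=3))  # stricter rule
```

Raising c makes the rule harder to satisfy, so a few mislabeled negative instances are less likely to turn a negative bag positive, at the cost of missing positive bags that contain few positive instances.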

The study provides theoretical justifications for this approach and introduces algorithms for its implementation, including determining the minimum value of c required to keep error rates below predefined thresholds. Additionally, it proposes a methodology to estimate the number of genuinely positive and negative instances within bags. Simulations demonstrate the superior performance of the “c-rule” compared to the standard rule (corresponding to c = 1) in scenarios with sparse positive bags and moderately low to high probability of misclassifying a negative instance. This trend is further validated through comparisons using two real-world datasets. Overall, this research advances the understanding of error management in MIL and provides practical tools for real-world applications.
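To make the idea of a minimum c concrete, here is a simplified sketch that bounds only the false positive rate. It is not the paper's algorithm; it rests on assumptions introduced here: a truly negative bag has n instances, and each one is independently mislabeled as positive with probability p, so the count of positive labels follows a Binomial(n, p) distribution. The names `false_positive_rate`, `min_c_for_fpr` and `alpha` are hypothetical.

```python
from math import comb
from typing import Optional


def false_positive_rate(n: int, p: float, c: int) -> float:
    """P(X >= c) for X ~ Binomial(n, p).

    Under the independence assumption stated above, this is the probability
    that a negative bag of n instances is labeled positive by the c-rule.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def min_c_for_fpr(n: int, p: float, alpha: float) -> Optional[int]:
    """Smallest c in 1..n whose false positive rate is at most alpha.

    Returns None if no c in that range meets the threshold.
    """
    for c in range(1, n + 1):
        if false_positive_rate(n, p, c) <= alpha:
            return c
    return None


if __name__ == "__main__":
    # Assumed example values: bag size 10, 10% instance mislabeling, 5% target.
    print(min_c_for_fpr(n=10, p=0.1, alpha=0.05))
```

With n = 10, p = 0.1 and alpha = 0.05, the sketch returns c = 4: the standard rule (c = 1) would mislabel a negative bag with probability 1 - 0.9^10, roughly 0.65, under these assumptions. A full treatment would also bound the false negative rate, which this sketch leaves out.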


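Source journal: International Journal of Approximate Reasoning (Engineering and Technology - Computer Science: Artificial Intelligence)

CiteScore: 6.90
Self-citation rate: 12.80%
Publication volume: 170
Review time: 67 days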

About the journal: The International Journal of Approximate Reasoning is intended to serve as a forum for the treatment of imprecision and uncertainty in Artificial and Computational Intelligence, covering both the foundations of uncertainty theories and the design of intelligent systems for scientific and engineering applications. It publishes high-quality research papers describing theoretical developments or innovative applications, as well as review articles on topics of general interest. Relevant topics include, but are not limited to, probabilistic reasoning and Bayesian networks, imprecise probabilities, random sets, belief functions (Dempster-Shafer theory), possibility theory, fuzzy sets, rough sets, decision theory, non-additive measures and integrals, qualitative reasoning about uncertainty, comparative probability orderings, game-theoretic probability, default reasoning, nonstandard logics, argumentation systems, inconsistency-tolerant reasoning, elicitation techniques, philosophical foundations and psychological models of uncertain reasoning. Domains of application for uncertain reasoning systems include risk analysis and assessment, information retrieval and database design, information fusion, machine learning, data and web mining, computer vision, image and signal processing, intelligent data analysis, statistics, multi-agent systems, etc.