{"title":"Multi-Instance Learning with One Side Label Noise","authors":"Tianxiang Luan, Shilin Gu, Xijia Tang, Wenzhang Zhuge, Chenping Hou","doi":"10.1145/3644076","DOIUrl":null,"url":null,"abstract":"<p>Multi-instance Learning (MIL) is a popular learning paradigm arising from many real applications. It assigns a label to a set of instances, named as a bag, and the bag’s label is determined by the instances within it. A bag is positive if and only if it has at least one positive instance. Since labeling bags is more complicated than labeling each instance, we will often face the mislabeling problem in MIL. Furthermore, it is more common that a negative bag has been mislabeled to a positive one since one mislabeled instance will lead to the change of the whole bag label. This is an important problem that originated from real applications, e.g., web mining and image classification, but little research has concentrated on it as far as we know. In this paper, we focus on this MIL problem with one side label noise that the negative bags are mislabeled as positive ones. To address this challenging problem, we propose a novel multi-instance learning method with One Side Label Noise (OSLN). We design a new double weighting approach under traditional framework to characterize the ’faithfulness’ of each instance and each bag in learning the classifier. Briefly, on the instance level, we employ a sparse weighting method to select the key instances, and the MIL problem with one size label noise is converted to a mislabeled supervised learning scenario. On the bag level, the weights of bags, together with the selected key instances, will be utilized to identify the real positive bags. In addition, we have solved our proposed model by an alternative iteration method with proved convergence behavior. Empirical studies on various datasets have validated the effectiveness of our method.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"125 1","pages":""},"PeriodicalIF":4.0000,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Knowledge Discovery from Data","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3644076","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Multi-instance Learning (MIL) is a popular learning paradigm arising from many real applications. It assigns a label to a set of instances, called a bag, and the bag's label is determined by the instances within it. A bag is positive if and only if it contains at least one positive instance. Since labeling bags is more complicated than labeling individual instances, mislabeling is a common problem in MIL. Moreover, it is more common for a negative bag to be mislabeled as a positive one, since a single mislabeled instance changes the label of the whole bag. This is an important problem that originates in real applications, e.g., web mining and image classification, but to the best of our knowledge little research has concentrated on it. In this paper, we focus on the MIL problem with one-side label noise, in which negative bags are mislabeled as positive ones. To address this challenging problem, we propose a novel multi-instance learning method with One Side Label Noise (OSLN). We design a new double weighting approach under the traditional framework to characterize the 'faithfulness' of each instance and each bag in learning the classifier. Briefly, on the instance level, we employ a sparse weighting method to select the key instances, converting the MIL problem with one-side label noise into a mislabeled supervised learning scenario. On the bag level, the bag weights, together with the selected key instances, are used to identify the real positive bags. In addition, we solve the proposed model with an alternating iteration method whose convergence is proved. Empirical studies on various datasets validate the effectiveness of our method.
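To make the setting concrete, the following is a minimal, self-contained sketch and not the authors' OSLN algorithm or code: it encodes the bag-labeling rule, simulates one-side label noise by flipping one negative bag's label to positive, and runs a toy double-weighting loop in which sparse instance weights pick key instances and bag weights down-weight observed-positive bags whose key instance scores low. The synthetic data, the hard-sparsity and margin heuristics, and the weighted least-squares update are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def bag_label(instance_labels):
    """The MIL rule: a bag is positive iff it has at least one positive instance."""
    return int(np.any(np.asarray(instance_labels) == 1))


assert bag_label([0, 0, 1]) == 1 and bag_label([0, 0, 0]) == 0

# Toy bags: negative instances cluster near (0, 0), positive ones near (3, 3).
def make_bag(n_instances, n_positive):
    neg = rng.normal(0.0, 0.3, size=(n_instances - n_positive, 2))
    pos = rng.normal(3.0, 0.3, size=(n_positive, 2))
    return np.vstack([neg, pos])


bags = [make_bag(5, 1), make_bag(5, 2), make_bag(5, 0), make_bag(5, 0)]
true_labels = [1, 1, 0, 0]
observed_labels = [1, 1, 1, 0]  # one-side noise: bag 2 is negative but observed positive


def instance_weights(bag, w, keep=1):
    """Sparse instance weighting: keep only the top-scoring ("key") instances."""
    scores = bag @ w
    weights = np.zeros(len(bag))
    weights[np.argsort(scores)[-keep:]] = 1.0  # hard sparsity, for illustration only
    return weights


def bag_weight(bag, w, observed_label, margin=1.0):
    """Down-weight an observed-positive bag whose key instance scores below the
    margin: it is likely a negative bag flipped to positive by label noise."""
    return 0.1 if observed_label == 1 and np.max(bag @ w) < margin else 1.0


# Crude alternating scheme: with both weight levels fixed, refit a linear scorer
# on the key instances via weighted least squares; then refresh the weights.
w = np.array([1.0, 1.0]) / np.sqrt(2.0)
for _ in range(5):
    X, y, sw = [], [], []
    for bag, obs in zip(bags, observed_labels):
        wb = bag_weight(bag, w, obs)
        for x, wi in zip(bag, instance_weights(bag, w)):
            if wi > 0:
                X.append(x)
                y.append(1.0 if obs == 1 else -1.0)
                sw.append(wb)
    X, y, sw = np.array(X), np.array(y), np.sqrt(np.array(sw))  # sqrt of sample weights
    w, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    w = w / np.linalg.norm(w)  # keep the scorer at unit norm so the margin is meaningful

for i, (bag, obs) in enumerate(zip(bags, observed_labels)):
    print(f"bag {i}: observed={obs}, true={true_labels[i]}, "
          f"learned bag weight={bag_weight(bag, w, obs):.1f}")
```

Under these assumptions, the noise-flipped bag (bag 2) should end up with the small bag weight while the truly positive bags keep full weight, which is the intuition behind using bag-level weights, together with key instances, to identify the real positive bags.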
Journal Introduction:
TKDD welcomes papers on a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include, but are not limited to: scalable and effective algorithms for data mining and big data analysis, mining brain networks, mining data streams, mining multi-media data, mining high-dimensional data, mining text, Web, and semi-structured data, mining spatial and temporal data, data mining for community generation, social network analysis, and graph structured data, security and privacy issues in data mining, visual, interactive and online data mining, pre-processing and post-processing for data mining, robust and scalable statistical methods, data mining languages, foundations of data mining, KDD framework and process, and novel applications and infrastructures exploiting data mining technology including massively parallel processing and cloud computing platforms. TKDD encourages papers that explore the above subjects in the context of large distributed networks of computers, parallel or multiprocessing computers, or new data devices. TKDD also encourages papers that describe emerging data mining applications that cannot be satisfied by the current data mining technology.