TBC-MI: Suppressing noise labels by maximizing cleaning samples for robust image classification

Impact factor 7.4 · Zone 1 (Management) · JCR Q1 (Computer Science, Information Systems)
Yanhong Li, Zhiqing Guo, Liejun Wang, Lianghui Xu
Journal: Information Processing & Management
Publication date: 2024-06-12
Publication type: Journal Article
DOI: 10.1016/j.ipm.2024.103801
URL: https://www.sciencedirect.com/science/article/pii/S0306457324001602
Citation count: 0

Abstract


In classification tasks with noisy labels, eliminating the interference of noisy-label samples in the dataset is the key to improving network performance. However, the distributions of some noisy and clean samples overlap, which makes them very hard to distinguish. Clean-label samples within the overlapping region often carry highly representative feature information, which is extremely valuable for deep learning. We propose a new method, twin binary classification-mixed input (TBC-MI), to tackle this challenge. Specifically, TBC-MI uses a twin classification network to partition the samples and converts the original complex classification problem into a binary classification problem. It then filters clean-label samples out of the hard-label region using a simple multilayer binary classification network. TBC-MI uses noise from the dataset itself during the partitioning process to better reflect real-world scenarios. After maximizing the number of clean-label samples, TBC-MI adopts a hybrid online and offline input method to expand the subsequent input forms of the samples. The proposed method is evaluated on CIFAR-10 and CIFAR-100 with artificially synthesized noise, and on Clothing1M, ANIMAL-10N, CIFAR-10N, and CHAOYANG with real-world noise. Extensive experiments show that our method achieves the best test accuracy on most datasets, with a largest improvement of 2% over previous methods for learning with noisy labels.
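
The abstract mentions CIFAR-10 and CIFAR-100 with "artificially synthesized noise" but does not say which noise model is used. As a rough illustration only, the sketch below shows one common way such noise is synthesized in the noisy-label literature: symmetric (uniform) label flipping, where each label is replaced by a different class with a fixed probability. The function name `add_symmetric_noise`, its parameters, and the 40% rate are assumptions made for this example and are not taken from the paper.

```python
import random


def add_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` in which each label is, with probability
    `noise_rate`, replaced by a different class chosen uniformly at random.

    Hypothetical helper for illustration; not the authors' code.
    """
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Draw from the other num_classes - 1 classes, skipping y itself.
            wrong = rng.randrange(num_classes - 1)
            noisy.append(wrong if wrong < y else wrong + 1)
        else:
            noisy.append(y)
    return noisy


# Toy stand-in for CIFAR-10 labels: 10,000 samples over 10 balanced classes.
clean = [i % 10 for i in range(10_000)]
noisy = add_symmetric_noise(clean, num_classes=10, noise_rate=0.4)

flipped = sum(c != n for c, n in zip(clean, noisy))
print(f"observed noise rate: {flipped / len(clean):.3f}")
```

Some papers use a variant in which the replacement label may also land on the true class, so the effective noise rate is slightly lower than the nominal one. The version above always flips to a different class, so the observed rate stays close to `noise_rate`.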

Source journal
Information Processing & Management (Engineering & Technology: Computer Science, Information Systems)
CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days
About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.