TBC-MI : Suppressing noise labels by maximizing cleaning samples for robust image classification
Yanhong Li, Zhiqing Guo, Liejun Wang, Lianghui Xu
Information Processing & Management, published 2024-06-12 (JCR Q1, Computer Science, Information Systems; impact factor 7.4)
DOI: 10.1016/j.ipm.2024.103801
https://www.sciencedirect.com/science/article/pii/S0306457324001602
In classification tasks with noisy labels, eliminating the interference of noisy-label samples in the dataset is key to improving network performance. However, the distributions of some noisy and clean samples overlap, which makes them very difficult to tell apart. Clean-label samples within this overlapping region often carry highly representative feature information, which is extremely valuable for deep learning. We propose a new method, twin binary classification-mixed input (TBC-MI), to tackle this challenge. Specifically, TBC-MI uses a twin classification network to partition the samples, converting the original, complex classification problem into a binary classification problem. It then filters clean-label samples out of the hard-label region with a simple multilayer binary classification network. To better reflect real-world scenarios, TBC-MI uses the noise already present in the dataset during the partitioning process. After maximizing the number of clean-label samples, TBC-MI adopts a hybrid online and offline input method to expand the input forms available to the samples in subsequent training. The proposed method is evaluated on CIFAR-10 and CIFAR-100 with synthetic noise and on Clothing1M, ANIMAL-10N, CIFAR-10N, and CHAOYANG with real-world noise. Extensive experiments show that our method achieves the best test accuracy on most datasets, improving on previous noisy-label learning methods by up to 2%.
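The abstract does not specify the partition rule, the binary classifier's input features, or its architecture, so the following is only a hypothetical sketch of the general idea, not the authors' implementation. It assumes that each of two "twin" networks has already produced a per-sample training loss: samples that both networks fit easily are treated as clean, samples that both struggle with are treated as noisy, and samples on which they disagree form the hard (overlapping) region. A binary classifier trained on the confident clean and noisy samples then decides which hard samples to rescue as clean. The function names `select_clean` and `train_logistic`, the `threshold` parameter, and the use of the two losses as features are all illustrative assumptions, and plain logistic regression stands in for the paper's multilayer binary classification network.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w: Vector, b: float, x: Vector) -> float:
    """Linear score w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic(xs: Sequence[Vector], ys: Sequence[int],
                   lr: float = 0.5, epochs: int = 500) -> Tuple[Vector, float]:
    """Fit logistic regression (a stand-in binary classifier) by full-batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(logit(w, b, x)) - y  # derivative of log-loss w.r.t. the score
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def select_clean(losses_a: Sequence[float], losses_b: Sequence[float],
                 threshold: float) -> Tuple[List[int], List[int]]:
    """Hypothetical twin-network split: agreement settles easy cases, a classifier settles the rest."""
    clean, noisy, hard = [], [], []
    for i, (la, lb) in enumerate(zip(losses_a, losses_b)):
        if la < threshold and lb < threshold:
            clean.append(i)   # both networks fit this label easily
        elif la >= threshold and lb >= threshold:
            noisy.append(i)   # both networks struggle with this label
        else:
            hard.append(i)    # the networks disagree: overlapping region
    if hard and clean and noisy:
        # Train on confident samples only: label 1 = clean, 0 = noisy.
        xs = [[losses_a[i], losses_b[i]] for i in clean + noisy]
        ys = [1] * len(clean) + [0] * len(noisy)
        w, b = train_logistic(xs, ys)
        for i in hard:
            if sigmoid(logit(w, b, [losses_a[i], losses_b[i]])) > 0.5:
                clean.append(i)  # rescued clean sample from the hard region
            else:
                noisy.append(i)
    else:
        noisy.extend(hard)  # no classifier can be trained: keep hard samples out
    return sorted(clean), sorted(noisy)


clean, noisy = select_clean([0.10, 0.20, 2.00, 2.50, 0.05, 3.50],
                            [0.15, 0.10, 2.20, 1.90, 1.05, 0.95], threshold=1.0)
print(clean, noisy)
```

In this toy input, samples 4 and 5 fall in the disagreement region; the classifier is meant to rescue the one whose losses resemble the clean group rather than discarding both, which mirrors the abstract's goal of maximizing the number of clean-label samples kept for training.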
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.