A review on label cleaning techniques for learning with noisy labels
Jongmin Shin, Jonghyeon Won, Hyun-Suk Lee, Jang-Won Lee
ICT Express, vol. 10, no. 6, pp. 1315-1330, published 2024-12-01
DOI: 10.1016/j.icte.2024.09.007
URL: https://www.sciencedirect.com/science/article/pii/S2405959524001103
Impact factor: 4.1 (JCR Q1, Computer Science, Information Systems)
Abstract
Classification models categorize objects into given classes, guided by training samples that consist of input features and labels. In practice, however, labels can be corrupted by human error, a problem known as label noise, which degrades classification accuracy. To address this issue, recent works have proposed algorithms for cleaning datasets that contain label noise. We categorize these algorithms at a fine granularity and, based on that categorization, review sample selection, label correction, and select-and-correct algorithms. In addition, we outline future research directions for dataset cleaning that take into account practical challenges such as class imbalance, class-incremental learning, and corrupted input features.
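To make the three families named in the abstract concrete, the following is a minimal sketch of a select-and-correct step, not an algorithm taken from the review. It assumes that some already-trained classifier supplies per-class probabilities for each sample; the function name `select_and_correct` and the thresholds `keep_threshold` and `relabel_threshold` are hypothetical choices made only for illustration.

```python
from typing import List, Sequence, Tuple


def select_and_correct(
    probs: Sequence[Sequence[float]],
    noisy_labels: Sequence[int],
    keep_threshold: float = 0.5,     # assumed value, for illustration only
    relabel_threshold: float = 0.9,  # assumed value, for illustration only
) -> List[Tuple[int, int]]:
    """Return (sample_index, label) pairs for the cleaned dataset.

    Selection: keep a sample when the model gives its current label a
    probability of at least keep_threshold.
    Correction: otherwise, relabel it to the model's top class if that
    class has probability of at least relabel_threshold.
    Samples that pass neither check are dropped.
    """
    cleaned = []
    for index, (p, label) in enumerate(zip(probs, noisy_labels)):
        if p[label] >= keep_threshold:
            cleaned.append((index, label))      # treated as clean
            continue
        top_class = max(range(len(p)), key=lambda c: p[c])
        if p[top_class] >= relabel_threshold:
            cleaned.append((index, top_class))  # treated as mislabeled, corrected
    return cleaned


# Toy predictions from a hypothetical model over three classes.
probs = [
    [0.92, 0.05, 0.03],  # agrees with label 0
    [0.02, 0.95, 0.03],  # strongly disagrees with label 0
    [0.40, 0.35, 0.25],  # uncertain, disagrees with label 2
    [0.10, 0.20, 0.70],  # agrees with label 2
]
noisy_labels = [0, 0, 2, 2]

result = select_and_correct(probs, noisy_labels)
print(result)  # [(0, 0), (1, 1), (3, 2)]
```

Setting `relabel_threshold` above 1.0 reduces the sketch to pure sample selection, while setting `keep_threshold` to 0.0 keeps every sample with its given label; both observations follow from the code above rather than from the review itself.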
About the journal:
ICT Express, published by the Korean Institute of Communications and Information Sciences (KICS), is an international, peer-reviewed research journal covering all aspects of information and communication technology (ICT). The journal aims to publish research that advances the theoretical and practical understanding of ICT convergence, platform technologies, communication networks, and device technologies. Advances in the ICT sector enable portable devices to stay connected at all times while supporting high data rates, which has driven the recent popularity of smartphones and their considerable impact on economic and social development.