Impact of Labeling Noise on Machine Learning: A Cost-aware Empirical Study
A. Gharawi, Jumana Alsubhi, Lakshmish Ramaswamy
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), December 2022
DOI: 10.1109/ICMLA55696.2022.00156
Citations: 0
Abstract
Since the emergence of large datasets, machine learning models have demonstrated excellent performance in a wide range of applications. This success has been made possible by the availability of large amounts of labeled data. High-quality labeled datasets, however, are difficult to obtain. Acquiring datasets with limited class label noise is therefore an important task, since noisy labels can affect both the performance and the structure of machine learning models. Yet it is extremely difficult to reduce label noise significantly in real-world datasets without relying on expensive expert annotators. Through extensive experiments, this work studies how varying degrees of label noise influence the complexity and accuracy of machine learning models. It also explores how to reduce labeling costs while maintaining a desired level of accuracy.
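The abstract does not state how label noise was injected or which models were evaluated. As an illustrative sketch only, the snippet below assumes uniform (symmetric) label flipping, a common noise model. It corrupts a chosen fraction of training labels, trains a 1-nearest-neighbour classifier on the noisy labels, and measures accuracy on clean test labels. The function names, the synthetic two-blob data and the choice of classifier are all hypothetical. The sketch covers only the accuracy side, not model complexity or labeling cost.

```python
import random

# Hypothetical helpers for illustration; not the paper's code.

def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip each label with probability noise_rate to a different class chosen uniformly."""
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Assumed noise model: pick uniformly among the other num_classes - 1 classes.
            others = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(others))
        else:
            noisy.append(y)
    return noisy


def make_blobs(n_per_class, seed):
    """Synthetic data: two 2-D Gaussian blobs centred at (-1, -1) and (1, 1)."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in enumerate([(-1.0, -1.0), (1.0, 1.0)]):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 0.7), rng.gauss(cy, 0.7)))
            labels.append(label)
    return points, labels


def one_nn_predict(train_x, train_y, x):
    """Return the label of the training point nearest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: (train_x[i][0] - x[0]) ** 2 + (train_x[i][1] - x[1]) ** 2,
    )
    return train_y[best]


def accuracy_under_noise(noise_rate, seed=0):
    """Train 1-NN on noisily labeled data and score it against clean test labels."""
    train_x, train_y = make_blobs(200, seed)
    test_x, test_y = make_blobs(200, seed + 1)
    noisy_y = inject_symmetric_noise(train_y, noise_rate, num_classes=2, seed=seed + 2)
    correct = sum(one_nn_predict(train_x, noisy_y, x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_x)


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.2, 0.3, 0.4):
        print(f"noise={rate:.1f}  test accuracy={accuracy_under_noise(rate):.3f}")
```

A 1-nearest-neighbour model is used here because it memorizes every training label, so its test accuracy tends to fall as the noise rate rises. More regularized models may be noticeably more robust, which is one reason a study like this compares several model types.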