NLSC: A noise-robust label shift correction framework via three-head training and class-adaptive cleaning
Xiaowen Wu, Ruidong Fan, Tingjin Luo, Chenping Hou
Information Sciences, Vol. 725, Article 122706 (published 2025-09-27)
DOI: 10.1016/j.ins.2025.122706
https://www.sciencedirect.com/science/article/pii/S0020025525008394
Citations: 0
Abstract
Label shift occurs when the class-conditional distributions p(x|y) are the same in the source and target domains but the marginal label distributions p(y) differ. For instance, during the early stage of the COVID-19 outbreak, the proportion of pneumonia cases relative to common cold cases in hospitals may have been relatively low. This ratio could shift dramatically in later stages of the pandemic, with pneumonia cases becoming predominant, even though the symptomatic presentation of each disease remained the same. Existing label shift methods typically adapt a classifier's output to match the target domain's label distribution and assume that the source domain has clean labels. In real-world scenarios, however, source labels are often noisy. For example, during COVID-19's early phase, mild and easily confused symptoms frequently led to COVID-19 being misdiagnosed as the common cold, introducing label noise. Such noise undermines the effectiveness of traditional methods and calls for new approaches. To address this, we analyze classifier error bounds under label shift correction with noisy source data. Based on this analysis, we propose a Noise-robust Label Shift Correction (NLSC) framework. NLSC employs a Three-Head Architecture Training (THAT) strategy for robust feature learning and a Class-Adaptive Threshold Cleaning (CATC) strategy for purifying the source data. Extensive experiments demonstrate that our method outperforms existing state-of-the-art techniques, particularly in real-world scenarios with high source-domain noise rates.
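The abstract does not spell out how THAT or CATC work, so the following is only a generic sketch of the two ingredients the abstract refers to, not the authors' method. Because p(x|y) is shared under label shift, a source-trained classifier can be corrected by re-weighting its posteriors: p_t(y|x) is proportional to p_s(y|x) · p_t(y) / p_s(y). The unknown target prior p_t(y) is estimated here from unlabeled target predictions with the well-known EM prior-adjustment procedure of Saerens et al. (2002), swapped in as a standard clean-label baseline. The cleaning function is a hypothetical stand-in for a class-adaptive threshold: it keeps a source sample only if the model's confidence in its given label reaches the mean confidence for that class. The function names, the mean-confidence threshold, and the assumption of strictly positive source priors are illustrative choices.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def adjust_posteriors(probs: Sequence[Sequence[float]],
                      source_prior: Sequence[float],
                      target_prior: Sequence[float]) -> Matrix:
    """Re-weight source posteriors p_s(y|x) to a new label prior.

    Under label shift p(x|y) is shared, so
    p_t(y|x) is proportional to p_s(y|x) * p_t(y) / p_s(y).
    Assumes every entry of source_prior is strictly positive.
    """
    adjusted = []
    for row in probs:
        weighted = [p * t / s for p, s, t in zip(row, source_prior, target_prior)]
        total = sum(weighted)
        adjusted.append([w / total for w in weighted])  # renormalize each row
    return adjusted


def em_target_prior(probs: Sequence[Sequence[float]],
                    source_prior: Sequence[float],
                    max_iter: int = 1000,
                    tol: float = 1e-10) -> List[float]:
    """Estimate p_t(y) from unlabeled target posteriors (EM, Saerens et al. 2002)."""
    prior = list(source_prior)
    n = len(probs)
    for _ in range(max_iter):
        # E-step: target posteriors under the current prior estimate.
        adjusted = adjust_posteriors(probs, source_prior, prior)
        # M-step: the new prior is the average adjusted posterior.
        new_prior = [sum(row[c] for row in adjusted) / n for c in range(len(prior))]
        converged = max(abs(a - b) for a, b in zip(new_prior, prior)) < tol
        prior = new_prior
        if converged:
            break
    return prior


def class_adaptive_clean(probs: Sequence[Sequence[float]],
                         labels: Sequence[int],
                         num_classes: int) -> List[int]:
    """Hypothetical class-adaptive cleaning rule (not the paper's CATC).

    The threshold for class c is the mean confidence the model assigns to
    label c over source samples labeled c. A sample is kept if its
    confidence in its given label reaches that class threshold.
    Returns the indices of kept samples.
    """
    totals = [0.0] * num_classes
    counts = [0] * num_classes
    for row, y in zip(probs, labels):
        totals[y] += row[y]
        counts[y] += 1
    thresholds = [totals[c] / counts[c] if counts[c] else 0.0
                  for c in range(num_classes)]
    return [i for i, (row, y) in enumerate(zip(probs, labels))
            if row[y] >= thresholds[y]]


if __name__ == "__main__":
    source_prior = [0.5, 0.5]
    target_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
    print([round(p, 3) for p in em_target_prior(target_probs, source_prior)])
    noisy_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
    print(class_adaptive_clean(noisy_probs, [0, 0, 1, 1], 2))  # prints [0, 2]
```

In this sketch, cleaning would be applied to the noisy source set before training the classifier whose posteriors feed the correction step; how NLSC actually combines its components is described in the paper itself.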
About the journal:
Information Sciences: Informatics and Computer Science, Intelligent Systems, Applications is an international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorials and surveys.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.