{"title":"Boosting semi-supervised learning under imbalanced regression via pseudo-labeling","authors":"Nannan Zong, Songzhi Su, Changle Zhou","doi":"10.1002/cpe.8103","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Imbalanced samples are widespread, which impairs the generalization and fairness of models. Semi-supervised learning can overcome the deficiency of rare labeled samples, but it is challenging to select high-quality pseudo-label data. Unlike discrete labels that can be matched one-to-one with points on a numerical axis, labels in regression tasks are consecutive and cannot be directly chosen. Besides, the distribution of unlabeled data is imbalanced, which easily leads to an imbalanced distribution of pseudo-label data, exacerbating the imbalance in the semi-supervised dataset. To solve this problem, this article proposes a semi-supervised imbalanced regression network (SIRN), which consists of two components: A, designed to learn the relationship between features and labels (targets), and B, dedicated to learning the relationship between features and target deviations. To measure target deviations under imbalanced distribution, the target deviation function is introduced. To select continuous pseudo-labels, the deviation matching strategy is designed. Furthermore, an adaptive selection function is developed to mitigate the risk of skewed distributions due to imbalanced pseudo-label data. Finally, the effectiveness of the proposed method is validated through evaluations of two regression tasks. The results show a great reduction in predicted value error, particularly in few-shot regions. This empirical evidence confirms the efficacy of our method in addressing the issue of imbalanced samples in regression tasks.</p>\n </div>","PeriodicalId":55214,"journal":{"name":"Concurrency and Computation-Practice & Experience","volume":"36 19","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2024-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Concurrency and Computation-Practice & Experience","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpe.8103","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0
Abstract
Imbalanced samples are widespread, which impairs the generalization and fairness of models. Semi-supervised learning can overcome the deficiency of rare labeled samples, but it is challenging to select high-quality pseudo-label data. Unlike discrete labels that can be matched one-to-one with points on a numerical axis, labels in regression tasks are consecutive and cannot be directly chosen. Besides, the distribution of unlabeled data is imbalanced, which easily leads to an imbalanced distribution of pseudo-label data, exacerbating the imbalance in the semi-supervised dataset. To solve this problem, this article proposes a semi-supervised imbalanced regression network (SIRN), which consists of two components: A, designed to learn the relationship between features and labels (targets), and B, dedicated to learning the relationship between features and target deviations. To measure target deviations under imbalanced distribution, the target deviation function is introduced. To select continuous pseudo-labels, the deviation matching strategy is designed. Furthermore, an adaptive selection function is developed to mitigate the risk of skewed distributions due to imbalanced pseudo-label data. Finally, the effectiveness of the proposed method is validated through evaluations of two regression tasks. The results show a great reduction in predicted value error, particularly in few-shot regions. This empirical evidence confirms the efficacy of our method in addressing the issue of imbalanced samples in regression tasks.
期刊介绍:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.