Boosting semi-supervised learning under imbalanced regression via pseudo-labeling

IF 1.5 | CAS Zone 4 (Computer Science) | JCR Q3, Computer Science, Software Engineering
Nannan Zong, Songzhi Su, Changle Zhou
{"title":"Boosting semi-supervised learning under imbalanced regression via pseudo-labeling","authors":"Nannan Zong,&nbsp;Songzhi Su,&nbsp;Changle Zhou","doi":"10.1002/cpe.8103","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Imbalanced samples are widespread, which impairs the generalization and fairness of models. Semi-supervised learning can overcome the deficiency of rare labeled samples, but it is challenging to select high-quality pseudo-label data. Unlike discrete labels that can be matched one-to-one with points on a numerical axis, labels in regression tasks are consecutive and cannot be directly chosen. Besides, the distribution of unlabeled data is imbalanced, which easily leads to an imbalanced distribution of pseudo-label data, exacerbating the imbalance in the semi-supervised dataset. To solve this problem, this article proposes a semi-supervised imbalanced regression network (SIRN), which consists of two components: A, designed to learn the relationship between features and labels (targets), and B, dedicated to learning the relationship between features and target deviations. To measure target deviations under imbalanced distribution, the target deviation function is introduced. To select continuous pseudo-labels, the deviation matching strategy is designed. Furthermore, an adaptive selection function is developed to mitigate the risk of skewed distributions due to imbalanced pseudo-label data. Finally, the effectiveness of the proposed method is validated through evaluations of two regression tasks. The results show a great reduction in predicted value error, particularly in few-shot regions. This empirical evidence confirms the efficacy of our method in addressing the issue of imbalanced samples in regression tasks.</p>\n </div>","PeriodicalId":55214,"journal":{"name":"Concurrency and Computation-Practice & Experience","volume":"36 19","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2024-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Concurrency and Computation-Practice & Experience","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpe.8103","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

Imbalanced samples are widespread, and they impair the generalization and fairness of models. Semi-supervised learning can compensate for the scarcity of labeled samples, but selecting high-quality pseudo-label data is challenging. Unlike discrete labels, which can be matched one-to-one to points on a numerical axis, labels in regression tasks are continuous and cannot be selected directly. Moreover, the distribution of unlabeled data is itself imbalanced, which easily leads to an imbalanced distribution of pseudo-label data and exacerbates the imbalance of the semi-supervised dataset. To solve this problem, this article proposes a semi-supervised imbalanced regression network (SIRN), which consists of two components: A, designed to learn the relationship between features and labels (targets), and B, dedicated to learning the relationship between features and target deviations. To measure target deviations under an imbalanced distribution, a target deviation function is introduced. To select continuous pseudo-labels, a deviation matching strategy is designed. Furthermore, an adaptive selection function is developed to mitigate the risk of skewed distributions caused by imbalanced pseudo-label data. Finally, the effectiveness of the proposed method is validated on two regression tasks. The results show a substantial reduction in prediction error, particularly in few-shot regions. This empirical evidence confirms the efficacy of the method in addressing imbalanced samples in regression tasks.
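The abstract only outlines the architecture, so the following is a minimal illustrative sketch, not the authors' implementation: a shared encoder with a target head (component A, features to labels) and a deviation head (component B, features to target deviations), where unlabeled samples are accepted as pseudo-labels only when the predicted deviation is small. All names and thresholds here (`TwoHeadRegressor`, `select_pseudo_labels`, `tolerance`) are assumptions standing in for the paper's target deviation function and deviation matching strategy.

```python
# Illustrative sketch only, in the spirit of SIRN; not the authors' code.
import torch
import torch.nn as nn


class TwoHeadRegressor(nn.Module):
    """Shared encoder with a target head (features -> label) and a
    deviation head (features -> target deviation)."""

    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.target_head = nn.Linear(hidden, 1)     # component A (assumed form)
        self.deviation_head = nn.Linear(hidden, 1)  # component B (assumed form)

    def forward(self, x):
        z = self.encoder(x)
        return self.target_head(z).squeeze(-1), self.deviation_head(z).squeeze(-1)


def select_pseudo_labels(model, x_unlabeled, tolerance=0.1):
    """Keep unlabeled samples whose predicted deviation is small;
    a hypothetical stand-in for the paper's deviation matching strategy."""
    model.eval()
    with torch.no_grad():
        y_hat, dev_hat = model(x_unlabeled)
    mask = dev_hat.abs() < tolerance  # assumed acceptance rule
    return x_unlabeled[mask], y_hat[mask]


if __name__ == "__main__":
    model = TwoHeadRegressor(in_dim=8)
    x_u = torch.randn(32, 8)  # synthetic unlabeled batch
    x_sel, y_pseudo = select_pseudo_labels(model, x_u)
    print(f"accepted {x_sel.shape[0]} of {x_u.shape[0]} unlabeled samples")
```

Under these assumptions, the accepted pairs (x_sel, y_pseudo) would be added to the labeled pool for further training; the paper's adaptive selection function would additionally rebalance which pseudo-labels are kept across label regions, which this sketch does not attempt.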

Source journal

Concurrency and Computation: Practice and Experience (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 5.00
Self-citation rate: 10.00%
Articles published: 664
Review turnaround: 9.6 months
Journal description: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.