Towards imbalanced regression over distributionally biased data: A fast static approach

IF 4.3 · CAS Tier 2 (Computer Science) · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Wentai Wu , Ligang He , Weiwei Lin , Jinyi Long , Zhiquan Liu , C.L. Philip Chen
DOI: 10.1016/j.infsof.2025.107897
Information and Software Technology, Volume 188, Article 107897
Published: 2025-09-22 (Journal Article)
Citations: 0

Abstract

The generalization of models is susceptible to data bias in both classification and regression problems, which is also intrinsically connected to issues of fairness. However, existing approaches focus on class-imbalanced learning and fail to address these concerns for regressors. In this paper, we target imbalanced regression, with a particular focus on fusing distributional information from both the feature space and the target space. We first introduce two metrics, uniqueness and abnormality, to reflect the local data distribution and assess the informativeness of each sample from a regional perspective in the two spaces. By integrating these two metrics, we propose a local variation-incented re-weighting method, termed ViLoss, which fuses distributional information for each sample to optimize gradient-based regressor training. The weights are computed once and for all in pre-processing, so our method adds little extra computation during training. Empirically, we conducted comprehensive experiments on both synthetic and real-world datasets for parameter study and performance evaluation. The results demonstrate the efficacy of our method in improving model quality (error reduction of up to 39.0%) and narrowing the error gap between groups.
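The abstract does not give ViLoss's formulas, but the general scheme it describes (per-sample weights computed once in pre-processing, then reused in a weighted loss during training) can be sketched as follows. This is a minimal illustrative example, not the authors' method: a simple inverse kernel-density estimate over the target space stands in for the paper's uniqueness/abnormality metrics, and the function names and bandwidth parameter are hypothetical.

```python
import numpy as np

def precompute_weights(y, bandwidth=0.5, eps=1e-6):
    """Estimate target-space density with a Gaussian kernel and weight
    each sample by inverse density, so samples with rare target values
    get larger weights. Computed once, before training starts."""
    y = np.asarray(y, dtype=float)
    diffs = y[:, None] - y[None, :]                       # pairwise target gaps
    density = np.exp(-0.5 * (diffs / bandwidth) ** 2).mean(axis=1)
    w = 1.0 / (density + eps)
    return w * len(w) / w.sum()                           # normalize to mean 1

def weighted_mse(y_true, y_pred, w):
    """Statically re-weighted squared-error loss; w never changes
    during training, so the overhead per step is one multiply."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(w * (y_true - y_pred) ** 2))
```

In this sketch, a sample with target 5.0 among targets clustered near 0 receives a weight several times larger than its neighbors, so a gradient-based regressor is pushed to fit the sparse region rather than ignoring it.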
Source journal
Information and Software Technology (Engineering/Technology · Computer Science: Software Engineering)
CiteScore: 9.10
Self-citation rate: 7.70%
Articles per year: 164
Review time: 9.6 weeks
Journal description: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include: • Software management, quality and metrics • Software processes • Software architecture, modelling, specification, design and programming • Functional and non-functional software requirements • Software testing and verification & validation • Empirical studies of all aspects of engineering and managing software development. Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for Authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within its scope. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.