Wentai Wu, Ligang He, Weiwei Lin, Jinyi Long, Zhiquan Liu, C.L. Philip Chen
Journal: Information and Software Technology, Volume 188, Article 107897
Published: 2025-09-22
DOI: 10.1016/j.infsof.2025.107897
URL: https://www.sciencedirect.com/science/article/pii/S0950584925002368
Impact factor: 4.3 (JCR Q2, Computer Science, Information Systems)
Towards imbalanced regression over distributionally biased data: A fast static approach
The generalization of models is susceptible to data bias in both classification and regression problems, which is also intrinsically connected to issues of fairness. However, existing approaches focus on class-imbalanced learning and fail to address these concerns for regressors. In this paper, we target imbalanced regression, with a particular focus on fusing distributional information from both the feature space and the target space. We first introduce two metrics, uniqueness and abnormality, to reflect the local data distribution and assess the informativeness of each sample from a regional perspective in the two spaces. By integrating these two metrics, we propose a local Variation-incented re-weighting method, termed ViLoss, which fuses distributional information for each sample to optimize gradient-based regressor training. The weights are computed once and for all in pre-processing, so our method adds little extra computation during training. Empirically, we conducted comprehensive experiments on both synthetic and real-world datasets for parameter study and performance evaluation. The results demonstrate the efficacy of our method in boosting model quality (error reduction of up to 39.0%) as well as in narrowing the error gap between groups.
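The abstract does not define uniqueness or abnormality, so the following is only a rough illustration of the general pattern it describes: per-sample weights computed once in pre-processing, then reused unchanged in gradient-based regressor training. As a stand-in, the sketch scores each sample by a simple k-nearest-neighbour sparsity in the feature space and in the target space. The function names, the default `k`, and the choice to multiply the two scores are all assumptions made for illustration; they are not the ViLoss definitions.

```python
import numpy as np

def knn_sparsity(points, k=3):
    """Mean distance from each row of `points` to its k nearest other rows."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

def static_sample_weights(X, y, k=3, eps=1e-12):
    """Hypothetical per-sample weights, computed once before training.

    Stand-in scores only: NOT the paper's uniqueness/abnormality metrics.
    X has shape (n, d); y has shape (n,).
    """
    s_x = knn_sparsity(X, k) + eps                 # sparsity in feature space
    s_y = knn_sparsity(y.reshape(-1, 1), k) + eps  # sparsity in target space
    w = (s_x / s_x.mean()) * (s_y / s_y.mean())    # assumed fusion: product
    return w / w.mean()                            # rescale to mean weight 1

def weighted_mse(theta, Xb, y, w):
    """Weighted mean squared error of a linear model on bias-augmented Xb."""
    resid = Xb @ theta - y
    return float(np.mean(w * resid ** 2))

def fit_weighted_linear(X, y, w, lr=1e-3, steps=200):
    """Gradient descent on the weighted MSE of a linear model with bias."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        resid = Xb @ theta - y
        # Gradient of mean(w * resid**2) with respect to theta.
        theta -= lr * (2.0 / len(y)) * (Xb.T @ (w * resid))
    return theta
```

Because the weights depend only on the training data, `static_sample_weights` runs once and the training loop pays only for an element-wise multiplication per step, which matches the "little extra computation" property the abstract claims for a static scheme. The pairwise distance matrix here costs O(n²) memory, so a real pre-processing step on large data would likely need a spatial index instead.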
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.