Deconstructing deep imbalanced regression: a comprehensive review and experimental evaluation

Noah C. Puetz, Jens U. Brandt, Marc Hilbert, Elena Raponi, Thomas Bäck, Thomas Bartz-Beielstein

Artificial Intelligence Review, vol. 59, no. 6, published 2026-04-22 (Epub 2026-04-29). DOI: 10.1007/s10462-026-11570-1
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10462-026-11570-1.pdf
Article page: https://link.springer.com/article/10.1007/s10462-026-11570-1
Citations: 0
Abstract
In real-world applications, there is a fundamental problem: the data most critical for predicting interesting events, anomalies, and high-stakes outliers is the rarest, while less interesting data is abundant. Although deep learning is deployed specifically for these difficult prediction tasks, data-driven models inevitably fail in underrepresented regions of the target space. This discrepancy between the empirical data distribution and the desired evaluation distribution is equivalent to a target distribution shift. The research field termed Deep Imbalanced Regression (DIR) has emerged explicitly to address this challenge, which is particularly acute for continuous targets, where most conventional classification-based methods are ill-suited. In this paper, we present the first comprehensive review of the DIR landscape, organized around a novel two-axis taxonomy that disentangles challenges along a Data Axis (target distribution shift, continuity, and density) and a Deep-Learning Axis (shared capacity, biased updates, and manifold distortion), where the latter captures a cascading failure mechanism through which deep models systematically neglect underrepresented targets. Within this framework, we systematically categorize and analyze 19 state-of-the-art methods spanning architectural, algorithm-level, and representation learning approaches, and empirically re-evaluate the twelve with publicly available implementations under controlled, identical conditions. To stress-test generalization across the full target range, we introduce three novel targeted evaluation protocols, Balanced Extrapolation, Bimodal Interpolation, and Blind-Spot Isolation, which expose failure modes hidden by standard benchmarks (https://github.com/noah-puetz/deconstructing_deep_imbalanced_regression).
Our study underscores the significant impact of imbalance on regression accuracy, offering a conceptual framework and practical benchmarks to catalyze further development of systems capable of capturing the rare as reliably as the common.
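To make the Blind-Spot Isolation idea concrete, here is a minimal illustrative sketch (not the authors' implementation, which lives in the linked repository): an imbalanced continuous target is generated, an entire target interval is withheld from training, and evaluation is restricted to that interval, so a model must generalize into a region with zero training density. All names and the interval bounds are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced regression target: most samples cluster near the mode,
# rare samples lie in the tails (the "interesting" events).
y = rng.normal(loc=0.0, scale=1.0, size=10_000)
X = y[:, None] + rng.normal(scale=0.1, size=(10_000, 1))  # toy feature

# Blind-Spot-Isolation-style split (hypothetical reconstruction):
# remove an entire target interval from training and evaluate only
# inside it.
lo, hi = 1.5, 2.5                       # the withheld "blind spot"
in_spot = (y >= lo) & (y <= hi)
X_train, y_train = X[~in_spot], y[~in_spot]
X_test,  y_test  = X[in_spot],  y[in_spot]

# The training set now has zero density inside [lo, hi].
assert not ((y_train >= lo) & (y_train <= hi)).any()
print(f"train: {len(y_train)}  blind-spot test: {len(y_test)}")
```

The analogous protocols vary only in which target regions are withheld: Balanced Extrapolation would hold out both tails, and Bimodal Interpolation the region between two modes.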
Journal description:
Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.