Deconstructing deep imbalanced regression: a comprehensive review and experimental evaluation

Noah C. Puetz, Jens U. Brandt, Marc Hilbert, Elena Raponi, Thomas Bäck, Thomas Bartz-Beielstein

Artificial Intelligence Review, vol. 59, no. 6, published 2026-04-22 (Epub 2026-04-29). DOI: 10.1007/s10462-026-11570-1
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10462-026-11570-1.pdf
Article page: https://link.springer.com/article/10.1007/s10462-026-11570-1
Citations: 0
Abstract
In real-world applications, there is a fundamental problem: the data most critical for predicting interesting events, anomalies, and high-stakes outliers is the rarest, while less interesting data is abundant. Although deep learning is deployed specifically for these difficult prediction tasks, data-driven models inevitably fail in underrepresented regions of the target space. This discrepancy between the empirical data distribution and the desired evaluation distribution is equivalent to a target distribution shift. The research field termed Deep Imbalanced Regression (DIR) has emerged explicitly to address this challenge, which is particularly acute for continuous targets, where most conventional classification-based methods are ill-suited. In this paper, we present the first comprehensive review of the DIR landscape, organized around a novel two-axis taxonomy that disentangles challenges along a Data Axis (target distribution shift, continuity, and density) and a Deep-Learning Axis (shared capacity, biased updates, and manifold distortion), where the latter captures a cascading failure mechanism through which deep models systematically neglect underrepresented targets. Within this framework, we systematically categorize and analyze 19 state-of-the-art methods spanning architectural, algorithm-level, and representation learning approaches, and empirically re-evaluate the twelve with publicly available implementations under controlled, identical conditions. To stress-test generalization across the full target range, we introduce three novel targeted evaluation protocols, Balanced Extrapolation, Bimodal Interpolation, and Blind-Spot Isolation, which expose failure modes hidden by standard benchmarks (https://github.com/noah-puetz/deconstructing_deep_imbalanced_regression).
Our study underscores the significant impact of imbalance on regression accuracy, offering a conceptual framework and practical benchmarks to catalyze further development of systems capable of capturing the rare as reliably as the common.
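To make the Blind-Spot Isolation idea concrete, here is a minimal illustrative sketch (not the authors' implementation, which lives in the linked repository): an imbalanced continuous target is generated, an entire target interval is withheld from training, and evaluation is restricted to that interval, so a model must generalize into a region with zero training density. All names and the interval bounds are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced regression target: most samples cluster near the mode,
# rare samples lie in the tails (the "interesting" events).
y = rng.normal(loc=0.0, scale=1.0, size=10_000)
X = y[:, None] + rng.normal(scale=0.1, size=(10_000, 1))  # toy feature

# Blind-Spot-Isolation-style split (hypothetical reconstruction):
# remove an entire target interval from training and evaluate only
# inside it.
lo, hi = 1.5, 2.5                       # the withheld "blind spot"
in_spot = (y >= lo) & (y <= hi)
X_train, y_train = X[~in_spot], y[~in_spot]
X_test,  y_test  = X[in_spot],  y[in_spot]

# The training set now has zero density inside [lo, hi].
assert not ((y_train >= lo) & (y_train <= hi)).any()
print(f"train: {len(y_train)}  blind-spot test: {len(y_test)}")
```

The analogous protocols vary only in which target regions are withheld: Balanced Extrapolation would hold out both tails, and Bimodal Interpolation the region between two modes.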
Journal description:
Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.