Title: Informed Machine Learning: Excess risk and generalization
Authors: Luca Oneto, Sandro Ridella, Davide Anguita
DOI: 10.1016/j.neucom.2025.130521
Journal: Neurocomputing, Vol. 646, Article 130521 (Impact Factor 6.5, JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2025-05-29 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0925231225011932
Citations: 0
Abstract
Machine Learning (ML) has transformed both research and industry by offering powerful models capable of capturing complex phenomena. However, these models often require large, high-quality datasets and may struggle to generalize beyond the distributions on which they are trained. Informed Machine Learning (IML) tackles these challenges by incorporating domain knowledge at various stages of the ML pipeline, thereby reducing data requirements and enhancing generalization. Building on statistical learning theory, we present some theoretical comparison and insights about ML and IML excess risk and generalization performance. We then illustrate how these theoretical insights can be leveraged in practice through some practical examples. Our findings shed some light on the mechanisms and conditions under which IML can outperform traditional ML, offering valuable guidance for effective implementation in real-world settings.
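The abstract compares ML and IML in terms of excess risk. As a brief sketch using standard statistical learning theory notation (these definitions are conventional and not taken verbatim from the paper): given a data distribution \(\mathcal{D}\), a loss \(\ell\), a hypothesis space \(\mathcal{H}\), and a model \(\hat{h}_n\) learned from \(n\) samples, the excess risk measures how far the learned model is from the best model in the class.

```latex
% Standard definitions (sketch, not the paper's exact notation):
% true risk, empirical risk, and excess risk of a learned model.
\begin{align}
  R(h) &= \mathbb{E}_{(X,Y)\sim\mathcal{D}}\bigl[\ell(h(X),Y)\bigr]
    && \text{(true risk)} \\
  \hat{R}_n(h) &= \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(h(X_i),Y_i\bigr)
    && \text{(empirical risk)} \\
  \mathcal{E}(\hat{h}_n) &= R(\hat{h}_n) - \inf_{h\in\mathcal{H}} R(h)
    && \text{(excess risk)}
\end{align}
```

Intuitively, incorporating domain knowledge, as in IML, amounts to restricting or reshaping \(\mathcal{H}\); a smaller effective hypothesis space can tighten generalization bounds and reduce excess risk, provided the knowledge does not exclude good hypotheses.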
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.