Informed Machine Learning: Excess risk and generalization

Impact Factor: 6.5 · CAS Tier 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Luca Oneto, Sandro Ridella, Davide Anguita
Journal: Neurocomputing, Volume 646, Article 130521
DOI: 10.1016/j.neucom.2025.130521
Publication date: 2025-05-29
URL: https://www.sciencedirect.com/science/article/pii/S0925231225011932
Citations: 0

Abstract

Machine Learning (ML) has transformed both research and industry by offering powerful models capable of capturing complex phenomena. However, these models often require large, high-quality datasets and may struggle to generalize beyond the distributions on which they are trained. Informed Machine Learning (IML) tackles these challenges by incorporating domain knowledge at various stages of the ML pipeline, thereby reducing data requirements and enhancing generalization. Building on statistical learning theory, we present a theoretical comparison of, and insights into, the excess risk and generalization performance of ML and IML. We then illustrate through practical examples how these theoretical insights can be leveraged in practice. Our findings shed light on the mechanisms and conditions under which IML can outperform traditional ML, offering valuable guidance for effective implementation in real-world settings.
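As a minimal illustration (not taken from the paper), one common way domain knowledge enters the ML pipeline is as a penalty term added to the empirical risk. The sketch below, with data, prior value, and penalty weight chosen purely for illustration, compares plain least squares against a knowledge-regularized fit in which the slope is assumed known a priori to be near 2.0:

```python
import numpy as np

# Hypothetical setup: small noisy sample from y = 2.0*x + 0.5 + noise.
rng = np.random.default_rng(0)
n = 5                                   # deliberately small sample size
x = rng.uniform(-1.0, 1.0, n)
y = 2.0 * x + 0.5 + rng.normal(0.0, 0.3, n)

X = np.column_stack([x, np.ones(n)])    # design matrix: [slope, intercept]

# Plain ("uninformed") least squares: argmin ||Xw - y||^2
w_ml = np.linalg.lstsq(X, y, rcond=None)[0]

# "Informed" least squares: argmin ||Xw - y||^2 + lam * (w_slope - 2.0)^2
# The domain knowledge "slope ≈ 2.0" is encoded as a quadratic penalty
# on the slope coordinate only (the intercept is left unpenalized).
lam = 10.0
prior = np.array([2.0, 0.0])
P = np.diag([lam, 0.0])
w_iml = np.linalg.solve(X.T @ X + P, X.T @ y + P @ prior)
```

Under this penalty the informed slope estimate is a shrinkage of the plain estimate toward the prior value, so its deviation from the true slope can never exceed that of plain least squares on the same data; this is a toy instance of the data-requirement reduction the abstract describes, not the paper's actual construction.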
Source journal
Neurocomputing (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles per year: 1,382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing, covering neurocomputing theory, practice, and applications.