An informed machine learning based environmental risk score for hypertension in European adults

IF 6.2, Zone 2 (Medicine), JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Jean-Baptiste Guimbaud, Emilie Calabre, Rafael de Cid, Camille Lassale, Manolis Kogevinas, Léa Maître, Rémy Cazabet
Journal: Artificial Intelligence in Medicine, volume 165, Article 103139
Publication date: 2025-04-22
Publication type: Journal Article
DOI: 10.1016/j.artmed.2025.103139
URL: https://www.sciencedirect.com/science/article/pii/S0933365725000740
Citations: 0

An informed machine learning based environmental risk score for hypertension in European adults

Background

The exposome framework seeks to unravel the cumulative effects of environmental exposures on health. However, existing methods struggle with multicollinearity, non-linearity and confounding. To address these limitations, we introduce SEANN (Summary Effect Adjusted Neural Network), a novel approach that integrates pooled effect sizes, a form of domain knowledge, with neural networks to improve the analysis and interpretation of hypertension risk factors.
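One way to picture how a pooled effect size could enter training is as a penalty added to the usual data-fit loss. The sketch below is an illustration under stated assumptions, not the SEANN loss itself, which this abstract does not specify. It assumes each pooled estimate is given as a log odds ratio per unit of exposure, and it penalises the gap between that value and the log-odds change the model implies when one exposure is shifted by `step`. The names `effect_penalty`, `informed_loss`, `make_logistic`, `lam` and `step` are hypothetical, and a logistic model stands in for the neural network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p, eps=1e-12):
    # Clamp to avoid log(0) when the model saturates.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def bce(y_true, p, eps=1e-12):
    # Binary cross-entropy for one sample.
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def effect_penalty(predict, X, pooled_log_or, step=1.0):
    """Mean squared gap between the model-implied log-odds change per unit
    of exposure and the pooled log odds ratio (assumed form of the prior)."""
    total, count = 0.0, 0
    for x in X:
        base = logit(predict(x))
        for j, target in pooled_log_or.items():
            shifted = list(x)
            shifted[j] += step
            implied = (logit(predict(shifted)) - base) / step
            total += (implied - target) ** 2
            count += 1
    return total / count if count else 0.0


def informed_loss(predict, X, y, pooled_log_or, lam=1.0):
    # Data-fit term plus the literature-alignment penalty, weighted by lam.
    data_term = sum(bce(t, predict(x)) for x, t in zip(X, y)) / len(X)
    return data_term + lam * effect_penalty(predict, X, pooled_log_or)


def make_logistic(weights, bias=0.0):
    # Stand-in predictor; any callable returning a probability would do.
    return lambda x: sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


if __name__ == "__main__":
    X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]]
    y = [0, 1, 1]
    pooled = {0: math.log(1.5)}  # hypothetical pooled OR of 1.5 for feature 0
    model = make_logistic([-0.4, 0.3])
    print(informed_loss(model, X, y, pooled, lam=1.0))
```

With `lam=0` the loss reduces to the purely data-driven (agnostic) case, so the same function covers both kinds of model in this sketch.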

Methods

Using data from 18,337 participants aged 40 to 65 years in the GCAT cohort in Catalonia, covering a diverse selection of 53 environmental factors, we computed two environmental risk scores for hypertension prevalence using deep neural networks: an informed risk score using SEANN, which integrates 11 pooled effect size estimates from meta-analyses, and an agnostic counterpart for comparison. For each score, we computed Shapley values to extract and compare the exposure-outcome relationships learnt by each neural network model.
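To make the Shapley step concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every feature subset, with absent features replaced by a baseline value. This is an assumption-laden illustration rather than the authors' pipeline: the abstract does not say which estimator was used, exact enumeration is only feasible for a handful of features (with 53 factors an approximation would be needed), and `shapley_values`, `model`, `x` and `baseline` are hypothetical names.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value, which is
    one common (assumed) way of defining a "missing" feature.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for j in range(n):
        others = [i for i in range(n) if i != j]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


if __name__ == "__main__":
    # Toy model with an interaction between features 0 and 1.
    toy = lambda z: z[0] * z[1] + z[2]
    print(shapley_values(toy, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

Plotting each feature's Shapley value against its observed value across participants gives the kind of exposure-outcome curve that can then be compared between models.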

Results

Predictive performance was similarly good for the agnostic NN and SEANN (AUC 0.7). However, we demonstrate substantial improvements in the scientific validity of the relationships captured by the informed risk score. Directly informed variables were closer to the corresponding relationships observed in the literature, and other, non-informed variables were successfully adjusted, with their directions of association more in line with previous studies. The mean delta SHAP distance between the relationships extracted by each model and those observed in the literature, averaged over all variables, was 6 times lower with SEANN than with the agnostic NN. The most influential environmental variables within the informed risk score included smoking intensity, Mediterranean diet adherence, coffee consumption and sedentary behaviour.
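The two kinds of comparison reported here can be sketched as follows. `auc` is the standard rank-based estimate of the area under the ROC curve. `mean_curve_gap` is only a stand-in for the delta SHAP distance: the abstract does not define that metric, so the sketch assumes each relationship is a curve sampled on a shared grid of exposure values and takes the mean absolute gap per variable, averaged over variables. Both function names and the toy numbers are hypothetical.

```python
def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_curve_gap(model_curves, literature_curves):
    """Assumed distance: mean absolute gap between model-derived and
    literature curves, per variable, then averaged over variables."""
    gaps = []
    for name, reference in literature_curves.items():
        learned = model_curves[name]
        if len(learned) != len(reference):
            raise ValueError(f"curves for {name!r} must share one grid")
        gaps.append(sum(abs(a - b) for a, b in zip(learned, reference)) / len(reference))
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    literature = {"smoking": [0.0, 0.1, 0.2]}   # toy values, not study data
    agnostic = {"smoking": [0.0, -0.1, -0.1]}
    informed = {"smoking": [0.0, 0.1, 0.15]}
    ratio = mean_curve_gap(agnostic, literature) / mean_curve_gap(informed, literature)
    print(ratio)
```

A ratio above 1 in this toy setup means the informed curves sit closer to the literature than the agnostic ones, which is the direction of the result described above.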

Conclusions

This study demonstrates the added value of SEANN over conventional, purely data-driven machine learning approaches. By aligning learned relationships with established literature-based effect sizes, SEANN improves the disentanglement of exposure effects on hypertension.
Source journal
Artificial Intelligence in Medicine (Engineering and Technology: Engineering, Biomedical)
CiteScore: 15.00
Self-citation rate: 2.70%
Articles published: 143
Review time: 6.3 months
About the journal: Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care. Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.