An informed machine learning based environmental risk score for hypertension in European adults
Jean-Baptiste Guimbaud, Emilie Calabre, Rafael de Cid, Camille Lassale, Manolis Kogevinas, Léa Maître, Rémy Cazabet
Artificial Intelligence in Medicine, volume 165, Article 103139. Published 2025-04-22.
DOI: 10.1016/j.artmed.2025.103139
https://www.sciencedirect.com/science/article/pii/S0933365725000740
Background
The exposome framework seeks to unravel the cumulative effects of environmental exposures on health. However, existing methods struggle with multicollinearity, non-linearity and confounding. To address these limitations, we introduce SEANN (Summary Effect Adjusted Neural Network), a novel approach that integrates pooled effect sizes, a form of domain knowledge, with neural networks to improve the analysis and interpretation of hypertension risk factors.
Methods
Using data from 18,337 adults aged 40 to 65 years participating in the GCAT cohort in Catalonia, and covering a diverse selection of 53 environmental factors, we computed two environmental risk scores for hypertension prevalence using deep neural networks: an informed risk score using SEANN, which integrates 11 pooled effect size estimates from meta-analyses, and an agnostic counterpart for comparison. For each score, we computed Shapley values to extract and compare the exposure-outcome relationships learnt by each neural network model.
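One way to picture how pooled effect sizes could be combined with a data-driven model is a loss with two terms: a standard classification loss plus a penalty on the gap between the model's implied effect of an exposure and the pooled estimate. The sketch below is an illustration under stated assumptions, not the SEANN implementation: the linear logit stand-in for the network, the squared-gap penalty, and the names `effect_penalty`, `informed_loss`, `pooled_log_or`, `delta` and `lam` are all hypothetical, and the data are synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_logit(weights, bias, X):
    # Hypothetical stand-in model: a linear logit. A neural network
    # would take its place in a real informed model.
    return X @ weights + bias

def data_loss(weights, bias, X, y):
    # Binary cross-entropy on the observed outcome.
    p = sigmoid(predict_logit(weights, bias, X))
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def effect_penalty(weights, bias, X, pooled_log_or, delta=1.0):
    # For each informed variable j, shift it by `delta` units and compare
    # the resulting change in predicted log-odds with the change implied
    # by the pooled per-unit log odds ratio (assumed penalty form).
    gaps = []
    for j, target in pooled_log_or.items():
        X_shift = X.copy()
        X_shift[:, j] += delta
        change = predict_logit(weights, bias, X_shift) - predict_logit(weights, bias, X)
        gaps.append(np.mean((change - target * delta) ** 2))
    return float(np.mean(gaps))

def informed_loss(weights, bias, X, y, pooled_log_or, lam=1.0):
    # `lam` trades off fit to the data against agreement with the literature.
    return data_loss(weights, bias, X, y) + lam * effect_penalty(weights, bias, X, pooled_log_or)

# Synthetic toy data: two exposures, one of them informed by a made-up estimate.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)
pooled_log_or = {0: 0.5}  # column index -> hypothetical pooled log odds ratio per unit
```

With this form, the penalty vanishes when the model's per-unit effect on the informed variable equals the pooled estimate and grows quadratically as it departs from it; variables without a pooled estimate are constrained only indirectly, through the shared model.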
Results
Predictive performance was similarly good for the agnostic NN and SEANN (AUC 0.7). However, we demonstrate substantial improvements in the scientific validity of the relationships captured by the informed risk score. Directly informed variables were closer to the corresponding relationships observed in the literature, and non-informed variables were also successfully adjusted, with directions of association more in line with previous studies. The mean delta SHAP distance between the relationships extracted from each model and those observed in the literature, averaged over all variables, was 6 times lower with SEANN than with the agnostic NN. The most influential environmental variables within the informed risk score included smoking intensity, Mediterranean diet adherence, coffee consumption and sedentary behaviour.
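The abstract does not define the delta SHAP distance, so the following is only a sketch of how a comparison of this kind might be set up. It assumes that the literature relationship for a variable can be written as a per-unit log odds ratio, that the matching reference curve is that log odds ratio times the exposure centred on its sample mean, and that the distance is the mean absolute gap between per-sample SHAP values and this curve, averaged over variables. The function names, the toy weights and the linear-model SHAP shortcut are hypothetical, and the numbers are synthetic rather than results from the study.

```python
import numpy as np

def literature_curve(x, log_or_per_unit):
    # Reference contribution implied by a per-unit log odds ratio,
    # centred on the sample mean of the exposure (assumed form).
    return log_or_per_unit * (x - x.mean())

def shap_gap(shap_values, x, log_or_per_unit):
    # Mean absolute gap between per-sample SHAP values and the reference curve.
    return float(np.mean(np.abs(shap_values - literature_curve(x, log_or_per_unit))))

def mean_shap_gap(shap_matrix, X, pooled_log_or):
    # Average the per-variable gap over every variable with a literature estimate.
    gaps = [shap_gap(shap_matrix[:, j], X[:, j], b) for j, b in pooled_log_or.items()]
    return float(np.mean(gaps))

def linear_shap(X, weights):
    # For a linear logit model with independent features, the SHAP value of
    # feature j in log-odds units is weights[j] * (x_j - mean(x_j)).
    return (X - X.mean(axis=0)) * weights

# Synthetic exposures and made-up literature estimates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
pooled_log_or = {0: 0.30, 1: -0.10}

# Two toy "models": one whose effects disagree with the literature in sign,
# one whose effects sit close to the pooled estimates.
shap_agnostic = linear_shap(X, np.array([-0.20, 0.40]))
shap_informed = linear_shap(X, np.array([0.28, -0.08]))

gap_agnostic = mean_shap_gap(shap_agnostic, X, pooled_log_or)
gap_informed = mean_shap_gap(shap_informed, X, pooled_log_or)
```

Under these assumptions, a model whose learnt effects track the pooled estimates yields a smaller averaged gap than one whose effects point the other way, which is the kind of contrast the reported ratio between SEANN and the agnostic NN summarises.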
Conclusions
This study demonstrates the added value of SEANN over conventional, purely data-driven machine learning approaches. By aligning learned relationships with established literature-based effect sizes, SEANN improves the disentanglement of exposure effects on hypertension.
About the journal:
Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care.
Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.