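Development and validation of an interpretable machine learning model for predicting hyperuricemia risk: Based on environmental chemical exposure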

IF 6.2 | Zone 2, Environmental Science and Ecology | JCR Q1, ENVIRONMENTAL SCIENCES
Xiaochuan Lu , Huawei Kou , Cong Li , Runqing Zhan , Rongrong Guo , Shengnan Liu , Peixuan Shen , Meiyue Shen , Tingwei Du , Jiaqi Lu , Xiaoli Shen
Ecotoxicology and Environmental Safety, Volume 299, Article 118392. Published 2025-05-21. DOI: 10.1016/j.ecoenv.2025.118392
https://www.sciencedirect.com/science/article/pii/S0147651325007286
Citations: 0

Abstract

Hyperuricemia is a global health concern, and environmental chemicals are among its risk factors. This study used data on multiple environmental chemical exposures from the 2011–2012 cycle of the National Health and Nutrition Examination Survey (NHANES) to develop an interpretable machine learning model for predicting hyperuricemia risk. Least absolute shrinkage and selection operator (LASSO) regression was used to select relevant variables. The dataset was split into training (80 %) and test (20 %) sets, and six machine learning models were constructed: Random Forest (RF), Gaussian Naive Bayes (GNB), Light Gradient Boosting (LGB), Extreme Gradient Boosting (XGB), Adaptive Boosting Classifier (AB), and Support Vector Machine (SVM). Our study identified a hyperuricemia prevalence of 20.58 % in the 2011–2012 NHANES cycle, consistent with previous studies. The XGB model performed best, achieving the highest AUC (0.806; 95 % CI: 0.768–0.845), balanced accuracy (0.762; 95 % CI: 0.721–0.802), and F1 score (0.585; 95 % CI: 0.535–0.635), as well as the lowest Brier score (0.133; 95 % CI: 0.122–0.144). Estimated glomerular filtration rate (eGFR), body mass index (BMI), cobalt (Co), mono-(2-ethyl)-hexyl phthalate (MEHP), mono-(3-carboxypropyl) phthalate (MCPP), mono-(2-ethyl-5-hydroxyhexyl) phthalate (MEHHP), and 2-hydroxynaphthalene (OHNa2) were identified as the key contributors to the predictive model. Shapley additive explanations (SHAP) and partial dependence plots indicated that hyperuricemia was positively associated with MCPP, MEHHP, and OHNa2, and negatively associated with Co and MEHP. This study is the first to predict hyperuricemia risk from multiple environmental chemical exposures using a machine learning model.
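The abstract reports each test-set metric with a 95 % CI but does not say how the intervals or the classification cut-off were obtained. As a rough illustration only, the sketch below computes AUC, balanced accuracy, F1 and the Brier score in plain Python from true labels and predicted probabilities. It assumes a 0.5 probability threshold for balanced accuracy and F1 and a percentile bootstrap over test cases for the intervals; both choices are assumptions, not the authors' documented procedure, and the toy values in the example are not study data.

```python
"""Test-set metrics with percentile-bootstrap 95 % CIs (illustrative sketch)."""
import random
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[int], Sequence[float]], float]


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC = P(random positive scores above random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_prob) if y == 1]
    neg = [s for y, s in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _confusion(y_true: Sequence[int], y_prob: Sequence[float],
               threshold: float) -> Tuple[int, int, int, int]:
    """Counts (tp, fp, tn, fn) after thresholding probabilities (assumed cut-off)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_prob):
        pred = 1 if s >= threshold else 0
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def balanced_accuracy(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> float:
    """Mean of sensitivity and specificity."""
    tp, fp, tn, fn = _confusion(y_true, y_prob, threshold)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return (sens + spec) / 2


def f1_score(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, _, fn = _confusion(y_true, y_prob, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and 0/1 outcome (lower is better)."""
    return sum((s - y) ** 2 for y, s in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(metric: Metric, y_true: Sequence[int], y_prob: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap: resample cases with replacement, recompute, take quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(metric([y_true[i] for i in idx], [y_prob[i] for i in idx]))
        except ValueError:
            continue  # resample held only one class, so AUC is undefined; skip it
    if not stats:
        raise ValueError("no valid bootstrap resamples")
    stats.sort()
    lo = stats[int(alpha / 2 * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi


if __name__ == "__main__":
    # Toy labels and predicted probabilities, NOT data from the study.
    y = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.6, 0.9, 0.05]
    for name, metric in [("AUC", roc_auc), ("Balanced accuracy", balanced_accuracy),
                         ("F1", f1_score), ("Brier", brier_score)]:
        lo, hi = bootstrap_ci(metric, y, p)
        print(f"{name}: {metric(y, p):.3f} (95 % CI: {lo:.3f}-{hi:.3f})")
```

AUC and the Brier score use the predicted probabilities directly, whereas balanced accuracy and F1 depend on the chosen threshold. Balanced accuracy averages sensitivity and specificity, so it is less affected than plain accuracy by an outcome prevalence of roughly 20 %.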
Source journal: Ecotoxicology and Environmental Safety
CiteScore: 12.10
Self-citation rate: 5.90%
Articles published: 1234
Review time: 88 days
Journal description: Ecotoxicology and Environmental Safety is a multidisciplinary journal that focuses on understanding the exposure to and effects of environmental contamination on organisms, including human health. The scope of the journal covers three main themes: Ecotoxicology, Environmental Chemistry, and Environmental Safety.