Xiaochuan Lu, Huawei Kou, Cong Li, Runqing Zhan, Rongrong Guo, Shengnan Liu, Peixuan Shen, Meiyue Shen, Tingwei Du, Jiaqi Lu, Xiaoli Shen
Ecotoxicology and Environmental Safety, Volume 299, Article 118392. Published 2025-05-21. Impact factor: 6.2; JCR: Q1 (Environmental Sciences).
DOI: 10.1016/j.ecoenv.2025.118392
https://www.sciencedirect.com/science/article/pii/S0147651325007286
Development and validation of an interpretable machine learning model for predicting hyperuricemia risk: Based on environmental chemical exposure
Hyperuricemia is a global health concern, and exposure to environmental chemicals is among its risk factors. This study used data on multiple environmental chemical exposures from the 2011–2012 cycle of the National Health and Nutrition Examination Survey (NHANES) to develop an interpretable machine learning model for hyperuricemia risk prediction. Least absolute shrinkage and selection operator (LASSO) regression was used to select relevant variables. The dataset was split into training (80%) and test (20%) sets, and six machine learning models were constructed: Random Forest (RF), Gaussian Naive Bayes (GNB), Light Gradient Boosting (LGB), Extreme Gradient Boosting (XGB), Adaptive Boosting Classifier (AB), and Support Vector Machine (SVM). The study identified a hyperuricemia prevalence of 20.58% in the 2011–2012 NHANES cycle, consistent with previous studies. The XGB model performed best, achieving the highest AUC (0.806; 95% CI: 0.768–0.845), balanced accuracy (0.762; 95% CI: 0.721–0.802), and F1 score (0.585; 95% CI: 0.535–0.635), as well as the lowest Brier score (0.133; 95% CI: 0.122–0.144). Estimated glomerular filtration rate (eGFR), body mass index (BMI), cobalt (Co), mono-(2-ethyl)-hexyl phthalate (MEHP), mono-(3-carboxypropyl) phthalate (MCPP), mono-(2-ethyl-5-hydroxyhexyl) phthalate (MEHHP), and 2-hydroxynaphthalene (OHNa2) were identified as the key factors contributing to the predictive model. Shapley additive explanations and partial dependence plots indicated that hyperuricemia was positively associated with MCPP, MEHHP, and OHNa2, and negatively associated with Co and MEHP. This study is the first to predict the risk of hyperuricemia based on multiple environmental chemical exposures using a machine learning model.
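For readers less familiar with the four evaluation metrics reported in the abstract (AUC, balanced accuracy, F1, and Brier score), the following is a minimal sketch of how each can be computed from true labels and predicted probabilities. It is not the authors' code: the function names, the toy data, and the 0.5 classification threshold are assumptions made for illustration, and the abstract does not state which threshold or software the study used.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive case is scored above a
    random negative case (ties count as half a win)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion(y_true: Sequence[int], y_prob: Sequence[float],
              threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding the probabilities.
    The 0.5 default is an assumption, not a value taken from the paper."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 0 and pred == 1:
            fp += 1
        elif y == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def balanced_accuracy(y_true, y_prob, threshold: float = 0.5) -> float:
    """Mean of sensitivity and specificity; less flattering than plain
    accuracy when classes are imbalanced."""
    tp, fp, tn, fn = confusion(y_true, y_prob, threshold)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def f1(y_true, y_prob, threshold: float = 0.5) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, tn, fn = confusion(y_true, y_prob, threshold)
    return 2 * tp / (2 * tp + fp + fn)


def brier(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome
    (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data: 3 cases and 5 non-cases.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_prob = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
    print("AUC:", round(roc_auc(y_true, y_prob), 3))
    print("Balanced accuracy:", round(balanced_accuracy(y_true, y_prob), 3))
    print("F1:", round(f1(y_true, y_prob), 3))
    print("Brier:", round(brier(y_true, y_prob), 3))
```

AUC and the Brier score are computed directly from the predicted probabilities, whereas balanced accuracy and F1 depend on the chosen threshold. With a prevalence near 20%, as reported here, balanced accuracy is usually more informative than plain accuracy.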
About the journal:
Ecotoxicology and Environmental Safety is a multidisciplinary journal that focuses on understanding the exposure to and effects of environmental contamination on organisms, including human health. The scope of the journal covers three main themes. The topics within these themes include (but are not limited to) the following: Ecotoxicology, Environmental Chemistry, Environmental Safety, etc.