Integrated statistical modeling and machine learning techniques with SHAP for epidemiological data analysis
Authors: S. Qurat Ul Ain, Khalid Ul Islam Rather
Journal: Annals of Epidemiology, volume 108, pages 85-91 (published 2025-06-25; impact factor 3.0; JCR Q1, Public, Environmental & Occupational Health)
DOI: 10.1016/j.annepidem.2025.06.012
URL: https://www.sciencedirect.com/science/article/pii/S1047279725001334
Epidemiological studies increasingly rely on advanced analytics to uncover complex relationships in health data. This study uses a SHAP (SHapley Additive exPlanations)-driven framework to improve the interpretability of machine learning models applied to a dataset of 1200 patients. Key features, including demographic, anthropometric, lifestyle, and clinical parameters, were analyzed using a Random Forest classifier integrated with SHAP values. The classifier predicted health outcomes, specifically the presence of chronic diseases such as diabetes, with 85% accuracy and an AUC of 0.89, outperforming logistic regression (accuracy = 79%, AUC = 0.84). SHAP values further highlighted influential predictors such as BMI and age, offering insights into individual contributions to health outcomes. By bridging traditional epidemiological analysis and modern machine learning techniques, this study offers a transparent and interpretable model for healthcare decision-making.
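To make the idea behind SHAP values concrete, the following is a minimal sketch of an exact Shapley value computation for a single patient. It is not the study's pipeline: the `risk_score` function, its coefficients, the three features, and the patient and reference values are all hypothetical stand-ins for the Random Forest and the 1200-patient dataset, and the brute-force enumeration of feature coalitions replaces the approximations that the `shap` library uses in practice. The sketch only illustrates the additive property: the per-feature contributions sum to the difference between the patient's prediction and the reference prediction.

```python
from itertools import combinations
from math import factorial

FEATURES = ("age", "bmi", "smoker")


def risk_score(age, bmi, smoker):
    """Hypothetical toy risk model, NOT the paper's Random Forest."""
    score = 0.01 * age + 0.02 * bmi
    # Illustrative interaction: extra risk only for obese smokers.
    if bmi > 30 and smoker:
        score += 0.2
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one instance relative to a baseline.

    Features inside a coalition take the instance's value; features
    outside it take the baseline's value.
    """
    n = len(FEATURES)

    def value(coalition):
        args = {
            f: (instance[f] if f in coalition else baseline[f])
            for f in FEATURES
        }
        return model(**args)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_f = set(subset)
                with_f = without_f | {f}
                total += weight * (value(with_f) - value(without_f))
        phi[f] = total
    return phi


# Hypothetical patient and reference profile (assumed values).
patient = {"age": 60, "bmi": 32, "smoker": 1}
reference = {"age": 40, "bmi": 24, "smoker": 0}

phi = shapley_values(risk_score, patient, reference)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")

# Additivity: contributions sum to f(patient) - f(reference).
gap = risk_score(**patient) - risk_score(**reference)
print(f"sum of contributions: {sum(phi.values()):.3f}, prediction gap: {gap:.3f}")
```

In this toy setting the interaction term is split evenly between BMI and smoking status, which is how Shapley values attribute a joint effect to the features that produce it. Exact enumeration grows exponentially with the number of features, so it is practical only for small illustrations like this one.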
About the journal:
The journal emphasizes the application of epidemiologic methods to issues that affect the distribution and determinants of human illness in diverse contexts. Its primary focus is on chronic and acute conditions of diverse etiologies and of major importance to clinical medicine, public health, and health care delivery.