Interpretable Machine Learning for Predicting Adverse Pregnancy Outcomes in Gestational Diabetes: Retrospective Cohort Study
Jiaxi Li, Xiali Liu, Shenyang He, Yan Ren
JMIR Medical Informatics, 2025;13:e71539. Published 2025-09-16. DOI: 10.2196/71539
Background: Gestational diabetes mellitus (GDM) affects over 5% of pregnancies worldwide, raising the risk of postpartum type 2 diabetes and of complications such as fetal death, miscarriage, and congenital abnormalities. Effective GDM management is essential for balancing glycemic control and pregnancy outcomes.
Objective: We aimed to develop interpretable machine learning models, built on GDM datasets, that predict adverse pregnancy outcomes, and to identify key contributing factors with the Shapley additive explanations (SHAP) algorithm, thereby supporting improved maternal and infant health.
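SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory: each feature receives its average marginal contribution over all coalitions of the other features. The brute-force sketch below illustrates that definition under simplifying assumptions: the function `exact_shapley`, the two-feature `risk` model, and its coefficients are hypothetical, and features outside a coalition are simply set to one baseline input. The `shap` library instead averages over a background dataset and uses faster model-specific algorithms (for example, for tree ensembles), so this is not the study's pipeline.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(model: Callable[[list[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of model(x) relative to a single baseline input.

    Hypothetical helper: features outside a coalition are set to their
    baseline value, and the cost grows as 2**n, so it only suits tiny n.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Coalition members keep their observed value; all others use the baseline.
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical risk score with an interaction between two made-up features.
    def risk(z: list[float]) -> float:
        return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[1]

    phi = exact_shapley(risk, x=[1.0, 1.0], baseline=[0.0, 0.0])
    print([round(v, 3) for v in phi])  # [0.65, 0.35]; sums to risk(x) - risk(baseline)
```

The attributions add up to the gap between the prediction and the baseline prediction, and the interaction term 0.3 * z0 * z1 is split equally, adding 0.15 to each feature. This is one way interaction effects show up in ordinary SHAP values; for tree models, the `shap` library can also report SHAP interaction values that separate them explicitly.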
Methods: Data were preprocessed and features selected, and adaptive synthetic sampling (ADASYN) was used to address class imbalance. Classification models, including logistic regression, random forest, support vector machine, and extreme gradient boosting (XGBoost), were built and combined into a stacking ensemble. Model interpretability was assessed with SHAP, which quantifies each feature's contribution to a prediction.
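The abstract does not say how ADASYN was implemented; in practice a library class such as imbalanced-learn's `ADASYN` would typically be used. To show what the technique does, here is a minimal pure-Python sketch of its steps, assuming binary labels with 1 as the minority (adverse outcome) class and Euclidean distance on already scaled numeric features. The function `adasyn` and its parameters `k`, `beta`, and `seed` are hypothetical names chosen for illustration.

```python
import math
import random
from typing import Iterable, Sequence


def adasyn(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 5,
           beta: float = 1.0, seed: int = 0) -> list[list[float]]:
    """Return synthetic minority (label 1) samples following the ADASYN steps.

    Hypothetical helper: assumes pre-scaled numeric features and Euclidean distance.
    """
    rng = random.Random(seed)
    data = [list(map(float, x)) for x in X]
    labels = list(y)
    min_idx = [i for i, lab in enumerate(labels) if lab == 1]
    n_major = len(labels) - len(min_idx)
    # Total samples to generate; beta = 1.0 aims for a fully balanced set.
    total_new = round((n_major - len(min_idx)) * beta)
    if total_new <= 0 or len(min_idx) < 2:
        return []

    def knn(i: int, candidates: Iterable[int]) -> list[int]:
        # The k candidates closest to data[i], excluding i itself.
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(data[i], data[j]))
        return others[:k]

    # Difficulty ratio: share of majority points among each minority point's k neighbours.
    ratios = []
    for i in min_idx:
        neigh = knn(i, range(len(data)))
        ratios.append(sum(labels[j] == 0 for j in neigh) / len(neigh))
    ratio_sum = sum(ratios)
    if ratio_sum == 0:  # no minority point lies near the majority class
        return []

    synthetic = []
    for i, r in zip(min_idx, ratios):
        # Harder-to-learn points (more majority neighbours) receive more samples;
        # rounding means the counts may not sum exactly to total_new.
        g_i = round(r / ratio_sum * total_new)
        minority_neigh = knn(i, min_idx)
        for _ in range(g_i):
            z = data[rng.choice(minority_neigh)]
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(data[i], z)])
    return synthetic


if __name__ == "__main__":
    # Toy, made-up data: six majority points and three minority points.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
         [0.9, 0.9], [3, 3], [3.2, 3.1]]
    y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
    print(len(adasyn(X, y, k=3)))  # 4: the minority point inside the majority cluster gets 2
```

Unlike plain random oversampling, ADASYN concentrates new samples around minority points surrounded by majority points, which are the hardest to classify. Oversampling of this kind is normally fitted on the training split only, so that synthetic points do not leak into the test or validation data.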
Results: Of 1670 patients, 200 experienced adverse outcomes. The stacking model outperformed the individual models, achieving on the test set an accuracy of 85.6%, a sensitivity of 57.8%, a specificity of 95.9%, and an area under the receiver operating characteristic curve (AUC) of 0.82. Performance declined on external validation in 159 patients (accuracy 83.6%, AUC 0.67). SHAP analysis identified gestational age, glucose control, and time of diagnosis as among the most influential predictors, providing clinically meaningful insight into risk factors. SHAP-based visualizations further revealed the distribution of feature values, their nonlinear effects on outcomes, and interaction effects between features. These analyses clarified both individual and combined feature contributions, supporting clinical risk assessment.
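The results are reported as accuracy, sensitivity, specificity, and AUC. As a reminder of how these quantities are defined, here is a minimal sketch that computes them from binary labels and model scores. The function names, the 0.5 decision threshold, and the toy data are hypothetical and do not come from the study.

```python
from typing import Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity, and specificity for binary labels (1 = adverse outcome)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of adverse outcomes that are flagged
        "specificity": tn / (tn + fp),  # share of normal outcomes correctly cleared
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win; O(n_pos * n_neg), fine for illustration.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels and model scores, not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.3, 0.2, 0.1, 0.35, 0.05, 0.6]
    y_pred = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 threshold
    print(confusion_metrics(y_true, y_pred))  # accuracy 0.625, sensitivity 1/3, specificity 0.8
    print(roc_auc(y_true, scores))  # 0.8
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes how well the scores rank positives above negatives across all thresholds. Moving the threshold trades one of the first two against the other, which is why a model can pair high specificity with more modest sensitivity at a single operating point.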
Conclusions: This study underscores the potential of machine learning in predicting adverse outcomes in GDM, with interpretable features offering valuable clinical insights to enhance pregnancy management and maternal-infant health.
About the journal:
JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It emphasizes applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, industry, and health informatics professionals.
Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers and citizens, who are the focus of JMIR), publishes even faster, and also accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.