Building a machine learning-based risk prediction model for second-trimester miscarriage
Sangsang Qi, Shi Zheng, Mengdan Lu, Aner Chen, Yanbo Chen, Xianhu Fu
BMC Pregnancy and Childbirth, volume 24, issue 1, article 738. Published 2024-11-09.
DOI: https://doi.org/10.1186/s12884-024-06942-w
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11550545/pdf/
Abstract
Background: Second-trimester miscarriage is a common adverse pregnancy outcome that places substantial economic and psychological burdens on patients and their families. Research on predictive models for the risk of second-trimester miscarriage is currently scarce.
Methods: Clinical data were retrospectively collected from patients in the second trimester of pregnancy (14+0 to 27+6 weeks of gestation) whose main diagnosis was "threatened abortion" and who were hospitalized at the Women and Children's Hospital of Ningbo University between January 2020 and October 2023. After preliminary data processing, the patient cohort was randomly split, with stratification, into a training cohort (70%) and a validation cohort (30%). The Boruta algorithm and multifactor analysis were used to screen candidate features and determine the optimal set of features linked to second-trimester miscarriage. Class imbalance in the training cohort was addressed with the SMOTE oversampling approach. Seven machine-learning models were built, and their predictive performance was validated and compared to select the optimal model. Shapley additive explanations (SHAP) were generated to provide insight into the model's predictions, and a visual representation of the predictive model was built.
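The split-then-oversample order described in the Methods can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' implementation: the helper names `stratified_split` and `smote`, the neighbour count `k=5`, the seeds, and the two random toy features are all assumptions made for the example. Only the class counts (395 of 2006) come from the abstract. In practice, `train_test_split(..., stratify=y)` from scikit-learn and `SMOTE` from imbalanced-learn would typically be used instead.

```python
import math
import random


def stratified_split(labels, test_frac=0.3, seed=0):
    """Return (train_idx, test_idx), shuffling and splitting each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data only: class counts mirror the abstract, feature values are random.
data_rng = random.Random(42)
y = [1] * 395 + [0] * 1611
X = [[data_rng.gauss(label, 1.0), data_rng.gauss(0.0, 1.0)] for label in y]

# 1) Stratified 70/30 split keeps the class ratio similar in both cohorts.
train_idx, test_idx = stratified_split(y, test_frac=0.3, seed=1)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# 2) Oversample the minority class in the training cohort only, so the
#    validation cohort keeps its original class distribution.
minority = [x for x, label in zip(X_train, y_train) if label == 1]
n_majority = y_train.count(0)
synthetic = smote(minority, n_new=n_majority - len(minority), k=5, seed=2)

X_balanced = X_train + synthetic
y_balanced = y_train + [1] * len(synthetic)
print(y_balanced.count(0), y_balanced.count(1))
```

Oversampling after the split, and only on the training cohort, keeps synthetic points out of the validation data, so validation metrics are computed on real patients only.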
Results: A total of 2006 patients were included in the study, of whom 395 (19.69%) had a second-trimester miscarriage. The seven models were compared on accuracy, precision, recall, F1 score, precision-recall average precision, area under the receiver operating characteristic curve, decision curve analysis, and calibration curves, and XGBoost was identified as the optimal model. SHAP importance rankings were used to identify the ten most influential features for second-trimester miscarriage, of which cervical length ranked first.
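Several of the comparison metrics named in the Results follow directly from the confusion matrix and from a ranking of predicted scores. The sketch below shows how accuracy, precision, recall, F1, and ROC AUC can be computed. The function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions for illustration and are unrelated to the study's data or results.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Probability that a randomly chosen positive is scored above a
    randomly chosen negative; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (hypothetical, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

With an event rate near 20%, as reported here, accuracy alone can look high for a model that rarely predicts the event, which is one reason to report precision, recall, and ranking-based measures alongside it.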
Conclusion: The visual risk prediction model, built on the machine-learning approach described above, can accurately predict the risk of second-trimester miscarriage.
About the journal:
BMC Pregnancy & Childbirth is an open access, peer-reviewed journal that considers articles on all aspects of pregnancy and childbirth. The journal welcomes submissions on the biomedical aspects of pregnancy, breastfeeding, labor, maternal health, maternity care, trends and sociological aspects of pregnancy and childbirth.