Comparative analysis of convolutional neural networks and traditional machine learning models for IVF live birth prediction: a retrospective analysis of 48514 IVF cycles and an evaluation of deployment feasibility in resource-constrained settings.

Impact factor 3.9 · Zone 2, Medicine · JCR Q2, Endocrinology & Metabolism

Frontiers in Endocrinology Pub Date : 2025-06-12 eCollection Date: 2025-01-01 DOI:10.3389/fendo.2025.1556681

Yu Liu, Yi Wang, Kai Huang, Hao Shi, Hang Xin, Shanjun Dai, Jinhao Liu, Xinhong Yang, Jianyuan Song, Fuli Zhang, Yihong Guo


Citations: 0

Abstract

Objective: To evaluate the predictive performance of a convolutional neural network for analyzing electronic medical records in assisted reproductive therapy and to compare its accuracy and interpretability with traditional machine learning models. The study also explores the feasibility of deploying such models in resource-limited clinical settings.

Design: Retrospective cohort study based on EMR data using five models: CNN, Naïve Bayes, Random Forest, Decision Tree, and Feedforward Neural Network. Feature importance and model interpretability were evaluated using SHAP.
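SHAP attributions are based on Shapley values. The abstract does not say which SHAP explainer was used, so the following is only a minimal sketch of the underlying idea: exact Shapley values computed by averaging each feature's marginal contribution over all feature orderings. The function `toy_score`, its weights, and the patient and baseline values are hypothetical and are not taken from the study.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Features not yet "switched on" keep their baseline value.
    Cost grows factorially, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            new = predict(current)
            phi[i] += new - prev       # marginal contribution of feature i
            prev = new
    total = factorial(n)
    return [p / total for p in phi]


# Hypothetical linear score over (maternal age, BMI, antral follicle count).
WEIGHTS = [-0.04, -0.01, 0.03]


def toy_score(features):
    return sum(w * v for w, v in zip(WEIGHTS, features))


patient = [38.0, 24.0, 6.0]     # hypothetical patient
reference = [32.0, 22.0, 12.0]  # hypothetical baseline (e.g. cohort averages)
phi = shapley_values(toy_score, patient, reference)
```

For a linear score each attribution reduces to weight × (value − baseline), and the attributions always sum to the difference between the patient's score and the baseline score. Practical SHAP explainers approximate these quantities because exact enumeration is infeasible for many features.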

Setting: First Hospital of Zhengzhou University.

Population: 48,514 fresh IVF cycles from August 2009 to May 2018.

Methods: Preprocessed EMR data were used to train and evaluate five classification models predicting live birth outcomes. Stratified 5-fold cross-validation was performed for robust performance estimation. ROC curves and AUC values were used for comparative evaluation.
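The evaluation protocol described above can be sketched in plain Python. This is an illustrative sketch, not the authors' code: `stratified_kfold` and `roc_auc` are hypothetical helper names, and the toy labels are invented. Stratification keeps the live-birth rate roughly equal in every fold, and the AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class ratios preserved per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def roc_auc(y_true, scores):
    """AUC via pairwise comparison; ties between a positive and a negative count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels: 30 positives and 20 negatives (hypothetical, not study data).
labels = [1] * 30 + [0] * 20
splits = list(stratified_kfold(labels, k=5))
```

In a full pipeline, a model would be fitted on each `train` split and scored on the matching `test` split, and the per-fold metrics would then be summarized as mean ± standard deviation, which is the form in which the results are reported.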

Main outcome measure: Live birth.

Results: The CNN model achieved an accuracy of 0.9394 ± 0.0013, AUC of 0.8899 ± 0.0032, precision of 0.9348 ± 0.0018, recall of 0.9993 ± 0.0012, and F1 score of 0.9660 ± 0.0007. Its accuracy was comparable to that of Random Forest (0.9406 ± 0.0017), although Random Forest achieved a higher AUC (0.9734 ± 0.0012), and it was superior to Decision Tree, Naïve Bayes, and Feedforward Neural Network in recall and robustness. The CNN demonstrated stable convergence during training, and SHAP-based interpretation highlighted maternal age, BMI, antral follicle count, and gonadotropin dosage as the top predictors of live birth outcome.
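As a quick consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The sketch below uses hypothetical helper names (`classification_metrics`, `f1_from`) and applies the formula to the reported mean precision and recall; since the reported values are averages over folds, the result should be read as an approximation.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported CNN means: precision 0.9348, recall 0.9993.
cnn_f1 = f1_from(0.9348, 0.9993)
print(f"{cnn_f1:.4f}")
```

The computed value agrees with the reported F1 of 0.9660 to four decimal places.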

Conclusions: With appropriate input transformation, CNNs can effectively model structured EMR data and offer predictive performance comparable to ensemble methods. Their scalability, high sensitivity, and interpretability make CNNs promising candidates for integration into clinical workflows, particularly in environments with limited computational resources.
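The abstract does not specify the input transformation. Purely as an illustration, one possible approach is to treat a patient's feature vector as a short 1-D signal (or zero-pad and reshape it into a 2-D grid) so that convolution kernels can slide over neighbouring features. The helpers `to_grid` and `conv1d_valid`, the kernel, and the feature values below are all assumptions and do not describe the authors' architecture.

```python
def to_grid(features, width, pad=0.0):
    """Zero-pad a flat feature vector and reshape it into rows of `width`."""
    rows = -(-len(features) // width)  # ceiling division
    padded = list(features) + [pad] * (rows * width - len(features))
    return [padded[r * width:(r + 1) * width] for r in range(rows)]


def conv1d_valid(signal, kernel):
    """1-D cross-correlation with no padding, as used in CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


# Hypothetical standardized features for one cycle (not study data).
features = [0.8, -0.3, 1.2, 0.0, -1.1]
grid = to_grid(features, width=3)
feature_map = conv1d_valid(features, kernel=[1.0, 0.0, -1.0])
```

Because a kernel only combines adjacent entries, the ordering of features in the vector or grid affects what the network can learn. Any real transformation would need to choose that ordering deliberately.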


Source journal

Frontiers in Endocrinology (subject area: Medicine; Endocrinology, Diabetes and Metabolism)

CiteScore: 5.70
Self-citation rate: 9.60%
Articles published: 3023
Review time: 14 weeks

About the journal: Frontiers in Endocrinology is a field journal of the "Frontiers in" journal series. In today’s world, endocrinology is becoming increasingly important as it underlies many of the challenges societies face, from obesity and diabetes to reproduction, population control and aging. Endocrinology covers a broad field, from basic molecular and cellular communication through to clinical care and some of the most crucial public health issues. The journal thus welcomes outstanding contributions in any domain of endocrinology. Frontiers in Endocrinology publishes articles on the most outstanding discoveries across a wide research spectrum of endocrinology. The mission of Frontiers in Endocrinology is to bring all relevant endocrinology areas together on a single platform.