Predicting IVF live birth probabilities using machine learning, center-specific and national registry-based models

Elizabeth T. Nguyen, Matthew G. Retzloff, Laura April Gago, John E. Nichols, John F. Payne, Barry A. Ripps, Michael Opsahl, Jeremy Groll, Ronald Beesley, Lorie Nowak, Gregory Neal, Jaye Adams, Trevor Swanson, Xiaocong Chen, Mylene W. M. Yao
medRxiv - Obstetrics and Gynecology. Published 2024-06-21. DOI: 10.1101/2024.06.20.24308970 (https://doi.org/10.1101/2024.06.20.24308970)

Abstract

Objective: To compare the performance of machine learning-based, center-specific (MLCS) models and the US national registry-based, multicenter model (SART model) in predicting IVF live birth probabilities (LBPs) for 6 unrelated, geographically diverse US fertility centers.

Design: Retrospective observational design.

Subjects: Test sets comprised first IVF cycle data (2013-2022) extracted from a retrospective cohort of 4,645 patients at 6 fertility centers.

Intervention or Exposure: The initial (MLCS1) and updated (MLCS2) models were compared against an age control. The MLCS2 and SART models were also compared with each other.

Main Outcome Measures: Model validation metrics, reported as median and interquartile range (IQR), were compared using the Wilcoxon signed-rank test: ROC AUC, posterior log-likelihood of odds ratio compared to age (PLORA), Precision-Recall (PR) AUC, F1 score and continuous net reclassification improvement (NRI).

Results: MLCS1 and MLCS2 models showed improved AUC and PLORA compared to the age control; MLCS1 models were validated using out-of-time test data. MLCS2 models showed an improved median PLORA of 23.9 (IQR 10.2, 39.4) compared to 7.2 (IQR 3.6, 11.8) for MLCS1, p<0.05. MLCS2 showed a higher median PR AUC of 0.75 (IQR 0.73, 0.77) compared to 0.69 (IQR 0.68, 0.71) for SART, p<0.05. In addition, the median F1 score was higher for MLCS2 than for the SART model across predicted LBP thresholds sampled at deciles (≥40%, ≥50%, ≥60%, ≥70%). For example, at the 50% LBP threshold, MLCS2 had a median F1 score of 0.74 (IQR 0.72, 0.78) compared to 0.71 (IQR 0.68, 0.73) for SART. At these six centers, using the LBP threshold of ≥50%, MLCS2 models can identify ~84% of patients who would go on to have IVF live births, while the SART model can identify only ~75%. That means that for every 100 patients who will have a first IVF cycle live birth, using LBP ≥50% as the threshold, the MLCS2 model can identify 9 more such patients than the SART model without overcalling or overestimating LBPs.

Conclusion: MLCS models accurately assign higher IVF LBPs to more patients compared to the SART model at 6 US fertility centers. We recommend testing a larger sample of fertility centers to evaluate the generalizability of MLCS model benefits.
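The threshold-based comparison in the Results can be illustrated with a minimal sketch. It is not the authors' code: the function names `threshold_metrics` and `extra_identified_per_100` and the eight-patient toy data are hypothetical and chosen only for illustration. The sketch flags every patient whose predicted LBP is at or above a threshold, computes precision, recall and F1 from the observed outcomes, and then converts a difference in recall into "additional patients identified per 100 live births", using the approximate 84% and 75% figures reported above.

```python
from typing import Dict, Sequence


def threshold_metrics(
    y_true: Sequence[int], lbp: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Precision, recall and F1 when patients with predicted LBP >= threshold are flagged.

    y_true: 1 if the first IVF cycle ended in a live birth, else 0.
    lbp:    predicted live birth probability for the same patients.
    """
    tp = sum(1 for y, p in zip(y_true, lbp) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, lbp) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, lbp) if p < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def extra_identified_per_100(recall_a: float, recall_b: float) -> int:
    """Additional true live-birth patients flagged by model A per 100 such patients."""
    return round(100 * (recall_a - recall_b))


# Toy data (made up for illustration, not taken from the study).
outcomes = [1, 1, 1, 1, 0, 0, 1, 0]
predicted = [0.80, 0.60, 0.55, 0.40, 0.70, 0.30, 0.52, 0.45]
toy = threshold_metrics(outcomes, predicted, threshold=0.5)

# Recall values as reported in the abstract at the LBP >= 50% threshold.
extra = extra_identified_per_100(0.84, 0.75)
```

Recall is the relevant quantity for the "identify ~84% of patients" statement because it is computed only over patients who actually had a live birth; F1 additionally penalizes false positives, which is why the abstract pairs the recall gain with the claim that LBPs are not overcalled.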