Combining multiple imputation with internal model validation in clinical prediction modeling: a systematic methodological review

IF 5.2 · Zone 2 (Medicine) · Q1 Health Care Sciences & Services
Sinclair Awounvo, Meinhard Kieser, Manuel Feißt
Journal of Clinical Epidemiology, volume 186, Article 111916. Published 2025-08-05. DOI: 10.1016/j.jclinepi.2025.111916
https://www.sciencedirect.com/science/article/pii/S0895435625002495
Citations: 0

Abstract

Objectives

We aim to investigate how multiple imputation (MI) is combined with internal model validation (IMV) in clinical prediction modeling (CPM) studies, with particular emphasis on the challenges of balancing predictive performance, methodological complexity, and resource demands.

Study Design and Setting

We searched PubMed, Web of Science, and MathSciNet for “multiple imputation” and “validation” and reviewed all CPM articles published until December 2023. Studies were categorized based on whether MI was performed before IMV (MI-prior-IMV) or during IMV (MI-during-IMV). Moreover, strategy choice was described in terms of key study parameters.
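
The difference between the two strategies comes down to the order of operations. The sketch below is a toy illustration rather than the procedure of any reviewed study. It assumes a single predictor with missing values, a random hot-deck draw standing in for a real MI method such as MICE, a 70/30 sample split as the IMV method, and a simple average of test errors across imputations. All function names are hypothetical.

```python
import random
from statistics import mean

def observed(rows):
    """Donor pool: the non-missing predictor values."""
    return [x for x, _ in rows if x is not None]

def impute(rows, donors, rng):
    """Toy stochastic imputation: replace a missing x by a random donor value."""
    return [(x if x is not None else rng.choice(donors), y) for x, y in rows]

def fit(rows):
    """Least-squares line y = a + b * x."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in rows) / sxx
    return my - b * mx, b

def mse(model, rows):
    """Mean squared prediction error of the fitted line on rows."""
    a, b = model
    return mean((y - (a + b * x)) ** 2 for x, y in rows)

def split(rows, rng, frac=0.7):
    """Sample split: shuffle a copy and cut it into train and test parts."""
    rows = rows[:]
    rng.shuffle(rows)
    k = int(frac * len(rows))
    return rows[:k], rows[k:]

def mi_prior_imv(rows, m, rng):
    """Impute the full dataset m times, then validate each completed dataset."""
    donors = observed(rows)  # imputation sees every row, including later test rows
    scores = []
    for _ in range(m):
        completed = impute(rows, donors, rng)
        train, test = split(completed, rng)
        scores.append(mse(fit(train), test))
    return mean(scores)

def mi_during_imv(rows, m, rng):
    """Split first, then impute m times using training rows only as donors."""
    train_raw, test_raw = split(rows, rng)
    donors = observed(train_raw)  # imputation never sees test-row values
    scores = []
    for _ in range(m):
        train = impute(train_raw, donors, rng)
        test = impute(test_raw, donors, rng)
        scores.append(mse(fit(train), test))
    return mean(scores)

# Simulated data: y depends on x, and about 30% of x values are missing.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.gauss(0, 1)
    y = 2 * x + rng.gauss(0, 0.5)
    data.append((None if rng.random() < 0.3 else x, y))

prior_score = mi_prior_imv(data, m=10, rng=rng)
during_score = mi_during_imv(data, m=10, rng=rng)
print(f"MI-prior-IMV test MSE:  {prior_score:.3f}")
print(f"MI-during-IMV test MSE: {during_score:.3f}")
```

In `mi_prior_imv` the donor pool is built from all rows, so information from rows that later land in the test set can reach the training data. In `mi_during_imv` the pool is restricted to the training part, which costs an extra layer of bookkeeping per validation split. With bootstrap or cross-validation in place of the single split, that imputation step would have to be repeated inside every resample or fold.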

Results

Of 683 publications screened, 108 were included in the final analysis. MI-prior-IMV was applied in 85% of them. MI-during-IMV studies had larger sample sizes (2005 vs 1212) and higher missing rates (30% vs 28.5%) than MI-prior-IMV studies. The MI methods used were multiple imputation by chained equations (MICE) in 77 (92%) and Markov chain Monte Carlo (MCMC) in 7 (8%) of 84 studies. MI-during-IMV studies exclusively used MICE for MI. MI-during-IMV studies performed fewer imputations than MI-prior-IMV studies (10 vs 15). Moreover, MI-during-IMV studies mostly opted for sample-split (SS; 50%), followed by cross-validation (CV; 31%) and bootstrap (BS; 19%), as IMV methods. In contrast, MI-prior-IMV studies mostly applied BS (63%), followed by CV (24%) and SS (13%).

Conclusion

MI-prior-IMV is predominantly applied over MI-during-IMV, probably because it is simpler to understand and implement. MI-during-IMV studies involve larger sample sizes and higher missing rates, which may explain their more conservative choices regarding runtime and complexity (fewer imputations, simpler IMV methods). Future studies should systematically evaluate the complexity-benefit trade-offs of the different strategies and offer clearer guidance on optimal strategies for various settings.

Plain Language Summary

Missing data are a common challenge in clinical research and can affect the development and validation of prediction models used for patient care. MI is a statistical technique widely used to address missing data by creating multiple complete datasets. IMV methods, such as SS, CV, and bootstrapping, are also essential to ensure the reliability of prediction models when external data are not available. However, there is limited guidance on how to combine these two techniques effectively. This review systematically examined published studies to understand how researchers combine MI and IMV in clinical prediction modeling. We analyzed 108 studies and found that most researchers performed MI before IMV (85% of studies). This approach, referred to as MI-prior-IMV, is simpler and easier to implement. In contrast, a smaller number of studies performed MI-during-IMV, which may provide more accurate results but is more complex and resource-intensive. The review also identified that studies using the MI-during-IMV approach tended to have larger sample sizes and higher levels of missing data. These studies often chose simpler validation methods to manage the increased complexity and runtime. Overall, the findings highlight a trade-off between simplicity and potential accuracy when combining MI with IMV. Future research is needed to evaluate the benefits and limitations of these approaches in different contexts. This review provides a foundation for developing clearer guidelines to help researchers choose the most appropriate strategy for combining MI and IMV in clinical prediction modeling.

Source journal
Journal of Clinical Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 12.00
Self-citation rate: 6.90%
Articles published: 320
Review time: 44 days
About the journal: The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.