Maerziya Yusufujiang, Johanna A A Damen, Demy L Idema, Ewoud Schuit, Karel G M Moons, Valentijn M T de Jong
{"title":"使用总体或个人参与者数据的预测模型的网络元分析-范围审查和报告和行为的建议。","authors":"Maerziya Yusufujiang, Johanna A A Damen, Demy L Idema, Ewoud Schuit, Karel G M Moons, Valentijn M T de Jong","doi":"10.1016/j.jclinepi.2025.112006","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Prediction models are essential in clinical decision-making for estimating the probability of current (diagnosis, screening) or future (prognosis) outcomes. Network meta-analysis (NMA) serves as a powerful tool to compare the performance of multiple prediction models simultaneously. However, there is hardly any guidance on methods and reporting for studies employing NMA to evaluate prediction models.</p><p><strong>Objective: </strong>To provide an overview of NMAs assessing prediction model (external validation) performance, regardless of whether they use aggregate data (AD) or individual participant data (IPD). Additionally, we offer recommendations for improving the reporting and conduct of NMAs in prediction model research.</p><p><strong>Methods: </strong>We searched PubMed and Embase up to 1<sup>st</sup> September 2025 to identify studies that addressed the evaluation of diagnostic or prognostic prediction model performance using NMA. We included articles that employed NMA to compare and assess at least three prediction models. We summarized the identified studies based on, e.g., their application (diagnostic vs. prognostic), data use (AD vs. IPD), medical contexts in which the models were assessed, and evaluation metrics applied (e.g., discrimination, calibration, and (re)classification). Additionally, we examined the statistical approaches employed, the NMA assumptions (such as consistency, transitivity, and exchangeability), and the ranking methods used for model comparison.</p><p><strong>Results: </strong>After screening 2,436 articles, 28 were included. Twenty-six studies (92.9%) used AD, while two (7.1%) used IPD. Hospital care was the most common setting (n = 22; 78.6%), with respirology (n = 7; 25.0%) and cardiology (n = 5; 17.9%) as the most frequently studied clinical domains. Key NMA assumptions were addressed differently across the 28 NMAs: 14.3% (n = 4) discussed transitivity, similarity, or exchangeability, and 53.6% (n = 15) tested for consistency. The statistical approach also varied, with 60.7% of studies (n=17) reporting Bayesian methods and 17.9% (n=5) reporting frequentist approaches. SUCRA (Surface Under the Cumulative Ranking) was the predominant ranking method (n = 18). Most NMAs included 5-10 models in the network, with five NMAs analyzing more than 20 models. Performance metrics varied, with 39.3% of studies (n = 11) reporting discrimination measures, such as C statistics, while none reported calibration metrics. Sensitivity or specificity was provided in 64.3% of studies (n = 18), and no articles reported advanced decision-analytic metrics like Decision Curve Analysis.</p><p><strong>Conclusions: </strong>This scoping review highlights the limited and diverse use of NMA methods in evaluating prediction models, with a predominant reliance on aggregate rather than individual participant data, and inconsistent consideration of key NMA assumptions and model performance metrics. We provide recommendations for the reporting and conduct of an NMA of prediction model validation performance.</p><p><strong>Plain language summary: </strong>Prediction models are tools used in medicine to estimate a patient's risk of developing a disease or experiencing a health outcome. 
Many different prediction models exist for the same condition, and it can be difficult for doctors and researchers to know which model performs best. One way to compare multiple models is through a statistical method called network meta-analysis (NMA), which is commonly used to compare treatments but has rarely been applied to prediction models. In our study, we reviewed all published NMAs that evaluated prediction models to see how they were conducted and reported. We looked at whether the studies reported important performance measures, such as how well the models could distinguish between patients with and without the outcome (discrimination), and how well the predictions matched actual outcomes (calibration). We also checked if key NMA assumptions were considered and how analyses were conducted. We found that most studies used summary data instead of patient-level data. Many did not report crucial performance measures and rarely checked NMA assumptions. There was also limited transparency in how the models were analyzed, making it difficult for others to reproduce the results. Our findings show that while NMA has great potential to help compare prediction models and identify the most reliable ones, current practice often lacks the detailed reporting needed to make these comparisons fully trustworthy. We recommend better reporting, sharing of data and analysis code, and careful checking of assumptions to help researchers and doctors choose the most reliable models, ultimately improving patient care.</p>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":" ","pages":"112006"},"PeriodicalIF":5.2000,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Network meta-analysis of prediction models using aggregate or individual participant data - A scoping review and recommendations for reporting and conduct.\",\"authors\":\"Maerziya Yusufujiang, Johanna A A Damen, Demy L Idema, Ewoud Schuit, Karel G M Moons, Valentijn M T de Jong\",\"doi\":\"10.1016/j.jclinepi.2025.112006\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Prediction models are essential in clinical decision-making for estimating the probability of current (diagnosis, screening) or future (prognosis) outcomes. Network meta-analysis (NMA) serves as a powerful tool to compare the performance of multiple prediction models simultaneously. However, there is hardly any guidance on methods and reporting for studies employing NMA to evaluate prediction models.</p><p><strong>Objective: </strong>To provide an overview of NMAs assessing prediction model (external validation) performance, regardless of whether they use aggregate data (AD) or individual participant data (IPD). Additionally, we offer recommendations for improving the reporting and conduct of NMAs in prediction model research.</p><p><strong>Methods: </strong>We searched PubMed and Embase up to 1<sup>st</sup> September 2025 to identify studies that addressed the evaluation of diagnostic or prognostic prediction model performance using NMA. We included articles that employed NMA to compare and assess at least three prediction models. We summarized the identified studies based on, e.g., their application (diagnostic vs. prognostic), data use (AD vs. IPD), medical contexts in which the models were assessed, and evaluation metrics applied (e.g., discrimination, calibration, and (re)classification). 
Additionally, we examined the statistical approaches employed, the NMA assumptions (such as consistency, transitivity, and exchangeability), and the ranking methods used for model comparison.</p><p><strong>Results: </strong>After screening 2,436 articles, 28 were included. Twenty-six studies (92.9%) used AD, while two (7.1%) used IPD. Hospital care was the most common setting (n = 22; 78.6%), with respirology (n = 7; 25.0%) and cardiology (n = 5; 17.9%) as the most frequently studied clinical domains. Key NMA assumptions were addressed differently across the 28 NMAs: 14.3% (n = 4) discussed transitivity, similarity, or exchangeability, and 53.6% (n = 15) tested for consistency. The statistical approach also varied, with 60.7% of studies (n=17) reporting Bayesian methods and 17.9% (n=5) reporting frequentist approaches. SUCRA (Surface Under the Cumulative Ranking) was the predominant ranking method (n = 18). Most NMAs included 5-10 models in the network, with five NMAs analyzing more than 20 models. Performance metrics varied, with 39.3% of studies (n = 11) reporting discrimination measures, such as C statistics, while none reported calibration metrics. Sensitivity or specificity was provided in 64.3% of studies (n = 18), and no articles reported advanced decision-analytic metrics like Decision Curve Analysis.</p><p><strong>Conclusions: </strong>This scoping review highlights the limited and diverse use of NMA methods in evaluating prediction models, with a predominant reliance on aggregate rather than individual participant data, and inconsistent consideration of key NMA assumptions and model performance metrics. We provide recommendations for the reporting and conduct of an NMA of prediction model validation performance.</p><p><strong>Plain language summary: </strong>Prediction models are tools used in medicine to estimate a patient's risk of developing a disease or experiencing a health outcome. Many different prediction models exist for the same condition, and it can be difficult for doctors and researchers to know which model performs best. One way to compare multiple models is through a statistical method called network meta-analysis (NMA), which is commonly used to compare treatments but has rarely been applied to prediction models. In our study, we reviewed all published NMAs that evaluated prediction models to see how they were conducted and reported. We looked at whether the studies reported important performance measures, such as how well the models could distinguish between patients with and without the outcome (discrimination), and how well the predictions matched actual outcomes (calibration). We also checked if key NMA assumptions were considered and how analyses were conducted. We found that most studies used summary data instead of patient-level data. Many did not report crucial performance measures and rarely checked NMA assumptions. There was also limited transparency in how the models were analyzed, making it difficult for others to reproduce the results. Our findings show that while NMA has great potential to help compare prediction models and identify the most reliable ones, current practice often lacks the detailed reporting needed to make these comparisons fully trustworthy. 
We recommend better reporting, sharing of data and analysis code, and careful checking of assumptions to help researchers and doctors choose the most reliable models, ultimately improving patient care.</p>\",\"PeriodicalId\":51079,\"journal\":{\"name\":\"Journal of Clinical Epidemiology\",\"volume\":\" \",\"pages\":\"112006\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2025-10-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Clinical Epidemiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.jclinepi.2025.112006\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Clinical Epidemiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.jclinepi.2025.112006","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Network meta-analysis of prediction models using aggregate or individual participant data - A scoping review and recommendations for reporting and conduct.
Background: Prediction models are essential in clinical decision-making for estimating the probability of current (diagnosis, screening) or future (prognosis) outcomes. Network meta-analysis (NMA) serves as a powerful tool to compare the performance of multiple prediction models simultaneously. However, there is hardly any guidance on methods and reporting for studies employing NMA to evaluate prediction models.
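For orientation, the core identity underlying NMA can be written in two lines. This sketch is not taken from the article; the performance contrast d is a generic, hypothetical effect measure (for example, a difference in logit-transformed C statistics between two models). In a connected network with a shared comparator model A, the indirect comparison of models B and C is

    d_{BC}^{indirect} = d_{AC}^{direct} - d_{AB}^{direct}

and the consistency assumption examined later in this review requires that direct and indirect evidence agree:

    d_{BC}^{direct} ≈ d_{BC}^{indirect}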
Objective: To provide an overview of NMAs assessing prediction model (external validation) performance, regardless of whether they use aggregate data (AD) or individual participant data (IPD). Additionally, we offer recommendations for improving the reporting and conduct of NMAs in prediction model research.
Methods: We searched PubMed and Embase up to 1st September 2025 to identify studies that addressed the evaluation of diagnostic or prognostic prediction model performance using NMA. We included articles that employed NMA to compare and assess at least three prediction models. We summarized the identified studies based on, e.g., their application (diagnostic vs. prognostic), data use (AD vs. IPD), medical contexts in which the models were assessed, and evaluation metrics applied (e.g., discrimination, calibration, and (re)classification). Additionally, we examined the statistical approaches employed, the NMA assumptions (such as consistency, transitivity, and exchangeability), and the ranking methods used for model comparison.
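As a concrete illustration of the evaluation metrics listed above, discrimination and calibration of a single model on individual participant data could be computed along the following lines in Python. This is not code from the article; the simulated data and variable names are hypothetical, and the calibration intercept and slope shown are those of a logistic recalibration model.

import numpy as np
import statsmodels.api as sm
from scipy.special import logit
from sklearn.metrics import roc_auc_score

# Simulated individual participant data: predicted risks and observed binary outcomes.
rng = np.random.default_rng(0)
predicted_risk = rng.uniform(0.05, 0.95, size=500)
y = rng.binomial(1, predicted_risk)

# Discrimination: C statistic (area under the ROC curve).
c_statistic = roc_auc_score(y, predicted_risk)

# Calibration: intercept and slope from a logistic regression of the observed
# outcome on the linear predictor (logit of the predicted risk).
linear_predictor = logit(predicted_risk)
fit = sm.Logit(y, sm.add_constant(linear_predictor)).fit(disp=0)
calibration_intercept, calibration_slope = fit.params

print(f"C statistic: {c_statistic:.2f}, "
      f"calibration intercept: {calibration_intercept:.2f}, "
      f"calibration slope: {calibration_slope:.2f}")

In an aggregate-data NMA, such statistics (or transformations of them) and their standard errors, rather than the underlying participant-level data, would serve as the inputs to the network model.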
Results: After screening 2,436 articles, 28 were included. Twenty-six studies (92.9%) used AD, while two (7.1%) used IPD. Hospital care was the most common setting (n = 22; 78.6%), with respirology (n = 7; 25.0%) and cardiology (n = 5; 17.9%) as the most frequently studied clinical domains. Key NMA assumptions were addressed differently across the 28 NMAs: 14.3% (n = 4) discussed transitivity, similarity, or exchangeability, and 53.6% (n = 15) tested for consistency. The statistical approach also varied, with 60.7% of studies (n = 17) reporting Bayesian methods and 17.9% (n = 5) reporting frequentist approaches. SUCRA (surface under the cumulative ranking curve) was the predominant ranking method (n = 18). Most NMAs included 5 to 10 models in the network, with five NMAs analyzing more than 20 models. Performance metrics varied, with 39.3% of studies (n = 11) reporting discrimination measures, such as C statistics, while none reported calibration metrics. Sensitivity or specificity was provided in 64.3% of studies (n = 18), and no articles reported advanced decision-analytic metrics such as decision curve analysis.
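Because SUCRA is reported as the dominant ranking method, a minimal sketch of how it is computed from the rank probabilities of a (typically Bayesian) NMA may help. The three-model rank-probability matrix below is made up for demonstration and does not come from the article.

import numpy as np

# rank_probs[k, j] = probability that model k occupies rank j+1 (rank 1 = best performance).
rank_probs = np.array([
    [0.60, 0.30, 0.10],   # model A
    [0.30, 0.50, 0.20],   # model B
    [0.10, 0.20, 0.70],   # model C
])

n_models = rank_probs.shape[0]
cumulative = np.cumsum(rank_probs, axis=1)                # P(rank <= j) for each model
sucra = cumulative[:, :-1].sum(axis=1) / (n_models - 1)   # average over ranks 1..a-1

for name, value in zip("ABC", sucra):
    print(f"model {name}: SUCRA = {value:.2f}")

A SUCRA of 1 would mean a model is certain to rank best and 0 certain to rank worst; like any ranking summary, it should be read alongside the underlying performance estimates and their uncertainty.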
Conclusions: This scoping review highlights the limited and diverse use of NMA methods in evaluating prediction models, with a predominant reliance on aggregate rather than individual participant data, and inconsistent consideration of key NMA assumptions and model performance metrics. We provide recommendations for the reporting and conduct of an NMA of prediction model validation performance.
Plain language summary: Prediction models are tools used in medicine to estimate a patient's risk of developing a disease or experiencing a health outcome. Many different prediction models exist for the same condition, and it can be difficult for doctors and researchers to know which model performs best. One way to compare multiple models is through a statistical method called network meta-analysis (NMA), which is commonly used to compare treatments but has rarely been applied to prediction models. In our study, we reviewed all published NMAs that evaluated prediction models to see how they were conducted and reported. We looked at whether the studies reported important performance measures, such as how well the models could distinguish between patients with and without the outcome (discrimination), and how well the predictions matched actual outcomes (calibration). We also checked if key NMA assumptions were considered and how analyses were conducted. We found that most studies used summary data instead of patient-level data. Many did not report crucial performance measures and rarely checked NMA assumptions. There was also limited transparency in how the models were analyzed, making it difficult for others to reproduce the results. Our findings show that while NMA has great potential to help compare prediction models and identify the most reliable ones, current practice often lacks the detailed reporting needed to make these comparisons fully trustworthy. We recommend better reporting, sharing of data and analysis code, and careful checking of assumptions to help researchers and doctors choose the most reliable models, ultimately improving patient care.
Journal introduction:
The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.