Maerziya Yusufujiang, Johanna A A Damen, Demy L Idema, Ewoud Schuit, Karel G M Moons, Valentijn M T de Jong
{"title":"使用总体或个人参与者数据的预测模型的网络元分析-范围审查和报告和行为的建议。","authors":"Maerziya Yusufujiang, Johanna A A Damen, Demy L Idema, Ewoud Schuit, Karel G M Moons, Valentijn M T de Jong","doi":"10.1016/j.jclinepi.2025.112006","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Prediction models are essential in clinical decision-making for estimating the probability of current (diagnosis, screening) or future (prognosis) outcomes. Network meta-analysis (NMA) serves as a powerful tool to compare the performance of multiple prediction models simultaneously. However, there is hardly any guidance on methods and reporting for studies employing NMA to evaluate prediction models.</p><p><strong>Objective: </strong>To provide an overview of NMAs assessing prediction model (external validation) performance, regardless of whether they use aggregate data (AD) or individual participant data (IPD). Additionally, we offer recommendations for improving the reporting and conduct of NMAs in prediction model research.</p><p><strong>Methods: </strong>We searched PubMed and Embase up to 1<sup>st</sup> September 2025 to identify studies that addressed the evaluation of diagnostic or prognostic prediction model performance using NMA. We included articles that employed NMA to compare and assess at least three prediction models. We summarized the identified studies based on, e.g., their application (diagnostic vs. prognostic), data use (AD vs. IPD), medical contexts in which the models were assessed, and evaluation metrics applied (e.g., discrimination, calibration, and (re)classification). Additionally, we examined the statistical approaches employed, the NMA assumptions (such as consistency, transitivity, and exchangeability), and the ranking methods used for model comparison.</p><p><strong>Results: </strong>After screening 2,436 articles, 28 were included. Twenty-six studies (92.9%) used AD, while two (7.1%) used IPD. Hospital care was the most common setting (n = 22; 78.6%), with respirology (n = 7; 25.0%) and cardiology (n = 5; 17.9%) as the most frequently studied clinical domains. Key NMA assumptions were addressed differently across the 28 NMAs: 14.3% (n = 4) discussed transitivity, similarity, or exchangeability, and 53.6% (n = 15) tested for consistency. The statistical approach also varied, with 60.7% of studies (n=17) reporting Bayesian methods and 17.9% (n=5) reporting frequentist approaches. SUCRA (Surface Under the Cumulative Ranking) was the predominant ranking method (n = 18). Most NMAs included 5-10 models in the network, with five NMAs analyzing more than 20 models. Performance metrics varied, with 39.3% of studies (n = 11) reporting discrimination measures, such as C statistics, while none reported calibration metrics. Sensitivity or specificity was provided in 64.3% of studies (n = 18), and no articles reported advanced decision-analytic metrics like Decision Curve Analysis.</p><p><strong>Conclusions: </strong>This scoping review highlights the limited and diverse use of NMA methods in evaluating prediction models, with a predominant reliance on aggregate rather than individual participant data, and inconsistent consideration of key NMA assumptions and model performance metrics. We provide recommendations for the reporting and conduct of an NMA of prediction model validation performance.</p><p><strong>Plain language summary: </strong>Prediction models are tools used in medicine to estimate a patient's risk of developing a disease or experiencing a health outcome. 
Many different prediction models exist for the same condition, and it can be difficult for doctors and researchers to know which model performs best. One way to compare multiple models is through a statistical method called network meta-analysis (NMA), which is commonly used to compare treatments but has rarely been applied to prediction models. In our study, we reviewed all published NMAs that evaluated prediction models to see how they were conducted and reported. We looked at whether the studies reported important performance measures, such as how well the models could distinguish between patients with and without the outcome (discrimination), and how well the predictions matched actual outcomes (calibration). We also checked if key NMA assumptions were considered and how analyses were conducted. We found that most studies used summary data instead of patient-level data. Many did not report crucial performance measures and rarely checked NMA assumptions. There was also limited transparency in how the models were analyzed, making it difficult for others to reproduce the results. Our findings show that while NMA has great potential to help compare prediction models and identify the most reliable ones, current practice often lacks the detailed reporting needed to make these comparisons fully trustworthy. We recommend better reporting, sharing of data and analysis code, and careful checking of assumptions to help researchers and doctors choose the most reliable models, ultimately improving patient care.</p>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":" ","pages":"112006"},"PeriodicalIF":5.2000,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Network meta-analysis of prediction models using aggregate or individual participant data - A scoping review and recommendations for reporting and conduct.\",\"authors\":\"Maerziya Yusufujiang, Johanna A A Damen, Demy L Idema, Ewoud Schuit, Karel G M Moons, Valentijn M T de Jong\",\"doi\":\"10.1016/j.jclinepi.2025.112006\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Prediction models are essential in clinical decision-making for estimating the probability of current (diagnosis, screening) or future (prognosis) outcomes. Network meta-analysis (NMA) serves as a powerful tool to compare the performance of multiple prediction models simultaneously. However, there is hardly any guidance on methods and reporting for studies employing NMA to evaluate prediction models.</p><p><strong>Objective: </strong>To provide an overview of NMAs assessing prediction model (external validation) performance, regardless of whether they use aggregate data (AD) or individual participant data (IPD). Additionally, we offer recommendations for improving the reporting and conduct of NMAs in prediction model research.</p><p><strong>Methods: </strong>We searched PubMed and Embase up to 1<sup>st</sup> September 2025 to identify studies that addressed the evaluation of diagnostic or prognostic prediction model performance using NMA. We included articles that employed NMA to compare and assess at least three prediction models. We summarized the identified studies based on, e.g., their application (diagnostic vs. prognostic), data use (AD vs. IPD), medical contexts in which the models were assessed, and evaluation metrics applied (e.g., discrimination, calibration, and (re)classification). 
Additionally, we examined the statistical approaches employed, the NMA assumptions (such as consistency, transitivity, and exchangeability), and the ranking methods used for model comparison.</p><p><strong>Results: </strong>After screening 2,436 articles, 28 were included. Twenty-six studies (92.9%) used AD, while two (7.1%) used IPD. Hospital care was the most common setting (n = 22; 78.6%), with respirology (n = 7; 25.0%) and cardiology (n = 5; 17.9%) as the most frequently studied clinical domains. Key NMA assumptions were addressed differently across the 28 NMAs: 14.3% (n = 4) discussed transitivity, similarity, or exchangeability, and 53.6% (n = 15) tested for consistency. The statistical approach also varied, with 60.7% of studies (n=17) reporting Bayesian methods and 17.9% (n=5) reporting frequentist approaches. SUCRA (Surface Under the Cumulative Ranking) was the predominant ranking method (n = 18). Most NMAs included 5-10 models in the network, with five NMAs analyzing more than 20 models. Performance metrics varied, with 39.3% of studies (n = 11) reporting discrimination measures, such as C statistics, while none reported calibration metrics. Sensitivity or specificity was provided in 64.3% of studies (n = 18), and no articles reported advanced decision-analytic metrics like Decision Curve Analysis.</p><p><strong>Conclusions: </strong>This scoping review highlights the limited and diverse use of NMA methods in evaluating prediction models, with a predominant reliance on aggregate rather than individual participant data, and inconsistent consideration of key NMA assumptions and model performance metrics. We provide recommendations for the reporting and conduct of an NMA of prediction model validation performance.</p><p><strong>Plain language summary: </strong>Prediction models are tools used in medicine to estimate a patient's risk of developing a disease or experiencing a health outcome. Many different prediction models exist for the same condition, and it can be difficult for doctors and researchers to know which model performs best. One way to compare multiple models is through a statistical method called network meta-analysis (NMA), which is commonly used to compare treatments but has rarely been applied to prediction models. In our study, we reviewed all published NMAs that evaluated prediction models to see how they were conducted and reported. We looked at whether the studies reported important performance measures, such as how well the models could distinguish between patients with and without the outcome (discrimination), and how well the predictions matched actual outcomes (calibration). We also checked if key NMA assumptions were considered and how analyses were conducted. We found that most studies used summary data instead of patient-level data. Many did not report crucial performance measures and rarely checked NMA assumptions. There was also limited transparency in how the models were analyzed, making it difficult for others to reproduce the results. Our findings show that while NMA has great potential to help compare prediction models and identify the most reliable ones, current practice often lacks the detailed reporting needed to make these comparisons fully trustworthy. 
We recommend better reporting, sharing of data and analysis code, and careful checking of assumptions to help researchers and doctors choose the most reliable models, ultimately improving patient care.</p>\",\"PeriodicalId\":51079,\"journal\":{\"name\":\"Journal of Clinical Epidemiology\",\"volume\":\" \",\"pages\":\"112006\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2025-10-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Clinical Epidemiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.jclinepi.2025.112006\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Clinical Epidemiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.jclinepi.2025.112006","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Network meta-analysis of prediction models using aggregate or individual participant data - A scoping review and recommendations for reporting and conduct.
Background: Prediction models are essential in clinical decision-making for estimating the probability of current (diagnosis, screening) or future (prognosis) outcomes. Network meta-analysis (NMA) serves as a powerful tool to compare the performance of multiple prediction models simultaneously. However, there is hardly any guidance on methods and reporting for studies employing NMA to evaluate prediction models.
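For orientation, the core identity underlying NMA can be written in two lines. This sketch is not taken from the article; the performance contrast d is a generic, hypothetical effect measure (for example, a difference in logit-transformed C statistics between two models). In a connected network with a shared comparator model A, the indirect comparison of models B and C is

    d_{BC}^{indirect} = d_{AC}^{direct} - d_{AB}^{direct}

and the consistency assumption examined later in this review requires that direct and indirect evidence agree:

    d_{BC}^{direct} ≈ d_{BC}^{indirect}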
Objective: To provide an overview of NMAs assessing prediction model (external validation) performance, regardless of whether they use aggregate data (AD) or individual participant data (IPD). Additionally, we offer recommendations for improving the reporting and conduct of NMAs in prediction model research.
Methods: We searched PubMed and Embase up to 1st September 2025 to identify studies that addressed the evaluation of diagnostic or prognostic prediction model performance using NMA. We included articles that employed NMA to compare and assess at least three prediction models. We summarized the identified studies based on, e.g., their application (diagnostic vs. prognostic), data use (AD vs. IPD), medical contexts in which the models were assessed, and evaluation metrics applied (e.g., discrimination, calibration, and (re)classification). Additionally, we examined the statistical approaches employed, the NMA assumptions (such as consistency, transitivity, and exchangeability), and the ranking methods used for model comparison.
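As a concrete illustration of the evaluation metrics listed above, discrimination and calibration of a single model on individual participant data could be computed along the following lines in Python. This is not code from the article; the simulated data and variable names are hypothetical, and the calibration intercept and slope shown are those of a logistic recalibration model.

import numpy as np
import statsmodels.api as sm
from scipy.special import logit
from sklearn.metrics import roc_auc_score

# Simulated individual participant data: predicted risks and observed binary outcomes.
rng = np.random.default_rng(0)
predicted_risk = rng.uniform(0.05, 0.95, size=500)
y = rng.binomial(1, predicted_risk)

# Discrimination: C statistic (area under the ROC curve).
c_statistic = roc_auc_score(y, predicted_risk)

# Calibration: intercept and slope from a logistic regression of the observed
# outcome on the linear predictor (logit of the predicted risk).
linear_predictor = logit(predicted_risk)
fit = sm.Logit(y, sm.add_constant(linear_predictor)).fit(disp=0)
calibration_intercept, calibration_slope = fit.params

print(f"C statistic: {c_statistic:.2f}, "
      f"calibration intercept: {calibration_intercept:.2f}, "
      f"calibration slope: {calibration_slope:.2f}")

In an aggregate-data NMA, such statistics (or transformations of them) and their standard errors, rather than the underlying participant-level data, would serve as the inputs to the network model.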
Results: After screening 2,436 articles, 28 were included. Twenty-six studies (92.9%) used AD, while two (7.1%) used IPD. Hospital care was the most common setting (n = 22; 78.6%), with respirology (n = 7; 25.0%) and cardiology (n = 5; 17.9%) as the most frequently studied clinical domains. Key NMA assumptions were addressed differently across the 28 NMAs: 14.3% (n = 4) discussed transitivity, similarity, or exchangeability, and 53.6% (n = 15) tested for consistency. The statistical approach also varied, with 60.7% of studies (n = 17) reporting Bayesian methods and 17.9% (n = 5) reporting frequentist approaches. SUCRA (surface under the cumulative ranking curve) was the predominant ranking method (n = 18). Most NMAs included 5 to 10 models in the network, with five NMAs analyzing more than 20 models. Performance metrics varied, with 39.3% of studies (n = 11) reporting discrimination measures, such as C statistics, while none reported calibration metrics. Sensitivity or specificity was provided in 64.3% of studies (n = 18), and no articles reported advanced decision-analytic metrics such as decision curve analysis.
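Because SUCRA is reported as the dominant ranking method, a minimal sketch of how it is computed from the rank probabilities of a (typically Bayesian) NMA may help. The three-model rank-probability matrix below is made up for demonstration and does not come from the article.

import numpy as np

# rank_probs[k, j] = probability that model k occupies rank j+1 (rank 1 = best performance).
rank_probs = np.array([
    [0.60, 0.30, 0.10],   # model A
    [0.30, 0.50, 0.20],   # model B
    [0.10, 0.20, 0.70],   # model C
])

n_models = rank_probs.shape[0]
cumulative = np.cumsum(rank_probs, axis=1)                # P(rank <= j) for each model
sucra = cumulative[:, :-1].sum(axis=1) / (n_models - 1)   # average over ranks 1..a-1

for name, value in zip("ABC", sucra):
    print(f"model {name}: SUCRA = {value:.2f}")

A SUCRA of 1 would mean a model is certain to rank best and 0 certain to rank worst; like any ranking summary, it should be read alongside the underlying performance estimates and their uncertainty.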
Conclusions: This scoping review highlights the limited and diverse use of NMA methods in evaluating prediction models, with a predominant reliance on aggregate rather than individual participant data, and inconsistent consideration of key NMA assumptions and model performance metrics. We provide recommendations for the reporting and conduct of an NMA of prediction model validation performance.
Plain language summary: Prediction models are tools used in medicine to estimate a patient's risk of developing a disease or experiencing a health outcome. Many different prediction models exist for the same condition, and it can be difficult for doctors and researchers to know which model performs best. One way to compare multiple models is through a statistical method called network meta-analysis (NMA), which is commonly used to compare treatments but has rarely been applied to prediction models. In our study, we reviewed all published NMAs that evaluated prediction models to see how they were conducted and reported. We looked at whether the studies reported important performance measures, such as how well the models could distinguish between patients with and without the outcome (discrimination), and how well the predictions matched actual outcomes (calibration). We also checked if key NMA assumptions were considered and how analyses were conducted. We found that most studies used summary data instead of patient-level data. Many did not report crucial performance measures and rarely checked NMA assumptions. There was also limited transparency in how the models were analyzed, making it difficult for others to reproduce the results. Our findings show that while NMA has great potential to help compare prediction models and identify the most reliable ones, current practice often lacks the detailed reporting needed to make these comparisons fully trustworthy. We recommend better reporting, sharing of data and analysis code, and careful checking of assumptions to help researchers and doctors choose the most reliable models, ultimately improving patient care.
Journal introduction:
The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.