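Pitfalls of single-study external validation illustrated with a model predicting functional outcome after aneurysmal subarachnoid hemorrhage.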

IF 3.9 · Zone 3 (Medicine) · JCR Q1 (Health Care Sciences & Services)
Jordi de Winkel, Carolien C H M Maas, Bob Roozenbeek, David van Klaveren, Hester F Lingsma
Journal: BMC Medical Research Methodology
Publication date: 2024-08-08
Publication type: Journal Article
DOI: 10.1186/s12874-024-02280-9 (https://doi.org/10.1186/s12874-024-02280-9)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11308226/pdf/
Citations: 0

Abstract

Pitfalls of single-study external validation illustrated with a model predicting functional outcome after aneurysmal subarachnoid hemorrhage.

Background: Prediction models are often externally validated with data from a single study or cohort. However, the interpretation of performance estimates obtained with single-study external validation is not as straightforward as assumed. We aimed to illustrate this by conducting a large number of external validations of a prediction model for functional outcome in subarachnoid hemorrhage (SAH) patients.

Methods: We used data from the Subarachnoid Hemorrhage International Trialists (SAHIT) data repository (n = 11,931, 14 studies) to refit the SAHIT model for predicting a dichotomous functional outcome (favorable versus unfavorable), with the (extended) Glasgow Outcome Scale or modified Rankin Scale score, at a minimum of three months after discharge. We performed leave-one-cluster-out cross-validation to mimic the process of multiple single-study external validations. Each study represented one cluster. In each of these validations, we assessed discrimination with Harrell's c-statistic and calibration with calibration plots, the intercepts, and the slopes. We used random-effects meta-analysis to obtain the (reference) mean performance estimates and between-study heterogeneity (I² statistic). The influence of case-mix variation on discriminative performance was assessed with the model-based c-statistic, and we fitted a "membership model" to obtain a gross estimate of transportability.
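
The leave-one-cluster-out procedure described in the Methods can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes a single hypothetical predictor `x` instead of the SAHIT model's predictors, uses made-up toy data, and fits the logistic model with a hand-written Newton-Raphson routine so that it runs on the standard library alone. For a binary outcome, the c-statistic equals the area under the ROC curve, which is what `c_statistic` computes by counting concordant pairs.

```python
from collections import defaultdict
from math import exp


def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, iters=25):
    """One-predictor logistic regression by Newton-Raphson; returns (intercept, slope)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = expit(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def c_statistic(y, score):
    """Share of (event, non-event) pairs ranked correctly; ties count as 1/2."""
    pos = [s for yi, s in zip(y, score) if yi == 1]
    neg = [s for yi, s in zip(y, score) if yi == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    total = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return total / (len(pos) * len(neg))


def calibration_intercept(lp, y, iters=25):
    """Intercept of a logistic model with the linear predictor as offset (slope fixed at 1)."""
    a = 0.0
    for _ in range(iters):
        p = [expit(a + v) for v in lp]
        a += sum(yi - pi for yi, pi in zip(y, p)) / sum(pi * (1.0 - pi) for pi in p)
    return a


def leave_one_cluster_out(rows):
    """rows: (study, x, y) tuples. Refit without each study, then validate on that study."""
    by_study = defaultdict(list)
    for study, x, y in rows:
        by_study[study].append((x, y))
    results = {}
    for held_out, test in by_study.items():
        train = [r for s, rs in by_study.items() if s != held_out for r in rs]
        a, b = fit_logistic([x for x, _ in train], [y for _, y in train])
        lp = [a + b * x for x, _ in test]  # linear predictor in the held-out study
        y_test = [y for _, y in test]
        results[held_out] = {
            "c": c_statistic(y_test, lp),  # ranking by lp equals ranking by probability
            "intercept": calibration_intercept(lp, y_test),
            "slope": fit_logistic(lp, y_test)[1],  # calibration slope
        }
    return results


# Made-up toy data (three hypothetical studies), NOT data from the SAHIT repository.
TOY = (
    [("A", x, y) for x, y in zip(range(1, 7), [0, 0, 1, 0, 1, 1])]
    + [("B", x, y) for x, y in zip(range(1, 7), [0, 1, 0, 1, 0, 1])]
    + [("C", x, y) for x, y in zip(range(2, 8), [0, 0, 0, 1, 0, 1])]
)

results = leave_one_cluster_out(TOY)
for study, r in results.items():
    print(study, {k: round(v, 2) for k, v in r.items()})
```

Each pass through the loop plays the role of one single-study external validation, so the spread of the per-study estimates gives a direct view of how much a single validation depends on which study happens to be held out.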

Results: Across 14 single-study external validations, model performance was highly variable. The mean c-statistic was 0.74 (95% CI 0.70 to 0.78, range 0.52 to 0.84, I² = 0.92), the mean intercept was -0.06 (95% CI -0.37 to 0.24, range -1.40 to 0.75, I² = 0.97), and the mean slope was 0.96 (95% CI 0.78 to 1.13, range 0.53 to 1.31, I² = 0.90). The decrease in discriminative performance was attributable to case-mix variation, between-study heterogeneity, or a combination of both. Incidentally, we observed poor generalizability or transportability of the model.
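
To make the pooled means and I² values above more concrete, the following sketch shows one common way to compute them. The abstract does not state which random-effects estimator was used; the DerSimonian-Laird estimator here is an assumption, as is pooling on the raw scale (c-statistics are often pooled on the logit scale instead). The function name and the input numbers are hypothetical and are not the study's per-study results.

```python
from math import sqrt


def random_effects_pool(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling (assumed estimator, see lead-in).

    Returns the pooled mean, its 95% confidence interval, the between-study
    variance tau2, and I2 as a proportion between 0 and 1.
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))

    # Between-study variance and I2, both truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    mean = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_mean = sqrt(1.0 / sum(w_re))
    return {
        "mean": mean,
        "ci": (mean - 1.96 * se_mean, mean + 1.96 * se_mean),
        "tau2": tau2,
        "i2": i2,
    }


# Made-up per-study c-statistics and standard errors, for illustration only.
pooled = random_effects_pool([0.60, 0.75, 0.85], [0.02, 0.02, 0.02])
print(pooled)
```

When the per-study estimates differ by much more than their standard errors, as in this toy input, I² approaches 1, which is the pattern the abstract reports for all three performance measures.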

Conclusions: We demonstrate two potential pitfalls in the interpretation of model performance with single-study external validation. With single-study external validation, (1) model performance is highly variable and depends on the choice of validation data, and (2) no insight is provided into the generalizability or transportability of the model that is needed to guide local implementation. As such, a single single-study external validation can easily be misinterpreted and lead to a false appreciation of the clinical prediction model. Cross-validation is better equipped to address these pitfalls.

Source journal: BMC Medical Research Methodology (Medicine: Health Care)
CiteScore: 6.50
Self-citation rate: 2.50%
Articles published: 298
Review time: 3-8 weeks
Journal description: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.