Clustered flexible calibration plots for binary outcomes using random effects modeling.

IF 6.1 · Zone 2 (Biology) · Q1 · Mathematical & Computational Biology

Research Synthesis Methods Pub Date : 2026-05-01 Epub Date: 2025-12-29 DOI:10.1017/rsm.2025.10046

Lasai Barreñada, Bavo De Cock Campo, Laure Wynants, Ben Van Calster

Research Synthesis Methods 17(3): 567-588. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13126218/pdf/


Abstract

Evaluation of clinical prediction models across multiple clusters, whether centers or datasets, is becoming increasingly common. A comprehensive evaluation includes an assessment of the agreement between the estimated risks and the observed outcomes, also known as calibration. Calibration is of utmost importance for clinical decision making with prediction models, and it often varies between clusters. We present three approaches to take clustering into account when evaluating calibration: (1) clustered group calibration (CG-C), (2) two-stage meta-analysis calibration (2MA-C), and (3) mixed model calibration (MIX-C). These approaches produce flexible calibration plots using random effects modeling and provide confidence intervals (CIs) and prediction intervals (PIs). As a case example, we externally validate a model that estimates the risk that an ovarian tumor is malignant, using data from multiple centers (N = 2489). We also conduct a simulation study and a synthetic data study, with data generated from a real clustered dataset, to evaluate the methods. In the simulation study, MIX-C and 2MA-C (splines) gave estimated curves closest to the true overall curve. In the synthetic data study, MIX-C produced cluster-specific curves closest to the truth. Coverage of the PI across the plot was best for 2MA-C with splines. We recommend using 2MA-C with splines to estimate the overall curve and its 95% PI, and MIX-C for cluster-specific curves, especially when the sample size per cluster is limited. We provide ready-to-use code to construct summary flexible calibration curves, with CIs and PIs to assess heterogeneity in calibration across datasets or centers.
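To make the two-stage idea behind 2MA-C concrete, the following is a minimal sketch and not the authors' published code. It rests on several assumptions that the abstract does not state: stage 1 fits a linear logistic recalibration model per cluster (the paper recommends splines), stage 2 pools the cluster-specific logit-scale estimates at each grid point with a DerSimonian-Laird random-effects estimator, and the CI and PI use normal quantiles rather than a t distribution. All function names and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_recalibration(risks, outcomes, n_iter=25):
    """Stage 1: fit logit(P(y = 1)) = a + b * logit(risk) by Newton-Raphson.

    Returns (a, b) and the covariance entries (var_a, cov_ab, var_b).
    """
    x = [logit(r) for r in risks]
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = expit(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu          # score for the intercept
            g1 += (yi - mu) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    # Inverse of the information matrix from the final iteration.
    return (a, b), (h11 / det, -h01 / det, h00 / det)


def pool_random_effects(estimates, variances, level=0.95):
    """Stage 2: DerSimonian-Laird pooling (assumed estimator), needs >= 2 clusters."""
    k = len(estimates)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sw
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-cluster variance
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * e for wi, e in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # normal approximation
    half_ci = z * se
    half_pi = z * math.sqrt(tau2 + se * se)
    return mu, (mu - half_ci, mu + half_ci), (mu - half_pi, mu + half_pi), tau2


def pooled_curve(clusters, grid):
    """Pool the cluster-specific calibration curves at each estimated risk in grid."""
    fits = [fit_recalibration(risks, outcomes) for risks, outcomes in clusters]
    curve = []
    for p in grid:
        x = logit(p)
        est = [a + b * x for (a, b), _ in fits]
        var = [va + 2.0 * x * cab + x * x * vb for _, (va, cab, vb) in fits]
        mu, ci, pi, _ = pool_random_effects(est, var)
        curve.append({
            "risk": p,
            "observed": expit(mu),
            "ci": (expit(ci[0]), expit(ci[1])),
            "pi": (expit(pi[0]), expit(pi[1])),
        })
    return curve


def simulate_clusters(n_clusters=8, n_per_cluster=400, seed=1):
    """Hypothetical data: each cluster has its own logit-scale miscalibration."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        shift = rng.gauss(0.0, 0.4)
        risks = [rng.uniform(0.05, 0.95) for _ in range(n_per_cluster)]
        outcomes = [1 if rng.random() < expit(logit(r) + shift) else 0 for r in risks]
        clusters.append((risks, outcomes))
    return clusters


for point in pooled_curve(simulate_clusters(), [0.1, 0.3, 0.5, 0.7, 0.9]):
    print(point)
```

Each printed entry gives the pooled observed proportion at one estimated risk, with a CI for the average curve and a wider PI for the curve expected in a new cluster. Pooling on the logit scale keeps all back-transformed bounds between 0 and 1.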


Source journal

Research Synthesis Methods (Mathematical & Computational Biology; Multidisciplinary Sciences)

CiteScore: 16.90

Self-citation rate: 3.10%

About the journal: Research Synthesis Methods is a reputable, peer-reviewed journal that focuses on the development and dissemination of methods for conducting systematic research synthesis. Our aim is to advance the knowledge and application of research synthesis methods across various disciplines. Our journal provides a platform for the exchange of ideas and knowledge related to designing, conducting, analyzing, interpreting, reporting, and applying research synthesis. While research synthesis is commonly practiced in the health and social sciences, our journal also welcomes contributions from other fields to enrich the methodologies employed in research synthesis across scientific disciplines. By bridging different disciplines, we aim to foster collaboration and cross-fertilization of ideas, ultimately enhancing the quality and effectiveness of research synthesis methods. Whether you are a researcher, practitioner, or stakeholder involved in research synthesis, our journal strives to offer valuable insights and practical guidance for your work.