Fast leave-one-cluster-out cross-validation using clustered network information criterion.

IF 1.9 · CAS Zone 3 (Medicine) · JCR Q3, Health Care Sciences & Services

Statistical Methods in Medical Research. Pub Date: 2025-07-01. Epub Date: 2025-06-19. Pages: 1413-1430. DOI: 10.1177/09622802251345486

Jiaxing Qiu, Douglas E Lake, Pavel Chernyavskiy, Teague R Henry


Citations: 0


Fast leave-one-cluster-out cross-validation using clustered network information criterion.

For prediction models developed on clustered data that do not account for cluster heterogeneity in model parameterization, it is crucial to use cluster-based validation to assess model generalizability on unseen clusters. This article introduces a clustered estimator of the network information criterion to approximate leave-one-cluster-out deviance for standard prediction models with twice-differentiable log-likelihood functions. The clustered network information criterion serves as a fast alternative to cluster-based cross-validation. Stone proved that the Akaike information criterion is asymptotically equivalent to leave-one-observation-out cross-validation for true parametric models with independent and identically distributed observations. Ripley noted that the network information criterion, derived from Stone's proof, is a better approximation when the model is misspecified. For clustered data, we derived the clustered network information criterion by substituting the Fisher information matrix in the network information criterion with a clustering-adjusted estimator. The clustered network information criterion imposes a greater penalty when the data exhibit stronger clustering, which allows it to better prevent over-parameterization. In a simulation study and an empirical example, we used standard regression to develop prediction models for clustered data with Gaussian or binomial responses. Compared to the commonly used Akaike information criterion and Bayesian information criterion for standard regression, the clustered network information criterion provides a much more accurate approximation to leave-one-cluster-out deviance and results in more accurate model size and variable selection, as determined by cluster-based cross-validation, especially when the data exhibit strong clustering.
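The idea in the abstract can be illustrated with a minimal sketch for logistic regression. This is not the authors' implementation: it assumes the common form of the network information criterion, -2 log-likelihood + 2 tr(J⁻¹K), with J the observed information and K the sum of score outer products, and it assumes the clustering adjustment amounts to summing the scores within each cluster before forming K. The function names `fit_logistic` and `nic_logistic` and the simulated data are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                        # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])   # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def nic_logistic(X, y, beta, clusters=None):
    """NIC = -2*loglik + 2*tr(J^{-1} K) (assumed form).

    With `clusters` given, per-observation scores are summed within
    each cluster before the outer products are taken (assumed
    clustering adjustment, as in cluster-robust sandwich estimators).
    """
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    J = X.T @ (X * (p * (1 - p))[:, None])
    scores = X * (y - p)[:, None]                   # one row per observation
    if clusters is not None:
        _, idx = np.unique(clusters, return_inverse=True)
        idx = idx.ravel()
        summed = np.zeros((idx.max() + 1, X.shape[1]))
        np.add.at(summed, idx, scores)              # one row per cluster
        scores = summed
    K = scores.T @ scores
    penalty = np.trace(np.linalg.solve(J, K))
    return -2.0 * loglik + 2.0 * penalty


# Simulated clustered binary data: 30 clusters of 20 observations with a
# cluster-level random intercept that the fitted model ignores.
rng = np.random.default_rng(0)
clusters = np.repeat(np.arange(30), 20)
u = rng.normal(0.0, 1.5, size=30)[clusters]
x = rng.normal(size=clusters.size)
X = np.column_stack([np.ones_like(x), x])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + x + u)))).astype(float)

beta = fit_logistic(X, y)
nic_iid = nic_logistic(X, y, beta)                  # ordinary NIC
nic_cluster = nic_logistic(X, y, beta, clusters)    # clustered variant
print(f"NIC: {nic_iid:.2f}  clustered NIC: {nic_cluster:.2f}")
```

Under these assumptions, the two criteria share the same log-likelihood term and differ only in the penalty, and treating every observation as its own cluster recovers the ordinary NIC. Comparing either value against an actual leave-one-cluster-out deviance would require refitting the model once per cluster, which is the cost the criterion is meant to avoid.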


Source journal

Statistical Methods in Medical Research (Medicine: Mathematical & Computational Biology)

CiteScore: 4.10

Self-citation rate: 4.30%

Articles published: 127

Review time: >12 weeks

About the journal: Statistical Methods in Medical Research is a peer-reviewed scholarly journal and is the leading vehicle for articles in all the main areas of medical statistics and an essential reference for all medical statisticians. This unique journal is devoted solely to statistics and medicine and aims to keep professionals abreast of the many powerful statistical techniques now available to the medical profession. This journal is a member of the Committee on Publication Ethics (COPE).