An optimal subsampling design for large-scale Cox model with censored data.
Authors: Shiqi Liu, Zilong Xie, Ming Zheng, Wen Yu
Journal of Applied Statistics, 52(7), 1315-1341. Published 2024-11-04.
DOI: 10.1080/02664763.2024.2423234 (https://doi.org/10.1080/02664763.2024.2423234)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123965/pdf/
Citations: 0
Abstract
Subsampling designs are useful for reducing the computational load and storage cost of large-scale data analysis. For massive survival data with right censoring, we propose a class of optimal subsampling designs under the widely used Cox model. The proposed designs use information from both the outcome and the covariates. Different forms of the design can be derived adaptively to meet various targets, such as optimizing overall estimation accuracy or minimizing the variance of a specific linear combination of the estimators. Given the subsampled data, the model parameters are estimated by inverse probability weighting. The resulting estimators are shown to be consistent and asymptotically normal. Simulation results indicate that, at comparable subsample sizes, the proposed design yields more efficient estimators than uniform subsampling. In addition, subsample-based estimation substantially reduces computational load and storage cost relative to full-data estimation. A real-data example is analyzed for illustration.
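The abstract does not state the optimal inclusion probabilities, so the following Python sketch only illustrates the general two-step workflow it describes: build outcome- and covariate-dependent subsampling probabilities, draw a subsample, and fit the Cox model by maximizing an inverse-probability-weighted (IPW) log partial likelihood. Everything beyond IPW estimation is an assumption and not the authors' design. The assumptions are a uniform pilot subsample, probabilities proportional to the norm of each subject's pilot score residual mixed with a uniform component `alpha`, Poisson sampling, and Breslow handling of ties. All function names (`fit_weighted_cox`, `score_residuals`, `subsampling_probs`, `subsample_fit`) are hypothetical.

```python
import numpy as np


def fit_weighted_cox(time, event, X, w, n_iter=50, tol=1e-9):
    """Newton-Raphson maximizer of the weighted (IPW) Cox log partial likelihood.
    Breslow ties: the risk set at t_k is {j : t_j >= t_k}."""
    order = np.argsort(time, kind="stable")
    t, d, Z, w = time[order], event[order].astype(float), X[order], w[order]
    first = np.searchsorted(t, t, side="left")  # start index of each tied block
    ZZ = Z[:, :, None] * Z[:, None, :]
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        r = w * np.exp(Z @ beta)
        # Reverse cumulative sums give the weighted risk-set sums S0, S1, S2.
        S0 = np.cumsum(r[::-1])[::-1][first]
        S1 = np.cumsum((r[:, None] * Z)[::-1], axis=0)[::-1][first]
        S2 = np.cumsum((r[:, None, None] * ZZ)[::-1], axis=0)[::-1][first]
        xbar = S1 / S0[:, None]
        wd = w * d  # only observed events contribute, each with its IPW weight
        score = (wd[:, None] * (Z - xbar)).sum(axis=0)
        info = (wd[:, None, None]
                * (S2 / S0[:, None, None] - xbar[:, :, None] * xbar[:, None, :])).sum(axis=0)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def score_residuals(time, event, X, beta):
    """Per-subject contributions U_i to the full-data Cox score at beta (sum_i U_i = score).
    They depend on the outcome (time, event) and on the covariates."""
    order = np.argsort(time, kind="stable")
    t, d, Z = time[order], event[order].astype(float), X[order]
    first = np.searchsorted(t, t, side="left")
    last = np.searchsorted(t, t, side="right") - 1
    r = np.exp(Z @ beta)
    S0 = np.cumsum(r[::-1])[::-1][first]
    xbar = np.cumsum((r[:, None] * Z)[::-1], axis=0)[::-1][first] / S0[:, None]
    A = np.cumsum(d / S0)[last]  # sum over events with t_k <= t_i of 1/S0(t_k)
    B = np.cumsum(d[:, None] * xbar / S0[:, None], axis=0)[last]
    U = d[:, None] * (Z - xbar) - r[:, None] * (Z * A[:, None] - B)
    out = np.empty_like(U)
    out[order] = U  # map back to the original subject order
    return out


def subsampling_probs(U, alpha=0.2):
    """HYPOTHETICAL rule (not the paper's optimal design): pi_i proportional to ||U_i||,
    mixed with the uniform distribution so that no probability is zero."""
    norms = np.linalg.norm(U, axis=1)
    return (1 - alpha) * norms / norms.sum() + alpha / len(norms)


def subsample_fit(time, event, X, n_sub, n_pilot, rng, alpha=0.2):
    n = len(time)
    pilot = rng.choice(n, size=n_pilot, replace=False)  # step 1: uniform pilot subsample
    beta0 = fit_weighted_cox(time[pilot], event[pilot], X[pilot], np.ones(n_pilot))
    pi = subsampling_probs(score_residuals(time, event, X, beta0), alpha)  # step 2
    p = np.minimum(1.0, n_sub * pi)  # step 3: Poisson sampling, expected size <= n_sub
    keep = rng.random(n) < p
    return fit_weighted_cox(time[keep], event[keep], X[keep], 1.0 / p[keep])  # step 4: IPW fit


# Usage on simulated data (hazard exp(x'beta), independent exponential censoring).
rng = np.random.default_rng(0)
n, beta_true = 20_000, np.array([0.5, -0.5])
X = rng.normal(size=(n, 2))
T = rng.exponential(1.0 / np.exp(X @ beta_true))
C = rng.exponential(2.0, size=n)
time, event = np.minimum(T, C), (T <= C).astype(int)
beta_hat = subsample_fit(time, event, X, n_sub=2_000, n_pilot=500, rng=rng)
print(beta_hat)
```

The uniform mixing term `alpha` keeps every inclusion probability at least `n_sub * alpha / n` (capped at 1), which bounds the inverse-probability weights. This is a common safeguard in the subsampling literature, not something the abstract specifies. Poisson sampling was chosen because it gives each subject an independent inclusion probability, so the IPW weight is simply `1 / p_i`.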
About the journal:
Journal of Applied Statistics provides a forum for communication between applied statisticians and users of applied statistical techniques across a wide range of disciplines. These areas include business, computing, economics, ecology, education, management, medicine, operational research and sociology, but papers from other areas are also considered. The editorial policy is to publish rigorous but clear and accessible papers on applied techniques. Purely theoretical papers are avoided, but papers on theoretical developments that clearly demonstrate significant applied potential are welcome. Each paper is submitted to at least two independent referees.