Statistics in Medicine: Latest Articles

Assessing the Performance of Machine Learning Methods Trained on Public Health Observational Data: A Case Study From COVID-19.
IF 1.8, Zone 4, Medicine
Statistics in Medicine Pub Date : 2024-11-10 Epub Date: 2024-09-05 DOI: 10.1002/sim.10211
Davide Pigoli, Kieran Baker, Jobie Budd, Lorraine Butler, Harry Coppock, Sabrina Egglestone, Steven G Gilmour, Chris Holmes, David Hurley, Radka Jersakova, Ivan Kiskin, Vasiliki Koutra, Jonathon Mellor, George Nicholson, Joe Packham, Selina Patel, Richard Payne, Stephen J Roberts, Björn W Schuller, Ana Tendero-Cañadas, Tracey Thornley, Alexander Titcomb
Abstract: From early in the coronavirus disease 2019 (COVID-19) pandemic, there was interest in using machine learning methods to predict COVID-19 infection status based on vocal audio signals, for example, cough recordings. However, early studies had limitations in terms of data collection and of how the performances of the proposed predictive models were assessed. This article describes how these limitations have been overcome in a study carried out by the Turing-RSS Health Data Laboratory and the UK Health Security Agency. As part of the study, the UK Health Security Agency collected a dataset of acoustic recordings, SARS-CoV-2 infection status and extensive study participant meta-data. This allowed us to rigorously assess state-of-the-art machine learning techniques to predict SARS-CoV-2 infection status based on vocal audio signals. The lessons learned from this project should inform future studies on statistical evaluation methods to assess the performance of machine learning techniques for public health tasks.
Pages: 4861-4871
Citations: 0
Performance of mixed effects models and generalized estimating equations for continuous outcomes in partially clustered trials including both independent and paired data.
IF 1.8, Zone 4, Medicine
Statistics in Medicine Pub Date : 2024-11-10 Epub Date: 2024-09-04 DOI: 10.1002/sim.10201
Kylie M Lange, Thomas R Sullivan, Jessica Kasza, Lisa N Yelland
Abstract: Many clinical trials involve partially clustered data, where some observations belong to a cluster and others can be considered independent. For example, neonatal trials may include infants from single or multiple births. Sample size and analysis methods for these trials have received limited attention. A simulation study was conducted to (1) assess whether existing power formulas based on generalized estimating equations (GEEs) provide an adequate approximation to the power achieved by mixed effects models, and (2) compare the performance of mixed models vs GEEs in estimating the effect of treatment on a continuous outcome. We considered clusters that exist prior to randomization with a maximum cluster size of 2, three methods of randomizing the clustered observations, and simulated datasets with uninformative cluster size and the sample size required to achieve 80% power according to GEE-based formulas with an independence or exchangeable working correlation structure. The empirical power of the mixed model approach was close to the nominal level when sample size was calculated using the exchangeable GEE formula, but was often too high when the sample size was based on the independence GEE formula. The independence GEE always converged and performed well in all scenarios. Performance of the exchangeable GEE and mixed model was also acceptable under cluster randomization, though under-coverage and inflated type I error rates could occur with other methods of randomization. Analysis of partially clustered trials using GEEs with an independence working correlation structure may be preferred to avoid the limitations of mixed models and exchangeable GEEs.
Pages: 4819-4835
Citations: 0
Approximate maximum likelihood estimation in cure models using aggregated data, with application to HPV vaccine completion.
IF 1.8, Zone 4, Medicine
Statistics in Medicine Pub Date : 2024-11-10 Epub Date: 2024-09-05 DOI: 10.1002/sim.10174
John D Rice, Allison Kempe
Abstract: Research into vaccine hesitancy is a critical component of the public health enterprise, as rates of communicable diseases preventable by routine childhood immunization have been increasing in recent years. It is therefore important to estimate proportions of "never-vaccinators" in various subgroups of the population in order to successfully target interventions to improve childhood vaccination rates. However, due to privacy issues, it may be difficult to obtain individual patient data (IPD) needed to perform the appropriate time-to-event analyses: state-level immunization information services may only be willing to share aggregated data with researchers. We propose statistical methodology for the analysis of aggregated survival data that can accommodate a cured fraction based on a polynomial approximation of the mixture cure model log-likelihood function relying only on summary statistics. We study the performance of the method through simulation studies and apply it to a real-world data set from a study examining reminder/recall approaches to improve human papillomavirus (HPV) vaccination uptake. The proposed methods may be generalized for use when there is interest in fitting complex likelihood-based models but IPD is unavailable due to data privacy or other concerns.
Pages: 4872-4886
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486596/pdf/
Citations: 0
A Causal Mediation Approach to Account for Interaction of Treatment and Intercurrent Events: Using Hypothetical Strategy.
IF 1.8, Zone 4, Medicine
Statistics in Medicine Pub Date : 2024-11-10 Epub Date: 2024-09-05 DOI: 10.1002/sim.10212
Kunpeng Wu, Xiangliang Zhang, Meng Zheng, Jianghui Zhang, Wen Chen
Abstract: Hypothetical strategy is a common strategy for handling intercurrent events (IEs). No current guideline or study considers treatment-IE interaction to target the estimand in any one IE-handling strategy. Based on the hypothetical strategy, we aimed to (1) assess the performance of three estimators with different considerations for the treatment-IE interaction in a simulation and (2) compare the estimation of these estimators in a real trial. Simulation data were generalized based on realistic clinical trials of Alzheimer's disease. The estimand of interest was the effect of treatment with no IE occurring under the hypothetical strategy. Three estimators, namely, G-estimation with and without interaction and IE-ignored estimation, were compared in scenarios where the treatment-IE interaction effect was set as -50% to 50% of the main effect. Bias was the key performance measure. The real case was derived from a randomized trial of methadone maintenance treatment. Only G-estimation with interaction exhibited unbiased estimations regardless of the existence, direction or magnitude of the treatment-IE interaction in those scenarios. Neglecting the interaction and ignoring the IE would introduce a bias as large as 0.093 and 0.241 (true value, -1.561) if the interaction effect existed. In the real case, compared with G-estimation with interaction, G-estimation without interaction and IE-ignored estimation increased the estimand of interest by 33.55% and 34.36%, respectively. This study highlights the importance of considering treatment-IE interaction in the estimand framework. In practice, it would be better to include the interaction in the estimator by default.
Pages: 4850-4860
Citations: 0
A hybrid approach to sample size re-estimation in cluster randomized trials with continuous outcomes.
IF 1.8, Zone 4, Medicine
Statistics in Medicine Pub Date : 2024-10-30 Epub Date: 2024-08-28 DOI: 10.1002/sim.10205
Samuel K Sarkodie, James Ms Wason, Michael J Grayling
Abstract: This study presents a hybrid (Bayesian-frequentist) approach to sample size re-estimation (SSRE) for cluster randomised trials with continuous outcome data, allowing for uncertainty in the intra-cluster correlation (ICC). In the hybrid framework, pre-trial knowledge about the ICC is captured by placing a Truncated Normal prior on it, which is then updated at an interim analysis using the study data, and used in expected power control. On average, both the hybrid and frequentist approaches mitigate against the implications of misspecifying the ICC at the trial's design stage. In addition, both frameworks lead to SSRE designs with approximate control of the type I error-rate at the desired level. It is clearly demonstrated how the hybrid approach is able to reduce the high variability in the re-estimated sample size observed within the frequentist framework, based on the informativeness of the prior. However, misspecification of a highly informative prior can cause significant power loss. In conclusion, a hybrid approach could offer advantages to cluster randomised trials using SSRE. Specifically, when there is available data or expert opinion to help guide the choice of prior for the ICC, the hybrid approach can reduce the variance of the re-estimated required sample size compared to a frequentist approach. As SSRE is unlikely to be employed when there are substantial amounts of such data available (ie, when a constructed prior is highly informative), the greatest utility of a hybrid approach to SSRE likely lies when there is low-quality evidence available to guide the choice of prior.
Pages: 4736-4751
Citations: 0
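Note on the entry above: the abstract turns on how sensitive the required sample size of a cluster randomised trial is to the assumed ICC. The sketch below is only a rough illustration of that sensitivity, not the authors' hybrid procedure. It uses the standard design effect 1 + (m - 1) * ICC for equal cluster sizes m; the function name and all numbers are hypothetical.

```python
import math

def clusters_per_arm(n_individual: int, cluster_size: int, icc: float) -> int:
    """Clusters needed per arm after inflating an individually randomised
    per-arm sample size by the design effect 1 + (m - 1) * icc."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    design_effect = 1.0 + (cluster_size - 1) * icc
    return math.ceil(n_individual * design_effect / cluster_size)

# Hypothetical inputs: 64 participants per arm under individual
# randomisation, clusters of 20, and three candidate ICC values.
for icc in (0.01, 0.05, 0.10):
    print(icc, clusters_per_arm(64, 20, icc))
```

With these made-up inputs the number of clusters more than doubles as the ICC moves from 0.01 to 0.10, which is why re-estimating the ICC at an interim analysis can matter.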
A simulation study of the performance of statistical models for count outcomes with excessive zeros.
IF 1.8, Zone 4, Medicine
Statistics in Medicine Pub Date : 2024-10-30 Epub Date: 2024-08-28 DOI: 10.1002/sim.10198
Zhengyang Zhou, Dateng Li, David Huh, Minge Xie, Eun-Young Mun
Abstract:
Background: Outcome measures that are count variables with excessive zeros are common in health behaviors research. Examples include the number of standard drinks consumed or alcohol-related problems experienced over time. There is a lack of empirical data about the relative performance of prevailing statistical models for assessing the efficacy of interventions when outcomes are zero-inflated, particularly compared with recently developed marginalized count regression approaches for such data.
Methods: The current simulation study examined five commonly used approaches for analyzing count outcomes, including two linear models (with outcomes on raw and log-transformed scales, respectively) and three prevailing count distribution-based models (ie, Poisson, negative binomial, and zero-inflated Poisson (ZIP) models). We also considered the marginalized zero-inflated Poisson (MZIP) model, a novel alternative that estimates the overall effects on the population mean while adjusting for zero-inflation. Motivated by alcohol misuse prevention trials, extensive simulations were conducted to evaluate and compare the statistical power and Type I error rate of the statistical models and approaches across data conditions that varied in sample size (N = 100 to 500), zero rate (0.2 to 0.8), and intervention effect sizes.
Results: Under zero-inflation, the Poisson model failed to control the Type I error rate, resulting in higher than expected false positive results. When the intervention effects on the zero (vs. non-zero) and count parts were in the same direction, the MZIP model had the highest statistical power, followed by the linear model with outcomes on the raw scale, negative binomial model, and ZIP model. The performance of the linear model with a log-transformed outcome variable was unsatisfactory.
Conclusions: The MZIP model demonstrated better statistical properties in detecting true intervention effects and controlling false positive results for zero-inflated count outcomes. This MZIP model may serve as an appealing analytical approach to evaluating overall intervention effects in studies with count outcomes marked by excessive zeros.
Pages: 4752-4767
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483204/pdf/
Citations: 0
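Note on the entry above: for readers new to zero-inflation, the sketch below shows what a zero-inflated Poisson distribution looks like and why its overall mean differs from the Poisson rate. It assumes the usual mixture form (a structural zero with probability pi, otherwise a Poisson(lam) count) and does no model fitting; the parameter values are hypothetical and are not taken from the study.

```python
import math

def zip_pmf(y: int, pi: float, lam: float) -> float:
    """P(Y = y) under a zero-inflated Poisson: a structural zero with
    probability pi, otherwise a Poisson(lam) count."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return (pi if y == 0 else 0.0) + (1.0 - pi) * poisson

def zip_mean(pi: float, lam: float) -> float:
    """Overall (marginal) mean of the mixture: (1 - pi) * lam."""
    return (1.0 - pi) * lam

pi, lam = 0.4, 3.0                      # hypothetical values
print(zip_pmf(0, pi, lam))              # share of zeros: structural plus Poisson zeros
print(math.exp(-lam))                   # share of zeros a plain Poisson(lam) would give
print(zip_mean(pi, lam))                # overall mean, smaller than lam
```

The overall mean (1 - pi) * lam is the population-level quantity that, per the abstract, the MZIP model targets, whereas a standard ZIP model reports effects on pi and lam separately.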
Principal quantile treatment effect estimation using principal scores.
IF 1.8, Zone 4, Medicine
Statistics in Medicine Pub Date : 2024-10-30 Epub Date: 2024-08-19 DOI: 10.1002/sim.10178
Kotaro Mizuma, Takamasa Hashimoto, Sho Sakui, Shingo Kuroda
Abstract: Intercurrent events and estimands play a key role in defining the treatment effects of interest precisely. Sometimes the median or other quantiles of outcomes in a principal stratum according to potential occurrence of intercurrent events are of interest in randomized clinical trials. Naïve analyses such as those based on the observed occurrence of the intercurrent events lead to biased results. Therefore, we propose principal quantile treatment effect estimators that can nonparametrically estimate the distribution of potential outcomes by principal score weighting without relying on the exclusion restriction assumption. Our simulation studies show that the proposed method works in situations where the median or quantiles may be regarded as the preferred population-level summary over the mean. We illustrate our proposed method by using data from a randomized controlled trial conducted on patients with nonerosive reflux disease.
Pages: 4635-4649
Citations: 0
Anomaly Detection and Correction in Dense Functional Data Within Electronic Medical Records.
IF 1.8, Zone 4, Medicine
Statistics in Medicine Pub Date : 2024-10-30 Epub Date: 2024-09-03 DOI: 10.1002/sim.10209
Daren Kuwaye, Hyunkeun Ryan Cho
Abstract: In medical research, the accuracy of data from electronic medical records (EMRs) is critical, particularly when analyzing dense functional data, where anomalies can severely compromise research integrity. Anomalies in EMRs often arise from human errors in data measurement and entry, and increase in frequency with the volume of data. Despite the established methods in computer science, anomaly detection in medical applications remains underdeveloped. We address this deficiency by introducing a novel tool for identifying and correcting anomalies specifically in dense functional EMR data. Our approach utilizes studentized residuals from a mean-shift model, and therefore assumes that the data adheres to a smooth functional trajectory. Additionally, our method is tailored to be conservative, focusing on anomalies that signify actual errors in the data collection process while controlling for false discovery rates and type II errors. To support widespread implementation, we provide a comprehensive R package, ensuring that our methods can be applied in diverse settings. Our methodology's efficacy has been validated through rigorous simulation studies and real-world applications, confirming its ability to accurately identify and correct errors, thus enhancing the reliability and quality of medical data analysis.
Pages: 4768-4777
Citations: 0
Modeling multiple-criterion diagnoses by heterogeneous-instance logistic regression.
IF 1.8, Zone 4, Medicine
Statistics in Medicine Pub Date : 2024-10-30 Epub Date: 2024-08-27 DOI: 10.1002/sim.10202
Chun-Hao Yang, Ming-Han Li, Shu-Fang Wen, Sheng-Mao Chang
Abstract: Mild cognitive impairment (MCI) is a prodromal stage of Alzheimer's disease (AD) that causes a significant burden in caregiving and medical costs. Clinically, the diagnosis of MCI is determined by the impairment statuses of five cognitive domains. If one of these cognitive domains is impaired, the patient is diagnosed with MCI, and if two out of the five domains are impaired, the patient is diagnosed with AD. In medical records, most of the time, the diagnosis of MCI/AD is given, but not the statuses of the five domains. We may treat the domain statuses as missing variables. This diagnostic procedure relates MCI/AD status modeling to multiple-instance learning, where each domain resembles an instance. However, traditional multiple-instance learning assumes common predictors among instances, but in our case, each domain is associated with different predictors. In this article, we generalized the multiple-instance logistic regression to accommodate the heterogeneity in predictors among different instances. The proposed model is dubbed heterogeneous-instance logistic regression and is estimated via the expectation-maximization algorithm because of the presence of the missing variables. We also derived two variants of the proposed model for the MCI and AD diagnoses. The proposed model is validated in terms of its estimation accuracy, latent status prediction, and robustness via extensive simulation studies. Finally, we analyzed the National Alzheimer's Coordinating Center-Uniform Data Set using the proposed model and demonstrated its potential.
Pages: 4684-4701
Citations: 0
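Note on the entry above: the diagnostic rule in the abstract can be read as a threshold on the number of impaired domains. The sketch below illustrates only that forward calculation, under assumptions that are not taken from the paper: each domain's impairment probability comes from its own logistic model with its own predictors, the domains are independent given those predictors, and the rule is read as "at least one" and "at least two" impaired domains. It does not implement the authors' EM estimation, and all coefficients and predictor values are hypothetical.

```python
import math
from typing import Sequence

def sigmoid(x: float) -> float:
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def impaired_count_dist(probs: Sequence[float]) -> list[float]:
    """Distribution of the number of impaired domains, assuming the
    domains are independent (a Poisson-binomial distribution)."""
    dist = [1.0]                        # start: zero domains, probability 1
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this domain is not impaired
            nxt[k + 1] += mass * p      # this domain is impaired
        dist = nxt
    return dist

def prob_at_least(dist: Sequence[float], k: int) -> float:
    """P(number of impaired domains >= k)."""
    return sum(dist[k:])

# Hypothetical inputs: (predictor values incl. intercept, coefficients)
# for five domains, each with a different predictor set.
domains = [
    ([1.0, 0.2], [-1.5, 0.8]),
    ([1.0, 1.1, 0.0], [-2.0, 0.5, 0.3]),
    ([1.0, 0.7], [-1.0, 0.4]),
    ([1.0, 2.0], [-2.5, 0.6]),
    ([1.0, 0.3, 1.0], [-1.8, 0.9, 0.2]),
]
probs = [sigmoid(sum(x * b for x, b in zip(xs, bs))) for xs, bs in domains]
dist = impaired_count_dist(probs)
print("P(at least one domain impaired):", prob_at_least(dist, 1))
print("P(at least two domains impaired):", prob_at_least(dist, 2))
```

Because only the final diagnosis is recorded, the per-domain statuses behind these probabilities are unobserved, which is why the abstract describes them as missing variables handled by expectation-maximization.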
Optimal subsampling for semi-parametric accelerated failure time models with massive survival data using a rank-based approach.
IF 1.8, Zone 4, Medicine
Statistics in Medicine Pub Date : 2024-10-30 Epub Date: 2024-08-20 DOI: 10.1002/sim.10200
Zehan Yang, HaiYing Wang, Jun Yan
Abstract: Subsampling is a practical strategy for analyzing vast survival data, which are progressively encountered across diverse research domains. While the optimal subsampling method has been applied to inferences for Cox models and parametric accelerated failure time (AFT) models, its application to semi-parametric AFT models with rank-based estimation has received limited attention. The challenges arise from the non-smooth estimating function for regression coefficients and the seemingly zero contribution from censored observations in estimating functions in the commonly seen form. To address these challenges, we develop optimal subsampling probabilities for both event and censored observations by expressing the estimating functions through a well-defined stochastic process. Meanwhile, we apply an induced smoothing procedure to the non-smooth estimating functions. As the optimal subsampling probabilities depend on the unknown regression coefficients, we employ a two-step procedure to obtain a feasible estimation method. An additional benefit of the method is its ability to resolve the issue of underestimation of the variance when the subsample size approaches the full sample size. We validate the performance of our estimators through a simulation study and apply the methods to analyze the survival time of lymphoma patients in the Surveillance, Epidemiology, and End Results (SEER) program.
Pages: 4650-4666
Citations: 0