Adjusting for Selection Bias Due to Missing Data in Electronic Health Records-Based Research by Blending Multiple Imputation and Inverse Probability Weighting.
Tanayott Thaweethai, Rajarshi Mukherjee, David Arterburn, Heidi Fischer, Catherine Lee, Susan M Shortreed, Sebastien Haneuse
Statistics in Medicine 44(15-17): e70151 (2025-07-01). DOI: 10.1002/sim.70151

Abstract: Due to the complex process by which electronic health records (EHR) are generated and collected, missing data are a significant challenge when conducting large observational studies with such data. However, most standard methods that adjust for the potential selection bias induced by restricting to individuals with complete data fail to address the heterogeneous structure of EHR. To address this, a framework was previously proposed that modularizes the data provenance giving rise to the observed data as a sequence of decisions made by patients, healthcare providers, and the health system. In this work, we formalize analyses within this framework by proposing a pragmatic, flexible, and scalable approach to estimation and inference that blends inverse probability weighting and multiple imputation. The proposed framework allows missingness assumptions to be better aligned with the complexity of EHR data. In addition to formal theoretical justification and simulation studies, we illustrate the framework with a motivating data application in which EHR data are used to investigate weight loss outcomes following bariatric surgery, and whether differences between two surgery types exhibit effect modification by the presence or absence of chronic kidney disease.

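As a loose illustration of the inverse-probability-weighting ingredient of the blended framework above — not the authors' implementation, and with entirely hypothetical data — a complete-case mean can be reweighted by the estimated probability of being observed:

```python
# Toy sketch: inverse-probability-weighted complete-case mean.
# Hypothetical data: outcome y is observed only when obs = 1, and a
# stratum variable s drives the missingness.
from collections import defaultdict

records = [  # (stratum, observed_flag, outcome_or_None)
    ("A", 1, 2.0), ("A", 1, 3.0), ("A", 0, None), ("A", 0, None),
    ("B", 1, 5.0), ("B", 1, 7.0), ("B", 1, 6.0), ("B", 0, None),
]

# Step 1: estimate P(observed | stratum) empirically.
n_tot, n_obs = defaultdict(int), defaultdict(int)
for s, r, _ in records:
    n_tot[s] += 1
    n_obs[s] += r
p_obs = {s: n_obs[s] / n_tot[s] for s in n_tot}

# Step 2: weight each complete case by 1 / P(observed | stratum).
num = sum(y / p_obs[s] for s, r, y in records if r)
den = sum(1 / p_obs[s] for s, r, y in records if r)
ipw_mean = num / den
```

The reweighted mean (4.25) differs from the naive complete-case mean (4.6) because stratum A is under-observed; the paper's framework applies this kind of weighting to some missingness modules and multiple imputation to others.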
An Integrated and Coherent Framework for Point Estimation and Hypothesis Testing With Concurrent Controls in Platform Trials.
Tianyu Zhan, Jane Zhang, Lei Shu, Yihua Gu
Statistics in Medicine 44(15-17): e70196 (2025-07-01). DOI: 10.1002/sim.70196

Abstract: A platform trial with a master protocol provides an infrastructure to ethically and efficiently evaluate multiple treatment options in multiple diseases. Because study drugs can enter or exit a platform trial, the randomization ratio may change over time, and such modifications are not necessarily dependent on accumulating outcome data. It is recommended that the analysis account for time periods with different randomization ratios, for example via inverse probability of treatment weighting or an approach that weights by time period. To guide practical implementation, we investigate the relationship between these two estimators and derive an optimal estimator within this class to gain efficiency. Practical guidance is provided on how to construct estimators from the observed data to approximate this unknown weight. The connection between the proposed method and weighted least squares is also studied. Simulation studies demonstrate that the proposed method controls the type I error rate with reduced estimation bias, and achieves satisfactory power and mean squared error with computational efficiency. Another appealing feature of our framework is that it provides consistent conclusions for both point estimation and hypothesis testing, which is critical to the interpretation of clinical trial results. The proposed method is further applied to the Accelerating COVID-19 Therapeutic Interventions and Vaccines platform trial.

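A minimal sketch of the period-stratified estimation idea discussed above, with hypothetical outcomes and simple sample-size weights standing in for the paper's optimal (inverse-variance-type) weights:

```python
# Toy sketch: period-stratified treatment-effect estimate in a platform trial
# whose randomization ratio changes between periods (hypothetical numbers).
periods = [
    # (treatment outcomes, control outcomes) for each randomization period
    ([3.0, 4.0, 5.0], [1.0, 2.0]),          # period 1: 3:2 randomization
    ([6.0, 7.0], [2.0, 3.0, 4.0, 3.0]),     # period 2: 1:2 randomization
]

def mean(xs):
    return sum(xs) / len(xs)

# Per-period effect estimates; comparisons stay within a period so that
# shifting randomization ratios do not bias the contrast.
effects = [mean(t) - mean(c) for t, c in periods]

# Combine across periods; sample-size weights are used here for simplicity,
# whereas the paper derives the efficient weighting within this class.
weights = [len(t) + len(c) for t, c in periods]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
```

The key structural point — estimate within periods, then pool with weights — is what the paper formalizes, along with how to estimate the unknown optimal weights from data.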
Simultaneous Feature Selection for Optimal Dynamic Treatment Regimens.
Mochuan Liu, Yuanjia Wang, Donglin Zeng
Statistics in Medicine 44(15-17): e70169 (2025-07-01). DOI: 10.1002/sim.70169
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12261976/pdf/

Abstract: Dynamic treatment regimens (DTRs), in which treatment decisions are tailored to an individual patient's characteristics and evolving health status over multiple stages, have gained increasing interest in the modern era of precision medicine. Identifying the important features that drive these decisions not only yields parsimonious DTRs for practical use but also enhances the reliability of learning optimal DTRs. Existing methods for learning optimal DTRs, such as Q-learning and O-learning, rely on a sequential procedure that estimates the optimal decision at each stage backward. Incorporating feature selection into these methods through stage-wise regularization can identify tailoring variables that are unimportant at a given stage, but cannot target variables that are unimportant across all stages; as a result, false discoveries are likely to accumulate over stages in these sequential methods. To overcome this limitation, we propose a framework, L1 multistage ramp loss (L1-MRL) learning, that learns the optimal decision rules and performs variable selection across all stages simultaneously. The framework uses a single multistage ramp loss to estimate optimal DTRs for all stages, and imposes a group Lasso-type penalty on the coefficients in the decision rules across stages, enabling identification of features that are important for at least one stage's decision. Theoretically, we show that the estimator is consistent and enjoys the oracle property with respect to the optimal DTR. We demonstrate that the proposed method performs as well as or better than many existing DTR methods with variable selection capability through extensive simulation studies and an application to electronic health record (EHR) data for type 2 diabetes (T2D) patients.

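The cross-stage group penalty described above acts through group soft-thresholding: a feature's coefficients over all stages are shrunk jointly and zeroed out together. A minimal sketch with hypothetical coefficients (this is the proximal step behind group Lasso-type penalties generally, not the L1-MRL solver):

```python
# Toy sketch: group soft-thresholding, the proximal step behind a group
# Lasso-type penalty that ties one feature's coefficients across all stages.
import math

def group_soft_threshold(beta_group, lam):
    """Shrink a cross-stage coefficient group toward zero; the group is
    zeroed out only if its joint Euclidean norm falls below lam."""
    norm = math.sqrt(sum(b * b for b in beta_group))
    if norm <= lam:
        return [0.0] * len(beta_group)   # feature dropped at every stage
    scale = 1.0 - lam / norm
    return [scale * b for b in beta_group]

# Feature A is weak at every stage -> removed everywhere simultaneously;
# feature B is strong at some stage -> kept (shrunk) across all stages.
feature_A = group_soft_threshold([0.1, -0.1, 0.05], lam=0.5)
feature_B = group_soft_threshold([3.0, 0.0, 4.0], lam=0.5)
```

This joint all-stages decision is what distinguishes the simultaneous approach from stage-wise penalization, which can only zero a coefficient one stage at a time.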
Multiple Imputation of Missing Covariates When Using the Fine-Gray Model.
Edouard F Bonneville, Jan Beyersmann, Ruth H Keogh, Jonathan W Bartlett, Tim P Morris, Nicola Polverelli, Liesbeth C de Wreede, Hein Putter
Statistics in Medicine 44(15-17): e70166 (2025-07-01). DOI: 10.1002/sim.70166
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288811/pdf/

Abstract: The Fine-Gray model for the subdistribution hazard is commonly used to estimate associations between covariates and competing-risks outcomes. When covariates in a given model have missing values, researchers may wish to multiply impute them. Assuming interest lies in the risk of only one of the competing events, this paper develops a substantive-model-compatible multiple imputation approach that exploits the parallels between the Fine-Gray model and the standard (single-event) Cox model. In the presence of right censoring, this involves first imputing the potential censoring times of individuals failing from competing events, and then imputing the missing covariates by leveraging methodology previously developed for the Cox model in the setting without competing risks. In a simulation study, we compared the proposed approach with alternative methods, such as imputing compatibly with cause-specific Cox models. The proposed method performed well (for estimation of both subdistribution log hazard ratios and cumulative incidences) when data were generated under proportional subdistribution hazards, and performed satisfactorily when this assumption was violated. The gain in efficiency over a complete-case analysis was demonstrated in both the simulation study and an applied example on competing outcomes following allogeneic stem cell transplantation. For individual-specific cumulative incidence estimation, assuming proportionality on the correct scale at the analysis phase appears to be more important than correctly specifying the imputation procedure used for the missing covariates.

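The first imputation step described above — drawing potential censoring times for individuals who failed from a competing event — can be caricatured as follows. The toy code samples from the empirical censoring times rather than the model-based draw used in the paper, and all data are hypothetical:

```python
# Toy sketch: for individuals failing from a competing event, impute a
# potential censoring time by sampling from observed censoring times that
# exceed their event time (hypothetical data; a crude stand-in for the
# model-based draw, e.g., from a Kaplan-Meier fit of the censoring
# distribution, used in practice).
import random

random.seed(1)
# (time, status): 0 = censored, 1 = event of interest, 2 = competing event
data = [(2.0, 1), (3.0, 0), (4.0, 2), (5.0, 0), (6.0, 0), (7.0, 1)]
censoring_times = sorted(t for t, s in data if s == 0)

def impute_censoring(event_time):
    later = [c for c in censoring_times if c > event_time]
    return random.choice(later) if later else event_time

# Only competing-event failures (status 2) get an imputed censoring time;
# afterwards, Cox-model imputation machinery can be applied to the covariates.
imputed = [(impute_censoring(t), s) if s == 2 else (t, s) for t, s in data]
```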
An IPCW Adjusted Win Statistics Approach in Clinical Trials Incorporating Equivalence Margins to Define Ties.
Ying Cui, Bo Huang, Gaohong Dong, Ryuji Uozumi, Lu Tian
Statistics in Medicine 44(15-17): e70180 (2025-07-01). DOI: 10.1002/sim.70180

Abstract: In clinical trials, multiple outcomes of different priorities commonly occur because a patient's response may not be adequately characterized by a single outcome. Win statistics are appealing summary measures of between-group differences across multiple endpoints. When defining the result of a pairwise comparison of a time-to-event endpoint, it is desirable to allow ties, both to account for incomplete follow-up and to absorb differences that are not clinically meaningful. In this article, we propose a class of win statistics for time-to-event endpoints with a user-specified equivalence margin. These win statistics are identifiable in the presence of right censoring and do not depend on the censoring distribution. We then develop estimation and inference procedures for the proposed win statistics based on inverse-probability-of-censoring weighting (IPCW) to handle right censoring. We conduct extensive simulations to investigate the operating characteristics of the proposed procedure in finite samples, and a real oncology trial is used to illustrate the approach.

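For intuition, the win/loss/tie logic with an equivalence margin can be sketched on fully observed (uncensored) times; the paper's contribution is the IPCW machinery that makes this identifiable under right censoring. The data below are hypothetical:

```python
# Toy sketch: pairwise wins/losses/ties on fully observed survival times,
# with an equivalence margin delta; differences within delta count as ties.
treatment = [10.0, 14.0, 9.0]
control = [8.0, 9.5, 13.5]
delta = 1.0

wins = losses = ties = 0
for t in treatment:
    for c in control:
        if t > c + delta:
            wins += 1       # treatment patient clearly outlives control
        elif c > t + delta:
            losses += 1
        else:
            ties += 1       # difference not clinically meaningful

n_pairs = wins + losses + ties
win_proportion = wins / n_pairs
net_benefit = (wins - losses) / n_pairs
```

With these numbers, 3 of the 9 pairs are wins, 2 are losses, and 4 fall inside the margin; without the margin, near-identical times such as 14.0 versus 13.5 would be forced into wins or losses.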
Ascertainment Conditional Maximum Likelihood for Continuous Outcome Under Two-Phase Response-Selective Design.
Gustavo Amorim, Ran Tao, Thomas Lumley, Pamela A Shaw, Bryan E Shepherd
Statistics in Medicine 44(15-17): e70111 (2025-07-01). DOI: 10.1002/sim.70111
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12258418/pdf/

Abstract: Data collection is often time-consuming and expensive. An alternative to collecting full information on all subjects enrolled in a study is a two-phase design: variables that are inexpensive or easy to measure are obtained for the whole study population, while more specific, expensive, or hard-to-measure variables are collected only for a well-selected sample of individuals. Often, only the subjects with full information are used for inference, while partially observed subjects are discarded from the analysis. Recently, semiparametric approaches that use the entire dataset, yielding fully efficient estimators, have been proposed. These estimators, however, have difficulty incorporating multiple covariates, are computationally expensive, and depend on tuning parameters that affect their performance. In this paper, we propose an alternative semiparametric estimator that imposes no distributional assumptions on the covariates or the measurement error mechanism and can be applied in a wider range of settings. Although the proposed estimator is not semiparametric efficient, simulations show that the loss of efficiency in estimating the parameters associated with the partially observed covariates is minimal. We highlight the estimator's applicability to real-world problems, where data structures are complex and rich and complicated regression models are often necessary.

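A toy sketch of the two-phase setup above: a cheap phase-one variable drives the sampling of an expensive phase-two measurement, and weighting by inverse sampling probabilities (Horvitz-Thompson style) corrects the complete-case estimate. This illustrates the design, not the proposed ascertainment conditional maximum likelihood estimator; all numbers are hypothetical:

```python
# Toy sketch: two-phase response-selective sampling. The expensive variable
# is measured only for sampled subjects (None otherwise), with sampling
# probabilities that depend on a cheap phase-one variable.
phase_one = ["low"] * 6 + ["high"] * 4          # cheap variable, everyone
sampling_prob = {"low": 0.5, "high": 1.0}       # response-selective phase two
expensive = [1.0, None, 2.0, None, 3.0, None, 10.0, 11.0, 9.0, 10.0]

num = den = 0.0
for g, y in zip(phase_one, expensive):
    if y is not None:
        w = 1.0 / sampling_prob[g]              # design weight
        num += w * y
        den += w
weighted_mean = num / den
```

The design-weighted mean (5.2) undoes the oversampling of the "high" group, whereas a naive complete-case mean of the observed values would be pulled upward.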
A Latent-Class Model for Time-To-Event Outcomes and High-Dimensional Imaging Data.
Jiahui Feng, Haolun Shi, Ma Da, Mirza Faisal Beg, Jiguo Cao
Statistics in Medicine 44(15-17): e70186 (2025-07-01). DOI: 10.1002/sim.70186
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12261974/pdf/

Abstract: Structural magnetic resonance imaging (MRI) is one of the primary predictors of Alzheimer's disease (AD) risk, enabling the identification of patients with similar risk profiles for precision medicine. Motivated by the need for flexible modeling in AD research, we propose a latent-class model that addresses heterogeneity within study populations. The model allows the relationships between covariates and survival outcomes to vary across classes, accommodating the dynamics of AD progression. The imaging predictors are characterized by bivariate splines over triangulations to accommodate the irregular domain of brain images. We develop a generalized expectation-maximization (EM) algorithm that combines computational methods for logistic regression and penalized proportional hazards models to fit the proposed model. We demonstrate the advantages of the proposed method through extensive simulation studies and an application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, which helps reveal different subtypes or stages of the AD disease process.

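The generalized EM structure mentioned above alternates posterior class probabilities (E-step) with per-class weighted model fits (M-step). As a stand-in for the paper's logistic-plus-penalized-proportional-hazards M-step, the same skeleton is shown here on a simple two-class Gaussian mixture with hypothetical data:

```python
# Toy sketch: E- and M-steps of a two-class EM algorithm on a 1-D Gaussian
# mixture, illustrating the generalized EM structure only; the paper's M-step
# instead fits logistic and penalized proportional hazards models.
import math

data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
mu = [0.0, 6.0]  # initial class means; mixing proportions fixed at 0.5 here

def normal_pdf(x, m, sd=1.0):
    return math.exp(-0.5 * ((x - m) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

for _ in range(20):
    # E-step: posterior probability that each observation belongs to class 1
    resp = []
    for x in data:
        p0, p1 = normal_pdf(x, mu[0]), normal_pdf(x, mu[1])
        resp.append(p1 / (p0 + p1))
    # M-step: responsibility-weighted parameter updates, one fit per class
    mu[1] = sum(r * x for r, x in zip(resp, data)) / sum(resp)
    mu[0] = sum((1 - r) * x for r, x in zip(resp, data)) / sum(1 - r for r in resp)
```

The two means converge to the centers of the two latent groups; in the paper, the analogous updates recover latent classes with different covariate-survival relationships.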
Model Validation for Survival Analysis by Smoothed Predictive Likelihood.
Chengyuan Lu, Hein Putter, Mar Rodríguez Girondo, Jelle J Goeman
Statistics in Medicine 44(15-17): e70193 (2025-07-01). DOI: 10.1002/sim.70193
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12274099/pdf/

Abstract: Assessing predictive performance is a crucial aspect of survival modeling, essential for model selection, tuning-parameter determination, and evaluating additional predictive ability. The predictive log-likelihood has been recommended as a suitable evaluation measure, particularly for survival models, which generally return entire survival curves rather than point predictions. However, applying the predictive likelihood to semiparametric and nonparametric survival models is problematic: the fitted survival curves are step functions, which yield zero predictive likelihood when events occur at previously unobserved time points. The best-known existing solution, Verweij's predictive partial likelihood, is limited to Cox models. In this article, we propose a novel approach based on nearest-neighbor kernel smoothing that is usable in general semi- and nonparametric survival models. We show that the new method performs competitively with existing methods in the Cox setting while offering broader applicability, including testing for the presence of a frailty term and determining the optimal level of smoothness in penalized additive hazards models.

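The core smoothing idea can be sketched directly: a step-function prediction assigns zero likelihood to an event at an unobserved time, while smoothing the predicted mass yields a positive density there. A toy sketch with hypothetical masses — a plain Gaussian kernel is used here, whereas the paper uses nearest-neighbor kernel smoothing:

```python
# Toy sketch: a step-function survival prediction puts probability mass only
# at previously observed event times; kernel-smoothing that mass gives a
# nonzero predictive density at a new, unobserved event time.
import math

event_times = [2.0, 3.0, 5.0]
mass = [0.5, 0.3, 0.2]  # mass the fitted model assigns to each observed time

def smoothed_density(t, bandwidth=1.0):
    gauss = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(m * gauss((t - s) / bandwidth) / bandwidth
               for s, m in zip(event_times, mass))

raw_mass_at_new_time = 0.0          # step function: zero likelihood at t = 4.1
smoothed = smoothed_density(4.1)    # smoothed version: strictly positive
```

A validation score built from the smoothed density is therefore finite for every test observation, which is what makes the predictive log-likelihood usable beyond the Cox setting.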
Separable Effects of Semicompeting Risks: The Effects of Hepatitis B on Liver Cancer via Liver Cirrhosis.
Jih-Chang Yu, Yen-Tsung Huang
Statistics in Medicine 44(15-17): e70178 (2025-07-01). DOI: 10.1002/sim.70178

Abstract: We are interested in how patients with hepatitis B progress to liver cirrhosis (an intermediate outcome) and liver cancer (a primary outcome). The separable effect was recently proposed to study causal effects in the setting of competing risks. In this work, we extend the separable-effect approach to semicompeting risks involving a primary and an intermediate outcome. We decompose exposure to hepatitis B virus into two disjoint components: one affects liver cancer directly (the direct effect), and the other affects liver cancer through liver cirrhosis (the indirect effect). Under this effect separation, the identification formula we derive for the counterfactual risk of liver cancer under semicompeting risks is a function of the cause-specific hazards and transition hazards of multistate models, and reduces to the formula for competing risks as a special case. We propose nonparametric and semiparametric methods to estimate the causal effects and study their asymptotic properties. The model-free nonparametric method is robust but less efficient for confounder adjustment; the model-based semiparametric method flexibly accommodates confounders by treating them as covariates. We conduct comprehensive simulations to study the performance of the proposed methods. Our analyses of the hepatitis study show that hepatitis B infection has both direct and indirect effects on the incidence of liver cancer, the latter operating through liver cirrhosis.

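The ingredients of the identification formula — cause-specific hazards out of the initial state and transition hazards through the intermediate state — can be illustrated with a discrete-time illness-death toy model. The constant hazards below are hypothetical, and this is not the paper's estimator:

```python
# Toy sketch: cumulative incidence of the primary outcome in a discrete-time
# illness-death model, built from a hazard of direct progression (h_direct),
# a transition hazard into the intermediate state (h_inter), and a hazard
# from the intermediate state to the outcome (h_from_inter).
h_direct, h_inter, h_from_inter = 0.02, 0.10, 0.15

p_healthy, p_intermediate, cuminc = 1.0, 0.0, 0.0
for _ in range(10):  # ten time periods
    new_events = p_healthy * h_direct + p_intermediate * h_from_inter
    new_intermediate = p_healthy * h_inter
    p_healthy *= (1 - h_direct - h_inter)
    p_intermediate = p_intermediate * (1 - h_from_inter) + new_intermediate
    cuminc += new_events

total = cuminc + p_healthy + p_intermediate  # probability is conserved
```

Zeroing `h_direct` while keeping the other hazards isolates the pathway through the intermediate state, which mirrors how the separable-effect decomposition contrasts counterfactual risks under modified exposure components.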
α-KIDS: A Novel Feature Evaluation in the Ultrahigh-Dimensional Right-Censored Setting, With Application to Head and Neck Cancer.
Atika Farzana Urmi, Chenlu Ke, Dipankar Bandyopadhyay
Statistics in Medicine 44(15-17): e70167 (2025-07-01). DOI: 10.1002/sim.70167
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12261880/pdf/

Abstract: Recent advances in sequencing technologies have allowed the collection of massive genome-wide information that substantially enhances the diagnosis and prognosis of head and neck cancer. Identifying predictive markers for survival time is crucial for devising prognostic systems and learning the underlying molecular drivers of the cancer course. In this paper, we introduce α-KIDS, a model-free feature screening procedure with false discovery rate (FDR) control for ultrahigh-dimensional right-censored data that is robust to unknown censoring mechanisms. Our two-stage procedure first selects a set of important features with a dual screening mechanism using nonparametric reproducing-kernel-based ANOVA statistics, and then identifies a refined set of features under FDR control through a unified knockoff procedure. The finite-sample properties of our method, and its novelty relative to existing alternatives, are evaluated via simulation studies. We further illustrate the methodology on motivating right-censored head and neck (HN) cancer survival data from The Cancer Genome Atlas, with validation on similar HN cancer data from the Gene Expression Omnibus database. The methodology can be implemented using the R package aKIDS, available on GitHub.

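The knockoff step of the two-stage procedure controls the FDR through a data-dependent threshold on per-feature statistics. A minimal sketch of the standard knockoff+ threshold rule (the statistics W below are hypothetical, and this is not the aKIDS implementation):

```python
# Toy sketch: the knockoff+ selection rule. Each feature j gets a statistic
# W_j (large positive = evidence of signal; the knockoff construction makes
# null statistics symmetric around zero), and the threshold is the smallest t
# whose estimated false discovery proportion is at most the target q.
def knockoff_threshold(W, q):
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        fdp_hat = (1 + sum(w <= -t for w in W)) / max(1, sum(w >= t for w in W))
        if fdp_hat <= q:
            return t
    return float("inf")  # nothing selected

W = [5.0, 4.0, 3.5, 3.0, -0.5, 0.4, -0.3, 0.2]  # hypothetical statistics
thr = knockoff_threshold(W, q=0.25)
selected = [j for j, w in enumerate(W) if w >= thr]
```

Here the negative statistics (likely nulls) push the threshold up to 3.0, so only the four clearly positive features survive the FDR-controlled refinement stage.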