Marinela Capanu, Mihai Giurcanu, Colin B Begg, Mithat Gönen
{"title":"Two-stage subsampling variable selection for sparse high-dimensional generalized linear models.","authors":"Marinela Capanu, Mihai Giurcanu, Colin B Begg, Mithat Gönen","doi":"10.1177/09622802251343597","DOIUrl":"https://doi.org/10.1177/09622802251343597","url":null,"abstract":"<p><p>Although high-dimensional data analysis has received a lot of attention after the advent of omics data, model selection in this setting continues to be challenging and there is still substantial room for improvement. Through a novel combination of existing methods, we propose here a two-stage subsampling approach for variable selection in high-dimensional generalized linear regression models. In the first stage, we screen the variables using smoothly clipped absolute deviance penalty regularization followed by partial least squares regression on repeated subsamples of the data; we include in the second stage only those predictors that were most frequently selected over the subsamples either by smoothly clipped absolute deviance or for having the top loadings in either of the first two partial least squares regression components. In the second stage, we again repeatedly subsample the data and, for each subsample, we find the best Akaike information criterion model based on an exhaustive search of all possible models on the reduced set of predictors. We then include in the final model those predictors with high selection probability across the subsamples. We prove that the proposed first-stage estimator is <math><msup><mi>n</mi><mrow><mn>1</mn><mo>/</mo><mn>2</mn></mrow></msup></math>-consistent and that the true predictors are included in the first stage with probability converging to 1. In an extensive simulation study, we show that this two-stage approach outperforms the competitors yielding among the highest probability of selecting the true model while having one of the lowest number of false positives in the settings of logistic, Poisson, and linear regression. We illustrate the proposed method on two gene expression cancer datasets.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251343597"},"PeriodicalIF":1.6,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144544953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group sequential analysis of marked point processes: Plasma donation trials.","authors":"Kecheng Li, Richard J Cook","doi":"10.1177/09622802251350263","DOIUrl":"https://doi.org/10.1177/09622802251350263","url":null,"abstract":"<p><p>Plasma donation plays a critical role in modern medicine by providing lifesaving treatments for patients with a wide range of conditions like bleeding disorders, immune deficiencies, and infections. Evaluation of devices used to collect blood plasma from donors is essential to ensure donor safety. We consider the design of plasma donation trials when the goal is to assess the safety of a new device on the response to transfusions compared to the standard device. A unique feature is that the number of donations per donor varies substantially so some individuals contribute more information and others less. The sample size formula is derived to ensure power requirements are met when analyses are based on generalized estimating equations and robust variance estimation. Strategies for interim monitoring based on group sequential designs using alpha spending functions are developed based on a robust covariance matrix for estimates of treatment effect over successive analyses. The design of a plasma donation study is illustrated where the focus is on assessing the safety of a new device with serious hypotensive adverse events as the primary outcome.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251350263"},"PeriodicalIF":1.6,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144544952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xijin Chen, Pavel Mozgunov, Richard D Baird, Thomas Jaki
{"title":"Using circulating tumor DNA as a novel biomarker of efficacy for dose-finding designs in oncology.","authors":"Xijin Chen, Pavel Mozgunov, Richard D Baird, Thomas Jaki","doi":"10.1177/09622802251350457","DOIUrl":"https://doi.org/10.1177/09622802251350457","url":null,"abstract":"<p><p>Dose-finding trials are designed to identify a safe and potentially effective drug dose and schedule during the early phase of clinical trials. Historically, Bayesian adaptive dose-escalation methods in Phase I trials in cancer have mainly focussed on toxicity endpoints rather than efficacy endpoints. This is partly because efficacy readouts are often not available soon enough for dose escalation decisions. In the last decade, 'liquid biopsy' technologies have been developed, which may provide a readout of treatment response much earlier than conventional endpoints. This paper develops a novel design that uses a biomarker, circulating tumour DNA (ctDNA), with toxicity and activity outcomes in dose-finding studies. We compare the proposed approach based on repeated ctDNA measurement with existing Bayesian adaptive approaches under various scenarios of dose-toxicity, dose-efficacy relationship, and trajectories of regular ctDNA values over time. Simulation results show that the proposed approach can yield significantly shorter trial duration and may improve identification of the target dose. In addition, this approach has the potential to minimise the time individual patients spend on potentially inactive trial therapies. Using two different dose-finding designs, we demonstrate that the way we incorporate biomarker information is broadly applicable across different dose-finding designs and yields notable benefit in both cases.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251350457"},"PeriodicalIF":1.6,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144529545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicholas T Williams, Richard Liu, Katherine L Hoffman, Sarah Forrest, Kara E Rudolph, Iván Díaz
{"title":"Two-stage targeted minimum-loss based estimation for non-negative two-part outcomes.","authors":"Nicholas T Williams, Richard Liu, Katherine L Hoffman, Sarah Forrest, Kara E Rudolph, Iván Díaz","doi":"10.1177/09622802251340245","DOIUrl":"https://doi.org/10.1177/09622802251340245","url":null,"abstract":"<p><p>Non-negative two-part outcomes are defined as outcomes with a density function that have a zero point mass but are otherwise positive. Examples, such as healthcare expenditure and hospital length of stay, are common in healthcare utilization research. Despite the practical relevance of non-negative two-part outcomes, few methods exist to leverage knowledge of their semicontinuity to achieve improved performance in estimating causal effects. In this paper, we develop a nonparametric two-stage targeted minimum-loss based estimator (denoted as hTMLE) for non-negative two-part outcomes. We present methods for a general class of interventions, which can accommodate continuous, categorical, and binary exposures. The two-stage TMLE uses a targeted estimate of the intensity component of the outcome to produce a targeted estimate of the binary component of the outcome that may improve finite sample efficiency. We demonstrate the efficiency gains achieved by the two-stage TMLE with simulated examples and then apply it to a cohort of Medicaid beneficiaries to estimate the effect of chronic pain and physical disability on days' supply of opioids.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251340245"},"PeriodicalIF":1.6,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144498076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strategies to boost statistical efficiency in randomized oncology trials with primary time-to-event endpoints.","authors":"Alan D Hutson, Han Yu","doi":"10.1177/09622802251343599","DOIUrl":"https://doi.org/10.1177/09622802251343599","url":null,"abstract":"<p><p>Oncology clinical trials are increasingly expensive, necessitating efforts to streamline phase II and III trials to reduce costs and expedite treatment delivery. Randomization is often impractical in oncology trials due to small sample sizes and limited statistical power, leading to biased inferences. The FDA has recently published guidance documents encouraging the use of prognostic baseline measures to improve the precision of inferences around treatment effects. To address this, we propose an extension of Rosenbaum's exact testing method incorporating a variant of martingale residuals for right censored data. This method can dramatically improve the statistical power of the test comparing treatment arms given time-to-event endpoints as compared to the standard log-rank test. Additionally, the modification of the martingale residual provides a straightforward metric for summarizing treatment effect by quantifying the expected events per treatment arm at each time-point. This approach is illustrated using a phase II clinical trial in small cell lung cancer.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251343599"},"PeriodicalIF":1.6,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144476715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple imputation for systematically missing effect modifiers in individual participant data meta-analysis.","authors":"Robert Thiesmeier, Scott M Hofer, Nicola Orsini","doi":"10.1177/09622802251348800","DOIUrl":"10.1177/09622802251348800","url":null,"abstract":"<p><p>Individual participant data (IPD) meta-analysis of randomised trials is a crucial method for detecting and investigating effect modifications in medical research. However, few studies have explored scenarios involving systematically missing data on discrete effect modifiers (EMs) in IPD meta-analyses with a limited number of trials. This simulation study examines the impact of systematic missing values in IPD meta-analysis using a two-stage imputation method. We simulated IPD meta-analyses of randomised trials with multiple studies that had systematically missing data on the EM. A multivariable Weibull survival model was specified to assess beneficial (Hazard Ratio (HR)<math><mo>=</mo></math>0.8), null (HR<math><mo>=</mo></math>1.0), and harmful (HR<math><mo>=</mo></math>1.2) treatment effects for low, medium, and high levels of an EM, respectively. Bias and coverage were evaluated using Monte-Carlo simulations. The absolute bias for common and heterogeneous effect IPD meta-analyses was less than 0.016 and 0.007, respectively, with coverage close to its nominal value across all EM levels. An uncongenial imputation model resulted in larger bias, even when the proportion of studies with systematically missing data on the EM was small. Overall, the proposed two-stage imputation approach provided unbiased estimates with improved precision. The assumptions and limitations of this approach are discussed.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251348800"},"PeriodicalIF":1.6,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144333871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mitchell Aaron Schepps, Jérémy Seurat, France Mentré, Weng Kee Wong
{"title":"Design optimization of longitudinal studies using metaheuristics: Application to lithium pharmacokinetics.","authors":"Mitchell Aaron Schepps, Jérémy Seurat, France Mentré, Weng Kee Wong","doi":"10.1177/09622802251350262","DOIUrl":"https://doi.org/10.1177/09622802251350262","url":null,"abstract":"<p><p>Lithium is recommended as a first line treatment for patients with bipolar disorder. However, only certain patients show a good response to the drug, and the variability and tolerability of lithium response are poorly understood. Greater precision in the early identification of individuals who are likely to respond well to lithium is a significant unmet clinical need. We create optimal designs to better understand the pharmacokinetic exposition of lithium for patients with and without a genetic covariate. From a Fisher information matrix based method, we find different optimal designs for estimating various parameters in a complicated pharmacokinetics/pharmacodynamics nonlinear mixed effects model with multiple physician specified constraints. Our approach uses flexible state-of-the-art metaheuristics to find various types of efficient designs, including multiple-objective optimal designs that can balance the competitiveness of the objectives and deliver higher efficiencies for more important objectives. Results from this article will be used as part of a broader study to implement efficient designs to better understand the exposition of sustained-release lithium in patients with bipolar disorder.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251350262"},"PeriodicalIF":1.6,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144326884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiaxing Qiu, Douglas E Lake, Pavel Chernyavskiy, Teague R Henry
{"title":"Fast leave-one-cluster-out cross-validation using clustered network information criterion.","authors":"Jiaxing Qiu, Douglas E Lake, Pavel Chernyavskiy, Teague R Henry","doi":"10.1177/09622802251345486","DOIUrl":"https://doi.org/10.1177/09622802251345486","url":null,"abstract":"<p><p>For prediction models developed on clustered data that do not account for cluster heterogeneity in model parameterization, it is crucial to use cluster-based validation to assess model generalizability on unseen clusters. This article introduces a clustered estimator of the network information criterion to approximate leave-one-cluster-out deviance for standard prediction models with twice-differentiable log-likelihood functions. The clustered network information criterion serves as a fast alternative to cluster-based cross-validation. Stone proved that the Akaike information criterion is asymptotically equivalent to leave-one-observation-out cross-validation for true parametric models with independent and identically distributed observations. Ripley noted that the network information criterion, derived from Stone's proof, is a better approximation when the model is misspecified. For clustered data, we derived clustered network information criterion by substituting the Fisher information matrix in the network information criterion with a clustering-adjusted estimator. The clustered network information criterion imposes a greater penalty when the data exhibits stronger clustering, thereby allowing the clustered network information criterion to better prevent over-parameterization. In a simulation study and an empirical example, we used standard regression to develop prediction models for clustered data with Gaussian or binomial responses. Compared to the commonly used Akaike information criterion and Bayesian information criterion for standard regression, clustered network information criterion provides a much more accurate approximation to leave-one-cluster-out deviance and results in more accurate model size and variable selection, as determined by cluster-based cross-validation, especially when the data exhibit strong clustering.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251345486"},"PeriodicalIF":1.6,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144326885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporation of missing indicator with multiple imputation in propensity score analysis with partially observed covariates: A simulation study.","authors":"Sevinc Puren Yucel Karakaya, Ilker Unal","doi":"10.1177/09622802251338365","DOIUrl":"https://doi.org/10.1177/09622802251338365","url":null,"abstract":"<p><p>One of the primary challenges encountered in propensity score (PS) weighting is the presence of observations with missing covariates. In such cases, several potential solutions based on multiple imputation have been proposed. The most prevalent of these is the MI<sub>te</sub> method, which combines treatment effect estimates derived from imputed datasets. A limited number of PS studies have incorporated the MI<sub>te</sub> method with the missing indicator method; however, these studies only incorporated the missing indicator into the PS model. The aim of this simulation study is to propose two novel methods that incorporate the missing indicator approach with the MI<sub>te</sub>. This incorporation either entails including the missing indicator into the outcome model (MIMI<sub>o</sub>) or, alternatively, into both the outcome and PS model (MIMI<sub>pso</sub>). The construction of the simulation scenarios was predicated on three elements: the mechanism of missing data, the type of treatment effect, and the presence of unmeasured confounding. In the presence of unmeasured confounding, the MIMI<sub>pso</sub> method was the most effective method under the MAR mechanism. In the context of the MNAR mechanism, the method that exhibited the lowest bias was MIMI<sub>o</sub> for homogeneous treatment effect and MIMI<sub>pso</sub> for heterogeneous treatment effect. The MI<sub>te</sub> method exhibited the highest levels of bias and variation. In view of the difficulties involved in identifying the mechanism of missing data, the variability in treatment effects across subgroups and the potential for unmeasured confounding variables in practice, researchers are encouraged to utilize the MIMI<sub>pso</sub> method.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251338365"},"PeriodicalIF":1.6,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144326886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jerome Johnson, Xiangyu Yu, Suzanne M Dufault, Nicholas P Jewell
{"title":"Spatiotemporal effects on dengue incidence based on a large cluster randomized study.","authors":"Jerome Johnson, Xiangyu Yu, Suzanne M Dufault, Nicholas P Jewell","doi":"10.1177/09622802251338371","DOIUrl":"10.1177/09622802251338371","url":null,"abstract":"<p><p>A recent large-scale cluster randomized test-negative study assessed the impact of a mosquito-based intervention on the incidence of clinical dengue showing a protective efficacy of 77.1% (95% CI: (65.3%, 84.9%)). While the intervention was randomized at a cluster-level, human and mosquito movement suggest potential violations in assumptions necessary for intention-to-treat analyses to produce accurate estimates of the full intervention effect due to spatial clustering of dengue cases, and/or potential non-independence in the intervention arising from spillover of the intervention (or control) across cluster boundaries. We address these distinct but related effects using two approaches. First, we examine whether a clustering effect exists, that is, whether the presence of a recent dengue case in the sample within a specified distance from a residence raises the risk of dengue. Second, we use cluster reallocation techniques to examine intervention spillover effects. We find strong spatial effects of the presence of dengue cases on the risk of clinical dengue that exhibit both serospecificity and a dose response, more evident in control than intervention clusters at least on an additive scale. Contrarily, there is no evidence of any appreciable local spillover effect from intervention to control clusters, or vice versa, in terms of either the risk of dengue infection or the level of disease clustering.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251338371"},"PeriodicalIF":1.6,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144333872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}