{"title":"Two-stage targeted minimum-loss based estimation for non-negative two-part outcomes.","authors":"Nicholas T Williams, Richard Liu, Katherine L Hoffman, Sarah Forrest, Kara E Rudolph, Iván Díaz","doi":"10.1177/09622802251340245","DOIUrl":"10.1177/09622802251340245","url":null,"abstract":"<p><p>Non-negative two-part outcomes are defined as outcomes with a density function that have a zero point mass but are otherwise positive. Examples, such as healthcare expenditure and hospital length of stay, are common in healthcare utilization research. Despite the practical relevance of non-negative two-part outcomes, few methods exist to leverage knowledge of their semicontinuity to achieve improved performance in estimating causal effects. In this paper, we develop a nonparametric two-stage targeted minimum-loss based estimator (denoted as hTMLE) for non-negative two-part outcomes. We present methods for a general class of interventions, which can accommodate continuous, categorical, and binary exposures. The two-stage TMLE uses a targeted estimate of the intensity component of the outcome to produce a targeted estimate of the binary component of the outcome that may improve finite sample efficiency. 
We demonstrate the efficiency gains achieved by the two-stage TMLE with simulated examples and then apply it to a cohort of Medicaid beneficiaries to estimate the effect of chronic pain and physical disability on days' supply of opioids.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"1431-1441"},"PeriodicalIF":1.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144498076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible regression methods for estimating optimal individualized treatment regimes with scalar and functional covariates.","authors":"Kaidi Kong, Li Guan, Zhongzhan Zhang","doi":"10.1177/09622802251340259","DOIUrl":"10.1177/09622802251340259","url":null,"abstract":"<p><p>In personalized medicine, how to estimate the optimal individualized treatment regime based on available individual information is a fundamental problem. In recent years, functional data analysis has appeared extensively in medical research, while the optimal individualized treatment regime based on the combination of scalar covariates and functional covariates has rarely been studied, and the few existing studies are mostly conducted in the context of randomized trials. In this article, we propose a flexible regression-based approach in which the outcome variable is real-valued and the covariates contain multiple scalar covariates and a functional covariate. Our approach is applicable to both randomized trials and observational studies, and the convergence rates of the proposed optimal individualized treatment regime estimators are presented for both situations. Sufficient simulation studies and a real data analysis are conducted to justify the validity of our proposed method.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"1459-1479"},"PeriodicalIF":1.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144249607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporation of missing indicator with multiple imputation in propensity score analysis with partially observed covariates: A simulation study.","authors":"Sevinc Puren Yucel Karakaya, Ilker Unal","doi":"10.1177/09622802251338365","DOIUrl":"10.1177/09622802251338365","url":null,"abstract":"<p><p>One of the primary challenges encountered in propensity score (PS) weighting is the presence of observations with missing covariates. In such cases, several potential solutions based on multiple imputation have been proposed. The most prevalent of these is the MI<sub>te</sub> method, which combines treatment effect estimates derived from imputed datasets. A limited number of PS studies have incorporated the MI<sub>te</sub> method with the missing indicator method; however, these studies only incorporated the missing indicator into the PS model. The aim of this simulation study is to propose two novel methods that incorporate the missing indicator approach with the MI<sub>te</sub>. This incorporation either entails including the missing indicator into the outcome model (MIMI<sub>o</sub>) or, alternatively, into both the outcome and PS model (MIMI<sub>pso</sub>). The construction of the simulation scenarios was predicated on three elements: the mechanism of missing data, the type of treatment effect, and the presence of unmeasured confounding. In the presence of unmeasured confounding, the MIMI<sub>pso</sub> method was the most effective method under the MAR mechanism. In the context of the MNAR mechanism, the method that exhibited the lowest bias was MIMI<sub>o</sub> for homogeneous treatment effect and MIMI<sub>pso</sub> for heterogeneous treatment effect. The MI<sub>te</sub> method exhibited the highest levels of bias and variation. 
In view of the difficulties involved in identifying the mechanism of missing data, the variability in treatment effects across subgroups and the potential for unmeasured confounding variables in practice, researchers are encouraged to utilize the MIMI<sub>pso</sub> method.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"1293-1302"},"PeriodicalIF":1.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12308044/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144326886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A shared frailty regression model for clustered survival data.","authors":"Gilbert Kiprotich, Diego Ignacio Gallardo, Pedro Luiz Ramos, Thomas Augustin","doi":"10.1177/09622802251338984","DOIUrl":"10.1177/09622802251338984","url":null,"abstract":"<p><p>In this article, we propose a new frailty model based on a mixture of inverse Gaussian distributions for multivariate lifetimes. This approach provides an advantage over previous models, as the weights are directly determined through parameterization of the mixture, removing the need for arbitrary guesswork in the weighting process. Moreover, the closed-form Laplace transform of the model facilitates the quantification of Kendall's tau measure of dependence. The frailty model's parametric and flexible parametric variants are examined. For parameter estimation, the expectation-maximization technique is employed, taking advantage of the hierarchical representation of the frailty distribution, providing a simpler and more stable method than directly maximizing the observed log-likelihood function. The performance of the estimators is assessed numerically using Monte Carlo simulations. We apply our methodology to two medical datasets on cancer. The results indicate the benefits of the proposed model over existing frailty models in the literature. 
The implementation of the procedure is added to the R package extrafrail.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"1385-1412"},"PeriodicalIF":1.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144175027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatiotemporal effects on dengue incidence based on a large cluster randomized study.","authors":"Jerome Johnson, Xiangyu Yu, Suzanne M Dufault, Nicholas P Jewell","doi":"10.1177/09622802251338371","DOIUrl":"10.1177/09622802251338371","url":null,"abstract":"<p><p>A recent large-scale cluster randomized test-negative study assessed the impact of a mosquito-based intervention on the incidence of clinical dengue showing a protective efficacy of 77.1% (95% CI: (65.3%, 84.9%)). While the intervention was randomized at a cluster-level, human and mosquito movement suggest potential violations in assumptions necessary for intention-to-treat analyses to produce accurate estimates of the full intervention effect due to spatial clustering of dengue cases, and/or potential non-independence in the intervention arising from spillover of the intervention (or control) across cluster boundaries. We address these distinct but related effects using two approaches. First, we examine whether a clustering effect exists, that is, whether the presence of a recent dengue case in the sample within a specified distance from a residence raises the risk of dengue. Second, we use cluster reallocation techniques to examine intervention spillover effects. We find strong spatial effects of the presence of dengue cases on the risk of clinical dengue that exhibit both serospecificity and a dose response, more evident in control than intervention clusters at least on an additive scale. 
Contrarily, there is no evidence of any appreciable local spillover effect from intervention to control clusters, or vice versa, in terms of either the risk of dengue infection or the level of disease clustering.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"1303-1313"},"PeriodicalIF":1.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12308035/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144333872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiply robust causal inference in the presence of an error-prone treatment.","authors":"Shaojie Wei, Qinpeng He, Wei Li, Zhi Geng","doi":"10.1177/09622802251338364","DOIUrl":"10.1177/09622802251338364","url":null,"abstract":"<p><p>Numerous estimation procedures employed in causal inference often rely on accurately measured data. However, the prevalence of measurement errors in practical studies may yield biased effect estimates. It is common to employ validation samples to rectify such biases in the measurement error literature. This article focuses on the estimation of the average causal effect with a misclassified binary treatment in a primary population of interest. By leveraging a validation sample with covariates, an error-prone version of treatment and a true treatment recorded, we provide identifiability results under certain conditions. Building on identifiability, we explore three classes of estimators, each demonstrating consistency and asymptotic normality within distinct model sets. Furthermore, we propose a multiply robust estimation approach for the treatment effect based on the semiparametric theory framework. The multiply robust estimator remains consistent under any one of the listed model sets and achieves the semiparametric efficiency bound, provided all models are correct. 
We demonstrate the satisfactory performance of the proposed estimators through simulation studies and a real data analysis.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"1314-1327"},"PeriodicalIF":1.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144286522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing accuracy for multi-class classification when subclasses are involved.","authors":"Nan Nan, Lili Tian","doi":"10.1177/09622802251343600","DOIUrl":"10.1177/09622802251343600","url":null,"abstract":"<p><p>Classifications that involve subclasses are common in many applied fields. \"Compound multi-class classification\" refers to the settings which involve three or more main classes and at least one of the main classes has multiple subclasses. In this paper, we propose an accuracy metric proper for \"compound <math><mi>M</mi></math>-class classification,\" namely \"hypervolume under compound <math><mrow><mi>R</mi><mi>O</mi><mi>C</mi></mrow></math> manifold <math><mo>(</mo><mi>H</mi><mi>U</mi><msub><mi>M</mi><mrow><mi>C</mi><mo>,</mo><mi>M</mi></mrow></msub><mo>)</mo></math>.\" The proposed <math><mi>H</mi><mi>U</mi><msub><mi>M</mi><mrow><mi>C</mi><mo>,</mo><mi>M</mi></mrow></msub></math> evaluates the overall accuracy of a biomarker measured on continuous scale correctly identifying <math><mi>M</mi></math> main classes without requiring specification of an ordering in terms of marker values for subclasses relative to each other within each main class. The probabilistic interpretation of <math><mi>H</mi><mi>U</mi><msub><mi>M</mi><mrow><mi>C</mi><mo>,</mo><mi>M</mi></mrow></msub></math> is analytically derived. A network-based computing algorithm which enables efficient computation of the empirical estimate of <math><mi>H</mi><mi>U</mi><msub><mi>M</mi><mrow><mi>C</mi><mo>,</mo><mi>M</mi></mrow></msub></math> is developed. Non-parametric bootstrap percentile confidence intervals of <math><mi>H</mi><mi>U</mi><msub><mi>M</mi><mrow><mi>C</mi><mo>,</mo><mi>M</mi></mrow></msub></math> are assessed through extensive simulation studies. 
Lastly, a real data example is included to illustrate the usage of our proposed method.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"1480-1503"},"PeriodicalIF":1.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144226783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Alternatives to default shrinkage methods can improve prediction accuracy, calibration, and coverage: A methods comparison study.","authors":"Mark A van de Wiel, Gwenaël Gr Leday, Martijn W Heymans, Erik W van Zwet, Ailko H Zwinderman, Jeroen Hoogland","doi":"10.1177/09622802251338440","DOIUrl":"10.1177/09622802251338440","url":null,"abstract":"<p><p>While shrinkage is essential in high-dimensional settings, its use for low-dimensional regression-based prediction has been debated. It reduces variance, often leading to improved prediction accuracy. However, it also inevitably introduces bias, which may harm two other measures of predictive performance: calibration and coverage of confidence intervals. Here, the latter evaluates whether the amount of uncertainty is correctly quantified. Much of the criticism stems from the usage of standard shrinkage methods, such as lasso and ridge with a single, cross-validated penalty. Our aim is to show that readily available alternatives may improve predictive performance, in terms of accuracy, calibration or coverage. We study linear and logistic regression. For linear regression, we use small sample splits of a large, fairly typical epidemiological data set to illustrate that usage of differential ridge penalties for covariate groups may enhance prediction accuracy, while calibration and coverage benefit from additional shrinkage of the penalties. Bayesian hierarchical modeling facilitates the latter, including local shrinkage. In the logistic regression setting, we apply an external simulation to illustrate that local shrinkage may improve calibration with respect to global shrinkage, while providing better prediction accuracy than other solutions, like Firth's correction. The potential benefits of the alternative shrinkage methods are easily accessible via example implementations in R, including the estimation of multiple penalties. 
A synthetic copy of the large data set is shared for reproducibility.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"1342-1355"},"PeriodicalIF":1.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12308036/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144175028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Partial areas under the curve of the cumulative distribution function as a new composite estimand for randomized clinical trials.","authors":"Masataka Taguri, Kenichi Hayashi","doi":"10.1177/09622802251314195","DOIUrl":"10.1177/09622802251314195","url":null,"abstract":"<p><p>Clinical trials often face the challenge of post-randomization events, such as the initiation of rescue therapy or the premature discontinuation of randomized treatment. Such events, called \"intercurrent events\" (ICEs) in ICH E9(R1), may influence the estimation and interpretation of treatment effects. According to ICH E9(R1), there are five strategies for handling ICEs. This study focuses on the composite strategy, which incorporates ICEs in the outcome of interest and defines the treatment effects using composite endpoints that combine the measured continuous variables and ICEs. An advantage of this strategy is that it avoids the occurrence of missing data because they are defined as part of the outcome of interest. In this study, we propose a new composite estimand: the difference in the partial areas under the curves (pAUCs) of the cumulative distribution function. While the pAUC is closely related to the trimmed mean approach proposed by Permutt and Li, it offers the advantage of allowing pre-specification of the cutoff value for a \"good\" response based on clinical considerations. This ensures that the pAUC can be calculated irrespective of the proportion of ICEs. We describe the causal interpretation of our method and its relationship with two other strategies (treatment policy and hypothetical strategies) using a potential outcome framework. 
We present simulation results in which our method performs reasonably well compared to several existing approaches in terms of type I error, power, and the proportion of undefined test statistics.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"1097-1113"},"PeriodicalIF":1.6,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144062239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of recurrent events in cluster randomised trials: The PLEASANT trial case study.","authors":"Kelly Grant, Steven A Julious","doi":"10.1177/09622802251316972","DOIUrl":"10.1177/09622802251316972","url":null,"abstract":"<p><p>Recurrent events for many clinical conditions, such as asthma, can indicate poor health outcomes. Recurrent events data are often analysed using statistical methods such as Cox regression or negative binomial regression, suffering event or time information loss. This article re-analyses the preventing and lessening exacerbations of asthma in school-age children associated with a new term (PLEASANT) trial data as a case study, investigating the utility, extending recurrent events survival analysis methods to cluster randomised trials. A conditional frailty model is used, with the frailty term at the general practitioner practice level, accounting for clustering. A rare events bias adjustment is applied if few participants had recurrent events and truncation of small event risk sets is explored, to improve model accuracy. Global and event-specific estimates are presented, alongside a mean cumulative function plot to aid interpretation. The conditional frailty model global results are similar to PLEASANT results, but with greater precision (include time, recurrent events, within-participant dependence, and rare events adjustment). Event-specific results suggest an increasing risk reduction in medical appointments for the intervention group, in September-December 2013, as medical contacts increase over time. 
The conditional frailty model is recommended when recurrent events are a study outcome for clinical trials, including cluster randomised trials, to help explain changes in event risk over time, assisting clinical interpretation.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"1079-1096"},"PeriodicalIF":1.6,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12209553/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144080727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}