{"title":"Modelling extensions for multi-location studies in environmental epidemiology.","authors":"Pierre Masselot, Antonio Gasparrini","doi":"10.1177/09622802241313284","DOIUrl":"https://doi.org/10.1177/09622802241313284","url":null,"abstract":"<p><p>Multi-location studies are increasingly used in environmental epidemiology. Their application is supported by designs and statistical techniques developed in recent decades, which, however, have known limitations. In this contribution, we propose an improved modelling framework that addresses these issues. Specifically, this flexible framework allows the direct modelling of demographic differences across locations, defining geographical variations linked to multiple vulnerability factors, capturing spatial heterogeneity and predicting risks to new locations, and improving the assessment of uncertainty. We illustrate these new developments in an analysis of temperature-mortality associations in Italian cities, providing fully reproducible R code and data.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802241313284"},"PeriodicalIF":1.6,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143189642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighting methods for truncation by death in cluster-randomized trials.","authors":"Dane Isenberg, Michael O Harhay, Nandita Mitra, Fan Li","doi":"10.1177/09622802241309348","DOIUrl":"https://doi.org/10.1177/09622802241309348","url":null,"abstract":"<p><p>Patient-centered outcomes, such as quality of life and length of hospital stay, are the focus in a wide array of clinical studies. However, participants in randomized trials for elderly or critically and severely ill patient populations may have truncated or undefined non-mortality outcomes if they do not survive through the measurement time point. To address truncation by death, the survivor average causal effect has been proposed as a causally interpretable subgroup treatment effect defined under the principal stratification framework. However, the majority of methods for estimating the survivor average causal effect have been developed in the context of individually randomized trials. Only limited discussions have been centered around cluster-randomized trials, where methods typically involve strong distributional assumptions for outcome modeling. In this article, we propose two weighting methods to estimate the survivor average causal effect in cluster-randomized trials that obviate the need for potentially complicated outcome distribution modeling. We establish the requisite assumptions that address latent clustering effects to enable point identification of the survivor average causal effect, and we provide computationally efficient asymptotic variance estimators for each weighting estimator. In simulations, we evaluate our weighting estimators, demonstrating their finite-sample operating characteristics and robustness to certain departures from the identification assumptions. 
We illustrate our methods using data from a cluster-randomized trial to assess the impact of a sedation protocol on mechanical ventilation among children with acute respiratory failure.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802241309348"},"PeriodicalIF":1.6,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143068032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-dimensional, outcome-dependent missing data problems: Models for the human KIR loci.","authors":"Lars Leonardus Joannes van der Burg, Hein Putter, Henning Baldauf, Jürgen Sauter, Johannes Schetelig, Liesbeth C de Wreede, Stefan Böhringer","doi":"10.1177/09622802241304112","DOIUrl":"https://doi.org/10.1177/09622802241304112","url":null,"abstract":"<p><p>Missing data problems are common in biological, high-dimensional data, where data can be partially or completely missing. Algorithms have been developed to reconstruct the missing values by means of imputation or expectation-maximization algorithms. For missing data problems, it has been suggested that the regression model of interest should be incorporated into the imputation procedure to reduce bias of the regression coefficients. We here consider a challenging missing data problem, where diplotypes of the <i>KIR</i> loci are to be reconstructed. These loci are difficult to genotype, resulting in ambiguous genotype calls. We extend a previously proposed expectation-maximization algorithm by incorporating a potentially high-dimensional regression model to model the outcome. Three strategies are evaluated: (1) only allelic predictors, (2) allelic predictors and forward-backward selection on haplotype predictors, and (3) penalized regression on a saturated model. In a simulation study, we compared these strategies with a baseline expectation-maximization algorithm without outcome model. For extreme choices of effect sizes and missingness levels, the outcome-based expectation-maximization algorithms outperformed the no-outcome expectation-maximization algorithm. However, in all other cases, the no-outcome expectation-maximization algorithm performed better than or comparably to the three strategies, suggesting the outcome model can have a harmful effect.
In a data analysis concerning death after allogeneic hematopoietic stem cell transplantation as a function of donor <i>KIR</i> genes, expectation-maximization algorithms with and without an outcome model showed very similar results. In conclusion, outcome-based missing data models in the high-dimensional setting have to be used with care and are likely to lead to biased results.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802241304112"},"PeriodicalIF":1.6,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143068018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multivariate contaminated normal linear mixed models applied to Alzheimer's disease study with censored and missing data.","authors":"Tsung-I Lin, Wan-Lun Wang","doi":"10.1177/09622802241309349","DOIUrl":"https://doi.org/10.1177/09622802241309349","url":null,"abstract":"<p><p>The article proposes a robust approach to jointly modeling multiple repeated clinical measures with intricate features. More specifically, we aim to expand the scope of the multivariate linear mixed model by using the multivariate contaminated normal distribution. The proposed model, called the multivariate contaminated normal linear mixed model with censored and missing responses (MCNLMM-CM), is designed to handle minor outliers effectively, while simultaneously accommodating censored measurements and intermittent missing responses. An expectation conditional maximization either algorithm is developed to estimate the parameters of the proposed model in situations involving missing at random responses. We also provide techniques for approximating the asymptotic standard errors of the parameters, recovering censored data, imputing missing values, and identifying outliers. A simulation study is conducted to evaluate the finite-sample properties of the parameter estimators and demonstrate the superior performance of the proposed model compared to existing models. 
The proposed methodology is inspired by and applied to data from the Alzheimer's disease neuroimaging initiative cohort study, which involves longitudinal clinical measurements of patients with mild cognitive impairment.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802241309349"},"PeriodicalIF":1.6,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143068021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semiparametric estimator for the covariate-specific receiver operating characteristic curve.","authors":"Pablo Martínez-Camblor, Juan Carlos Pardo-Fernández","doi":"10.1177/09622802241311458","DOIUrl":"https://doi.org/10.1177/09622802241311458","url":null,"abstract":"<p><p>The study of the predictive ability of a marker is mainly based on the accuracy measures provided by the so-called confusion matrix. Besides, the area under the receiver operating characteristic curve has become a popular index for summarizing the overall accuracy of a marker. However, the nature of the relationship between the marker and the outcome, and the role that potential confounders play in this relationship, could be fundamental in order to extrapolate the observed results. Directed acyclic graphs, commonly used in epidemiology and in causality, could provide good feedback for learning the possibilities and limits of this extrapolation applied to the binary classification problem. Both the covariate-specific and the covariate-adjusted receiver operating characteristic curves are valuable tools, which can help to better understand the real classification abilities of a marker. Since they are strongly related to the conditional distributions of the marker on the positive (subjects with the studied characteristic) and negative (subjects without the studied characteristic) populations, the use of proportional hazard regression models arises in a very natural way. We explore the use of flexible proportional hazard Cox regression models for estimating the covariate-specific and the covariate-adjusted receiver operating characteristic curves. We study their large- and finite-sample properties and apply the proposed estimators to a real-world problem.
The developed code (in R) is provided in the Supplemental Material.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802241311458"},"PeriodicalIF":1.6,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multicategory matched learning for estimating optimal individualized treatment rules in observational studies with application to a hepatocellular carcinoma study.","authors":"Xuqiao Li, Qiuyan Zhou, Ying Wu, Ying Yan","doi":"10.1177/09622802241310328","DOIUrl":"https://doi.org/10.1177/09622802241310328","url":null,"abstract":"<p><p>One primary goal of precision medicine is to estimate the individualized treatment rules that optimize patients' health outcomes based on individual characteristics. Health studies with multiple treatments are commonly seen in practice. However, most existing individualized treatment rule estimation methods were developed for studies with binary treatments. Many require that the outcomes are fully observed. In this article, we propose a matching-based machine learning method to estimate the optimal individualized treatment rules in observational studies with multiple treatments when the outcomes are fully observed or right-censored. We establish theoretical properties for the proposed method. It is compared with existing competitive methods in simulation studies and a hepatocellular carcinoma study.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802241310328"},"PeriodicalIF":1.6,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating target population treatment effects in meta-analysis with individual participant-level data.","authors":"Hwanhee Hong, Lu Liu, Elizabeth A Stuart","doi":"10.1177/09622802241307642","DOIUrl":"https://doi.org/10.1177/09622802241307642","url":null,"abstract":"<p><p>Meta-analysis of randomized controlled trials is commonly used to evaluate treatments and inform policy decisions because it provides comprehensive summaries of all available evidence. However, meta-analyses are limited in their ability to draw population inferences about treatment effects because they usually do not define target populations of interest specifically, and results of the individual randomized controlled trials in those meta-analyses may not generalize to the target populations. To leverage evidence from multiple randomized controlled trials in the generalizability context, we bridge the ideas from meta-analysis and causal inference. We integrate meta-analysis with causal inference approaches for estimating the target population average treatment effect. We evaluate the performance of the methods via simulation studies and apply the methods to generalize meta-analysis results from randomized controlled trials of treatments on schizophrenia to adults with schizophrenia who present to usual care settings in the United States. Our simulation results show that all methods perform comparably and well across different settings. The data analysis results show that the treatment effect in the target population is meaningful, although the effect size is smaller than the sample average treatment effect.
We recommend applying multiple methods and comparing the results to ensure robustness, rather than relying on a single method.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802241307642"},"PeriodicalIF":1.6,"publicationDate":"2025-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143012040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A contaminated regression model for count health data.","authors":"Arnoldus F Otto, Johannes T Ferreira, Salvatore Daniele Tomarchio, Andriëtte Bekker, Antonio Punzo","doi":"10.1177/09622802241307613","DOIUrl":"https://doi.org/10.1177/09622802241307613","url":null,"abstract":"<p><p>In medical and health research, investigators are often interested in countable quantities such as hospital length of stay (e.g., in days) or the number of doctor visits. Poisson regression is commonly used to model such count data, but this approach cannot accommodate overdispersion (when the variance exceeds the mean). To address this issue, the negative binomial (NB) distribution (NB-D) and, by extension, NB regression provide a well-documented alternative. However, real-data applications present additional challenges that must be considered. Two such challenges are (i) the presence of (mild) outliers that can influence the performance of the NB-D and (ii) the availability of covariates that can enhance inference about the mean of the count variable of interest. To jointly address these issues, we propose the contaminated NB (cNB) distribution that exhibits the necessary flexibility to accommodate mild outliers. This model is shown to be simple and intuitive in interpretation. In addition to the parameters of the NB-D, our proposed model has a parameter describing the proportion of mild outliers and one specifying the degree of contamination. To allow available covariates to improve the estimation of the mean of the cNB distribution, we propose the cNB regression model. An expectation-maximization algorithm is outlined for parameter estimation, and its performance is evaluated through a parameter recovery study. The effectiveness of our model is demonstrated via a sensitivity analysis and on two health datasets, where it outperforms well-known count models.
The methodology proposed is implemented in an R package which is available at https://github.com/arnootto/cNB.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802241307613"},"PeriodicalIF":1.6,"publicationDate":"2025-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143011863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient estimation of the marginal mean of recurrent events in randomized controlled trials.","authors":"Luca Genetti, Giuliana Cortese, Henrik Ravn, Thomas Scheike","doi":"10.1177/09622802241289557","DOIUrl":"https://doi.org/10.1177/09622802241289557","url":null,"abstract":"<p><p>Recurrent events data are often encountered in biomedical settings, where individuals may also experience a terminal event such as death. A useful estimand to summarize such data is the marginal mean of the cumulative number of recurrent events up to a specific time horizon, allowing also for the possible presence of a terminal event. Recently, it was found that augmented estimators can estimate this quantity efficiently, providing improved inference. Improvement in efficiency by the use of covariate adjustment is increasing in popularity as the methods get further developed, and is supported by the regulatory agencies EMA (2015) and FDA (2023). Motivated by these arguments, this article presents novel efficient estimators for clinical data from randomized controlled trials, accounting for additional information from auxiliary covariates. Moreover, in randomized studies when both right censoring and competing risks are present, we propose a novel doubly augmented estimator of the marginal mean, which has two optimal augmentation components due to censoring and randomization. We provide theoretical and asymptotic details for the novel estimators, also confirmed by simulation studies. Then, we discuss how to improve efficiency, both theoretically by computing the expected amount of variance reduction, and practically by showing the performance of different working regression models that are needed in the augmentation, when they are correctly specified or misspecified.
The methods are applied to the LEADER study, a randomized controlled trial that studied cardiovascular safety of treatments in type 2 diabetes patients.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802241289557"},"PeriodicalIF":1.6,"publicationDate":"2025-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143011949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group sequential design using restricted mean survival time as the primary endpoint in clinical trials.","authors":"Zhaojin Li, Xiang Geng, Yawen Hou, Zheng Chen","doi":"10.1177/09622802241304111","DOIUrl":"https://doi.org/10.1177/09622802241304111","url":null,"abstract":"<p><p>The proportional hazards (PH) assumption is often violated in clinical trials. If the most commonly used Log-rank test is used for trial design in non-proportional hazard (NPH) cases, it will result in power loss or inflation, and the hazard ratio, as the effect measure, will become difficult to interpret. To circumvent the issue caused by the NPH for trial design and to make the effect measures easy to interpret and communicate, two simulation-free methods for restricted mean survival time group sequential (GS-RMST) design are introduced in this study: the independent increment GS-RMST (GS-RMSTi) design and the non-independent increment GS-RMST (GS-RMSTn) design. For the above two designs, the corresponding analytic expression of the variance-covariance matrix and the calculations of the stopping boundaries and sample size are given in the study. Simulation studies show that both designs can achieve the corresponding nominal type I error and nominal power. The GS-RMSTn simulation studies show that the Max-Combo test group sequential design is robust in different NPH scenarios and is suitable for discovering whether there is a treatment effect difference. However, it does not have a corresponding easy-to-interpret effect measure indicating effect difference magnitude. GS-RMST performs well in both PH and NPH scenarios, and it can obtain time-scale effect measures that are easy to understand by both physicians and patients.
Examples of both GS-RMST designs are also illustrated.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802241304111"},"PeriodicalIF":1.6,"publicationDate":"2025-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143011973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}