{"title":"Modelling extensions for multi-location studies in environmental epidemiology.","authors":"Pierre Masselot, Antonio Gasparrini","doi":"10.1177/09622802241313284","DOIUrl":"10.1177/09622802241313284","url":null,"abstract":"<p><p>Multi-location studies are increasingly used in environmental epidemiology. Their application is supported by designs and statistical techniques developed in the last decades, which, however, have known limitations. In this contribution, we propose an improved modelling framework that addresses these issues. Specifically, this flexible framework allows directly modelling demographic differences across locations, defining geographical variations linked to multiple vulnerability factors, capturing spatial heterogeneity and predicting risks to new locations, and improving the assessment of uncertainty. We illustrate these new developments in an analysis of temperature-mortality associations in Italian cities, providing fully reproducible R code and data.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"615-629"},"PeriodicalIF":1.6,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951449/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143189642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Bayesian evidence synthesis to quantify uncertainty in population trends in smoking behaviour.","authors":"Stephen Wade, Peter Sarich, Pavla Vaneckova, Silvia Behar-Harpaz, Preston J Ngo, Paul B Grogan, Sonya Cressman, Coral E Gartner, John M Murray, Tony Blakely, Emily Banks, Martin C Tammemagi, Karen Canfell, Marianne F Weber, Michael Caruana","doi":"10.1177/09622802241310326","DOIUrl":"10.1177/09622802241310326","url":null,"abstract":"<p><p>Simulation models of smoking behaviour provide vital forecasts of exposure to inform policy targets, estimates of the burden of disease, and impacts of tobacco control interventions. A key element of useful model-based forecasts is a clear picture of uncertainty due to the data used to inform the model; however, assessment of this parameter uncertainty is incomplete in almost all tobacco control models. As a remedy, we demonstrate a Bayesian approach to model calibration that quantifies parameter uncertainty. With a model calibrated to Australian data, we observed that the smoking cessation rate in Australia has increased with calendar year since the late 20th century, and in 2016 people who smoked would quit at a rate of 4.7 quit-events per 100 person-years (90% equal-tailed interval (ETI): 4.5-4.9). We found that those who quit smoking before age 30 years switched to reporting that they never smoked at a rate of approximately 2% annually (90% ETI: 1.9-2.2%). 
The Bayesian approach demonstrated here can be used as a blueprint to model other population behaviours that are challenging to measure directly, and to provide a clearer picture of uncertainty to decision-makers.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"545-560"},"PeriodicalIF":1.6,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951451/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143400108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semiparametric estimator for the covariate-specific receiver operating characteristic curve.","authors":"Pablo Martínez-Camblor, Juan Carlos Pardo-Fernández","doi":"10.1177/09622802241311458","DOIUrl":"10.1177/09622802241311458","url":null,"abstract":"<p><p>The study of the predictive ability of a marker is mainly based on the accuracy measures provided by the so-called confusion matrix. In addition, the area under the receiver operating characteristic curve has become a popular index for summarizing the overall accuracy of a marker. However, the nature of the relationship between the marker and the outcome, and the role that potential confounders play in this relationship, could be fundamental for extrapolating the observed results. Directed acyclic graphs, commonly used in epidemiology and causal inference, could help clarify the possibilities and limits of this extrapolation in the binary classification problem. Both the covariate-specific and the covariate-adjusted receiver operating characteristic curves are valuable tools, which can contribute to a better understanding of the real classification abilities of a marker. Since they are strongly related to the conditional distributions of the marker on the positive (subjects with the studied characteristic) and negative (subjects without the studied characteristic) populations, the use of proportional hazard regression models arises in a very natural way. We explore the use of flexible proportional hazard Cox regression models for estimating the covariate-specific and the covariate-adjusted receiver operating characteristic curves. We study their large- and finite-sample properties and apply the proposed estimators to a real-world problem. 
The developed code (in R) is provided in the Supplemental Material.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"594-614"},"PeriodicalIF":1.6,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-dimensional, outcome-dependent missing data problems: Models for the human KIR loci.","authors":"Lars Leonardus Joannes van der Burg, Hein Putter, Henning Baldauf, Jürgen Sauter, Johannes Schetelig, Liesbeth C de Wreede, Stefan Böhringer","doi":"10.1177/09622802241304112","DOIUrl":"10.1177/09622802241304112","url":null,"abstract":"<p><p>Missing data problems are common in biological, high-dimensional data, where data can be partially or completely missing. Algorithms have been developed to reconstruct the missing values by means of imputation or expectation-maximization. For missing data problems, it has been suggested that the regression model of interest should be incorporated into the imputation procedure to reduce bias in the regression coefficients. We here consider a challenging missing data problem, where diplotypes of the <i>KIR</i> loci are to be reconstructed. These loci are difficult to genotype, resulting in ambiguous genotype calls. We extend a previously proposed expectation-maximization algorithm by incorporating a potentially high-dimensional regression model to model the outcome. Three strategies are evaluated: (1) only allelic predictors, (2) allelic predictors and forward-backward selection on haplotype predictors, and (3) penalized regression on a saturated model. In a simulation study, we compared these strategies with a baseline expectation-maximization algorithm without an outcome model. For extreme choices of effect sizes and missingness levels, the outcome-based expectation-maximization algorithms outperformed the no-outcome expectation-maximization algorithm. However, in all other cases, the no-outcome expectation-maximization algorithm performed better than or comparably to the three strategies, suggesting that the outcome model can have a harmful effect. 
In a data analysis concerning death after allogeneic hematopoietic stem cell transplantation as a function of donor <i>KIR</i> genes, expectation-maximization algorithms with and without an outcome model showed very similar results. In conclusion, outcome-based missing data models in the high-dimensional setting have to be used with care and are likely to lead to biased results.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"440-456"},"PeriodicalIF":1.6,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951372/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143068018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multicategory matched learning for estimating optimal individualized treatment rules in observational studies with application to a hepatocellular carcinoma study.","authors":"Xuqiao Li, Qiuyan Zhou, Ying Wu, Ying Yan","doi":"10.1177/09622802241310328","DOIUrl":"10.1177/09622802241310328","url":null,"abstract":"<p><p>One primary goal of precision medicine is to estimate the individualized treatment rules that optimize patients' health outcomes based on individual characteristics. Health studies with multiple treatments are commonly seen in practice. However, most existing individualized treatment rule estimation methods were developed for studies with binary treatments. Many require that the outcomes are fully observed. In this article, we propose a matching-based machine learning method to estimate the optimal individualized treatment rules in observational studies with multiple treatments when the outcomes are fully observed or right-censored. We establish theoretical properties for the proposed method. It is compared with existing competing methods in simulation studies and a hepatocellular carcinoma study.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"508-522"},"PeriodicalIF":1.6,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multivariate contaminated normal linear mixed models applied to Alzheimer's disease study with censored and missing data.","authors":"Tsung-I Lin, Wan-Lun Wang","doi":"10.1177/09622802241309349","DOIUrl":"10.1177/09622802241309349","url":null,"abstract":"<p><p>The article proposes a robust approach to jointly modeling multiple repeated clinical measures with intricate features. More specifically, we aim to expand the scope of the multivariate linear mixed model by using the multivariate contaminated normal distribution. The proposed model, called the multivariate contaminated normal linear mixed model with censored and missing responses (MCNLMM-CM), is designed to handle minor outliers effectively, while simultaneously accommodating censored measurements and intermittent missing responses. An expectation conditional maximization either algorithm is developed to estimate the parameters of the proposed model in situations involving missing at random responses. We also provide techniques for approximating the asymptotic standard errors of the parameters, recovering censored data, imputing missing values, and identifying outliers. A simulation study is conducted to evaluate the finite-sample properties of the parameter estimators and demonstrate the superior performance of the proposed model compared to existing models. 
The proposed methodology is inspired by and applied to data from the Alzheimer's disease neuroimaging initiative cohort study, which involves longitudinal clinical measurements of patients with mild cognitive impairment.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"490-507"},"PeriodicalIF":1.6,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143068021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian mixture models for phylogenetic source attribution from consensus sequences and time since infection estimates.","authors":"Alexandra Blenkinsop, Lysandros Sofocleous, Francesco Di Lauro, Evangelia Georgia Kostaki, Ard van Sighem, Daniela Bezemer, Thijs van de Laar, Peter Reiss, Godelieve de Bree, Nikos Pantazis, Oliver Ratmann","doi":"10.1177/09622802241309750","DOIUrl":"10.1177/09622802241309750","url":null,"abstract":"<p><p>In stopping the spread of infectious diseases, pathogen genomic data can be used to reconstruct transmission events and characterize population-level sources of infection. Most approaches for identifying transmission pairs do not account for the time elapsed since the divergence of pathogen variants in individuals, which is problematic in viruses with high within-host evolutionary rates. This prompted us to consider possible transmission pairs in terms of phylogenetic data and additional estimates of time since infection derived from clinical biomarkers. We develop Bayesian mixture models with an evolutionary clock as a signal component and additional mixed effects or covariate random functions describing the mixing weights to classify potential pairs into likely and unlikely transmission pairs. We demonstrate that although sources cannot be identified at the individual level with certainty, even with the additional data on time elapsed, inferences about the population-level sources of transmission are possible, and more accurate than using only phylogenetic data without time since infection estimates. We apply the proposed approach to estimate age-specific sources of HIV infection in Amsterdam transmission networks among men who have sex with men between 2010 and 2021. 
This study demonstrates that infection time estimates provide informative data to characterize transmission sources, and shows how phylogenetic source attribution can then be done with multi-dimensional mixture models.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"523-544"},"PeriodicalIF":1.6,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951470/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143400099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extension of Fisher's least significant difference method to multi-armed group-sequential response-adaptive designs.","authors":"Wenyu Liu, D Stephen Coad","doi":"10.1177/09622802251319896","DOIUrl":"https://doi.org/10.1177/09622802251319896","url":null,"abstract":"<p><p>Multi-armed multi-stage designs evaluate experimental treatments using a control arm at interim analyses. Incorporating response-adaptive randomisation in these designs allows early stopping, faster treatment selection and more patients to be assigned to the more promising treatments. Existing frequentist multi-armed multi-stage designs demonstrate that the family-wise error rate is strongly controlled, but they may be too conservative and lack power when the experimental treatments are very different therapies rather than doses of the same drug. Moreover, the designs use a fixed allocation ratio. In this article, Fisher's least significant difference method extended to group-sequential response-adaptive designs is investigated. It is shown mathematically that the information time continues after dropping inferior arms, and hence the error-spending approach can be used to control the family-wise error rate. Two optimal allocations were considered. One ensures efficient estimation of the treatment effects and the other maximises the power subject to a fixed total sample size. Operating characteristics of the group-sequential response-adaptive design for normal and censored survival outcomes based on simulation and redesigning the NeoSphere trial were compared with those of a fixed-sample design. 
Results show that the adaptive design offers efficiency and ethical advantages, and that the family-wise error rate is well controlled.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251319896"},"PeriodicalIF":1.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143493488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating target population treatment effects in meta-analysis with individual participant-level data.","authors":"Hwanhee Hong, Lu Liu, Elizabeth A Stuart","doi":"10.1177/09622802241307642","DOIUrl":"10.1177/09622802241307642","url":null,"abstract":"<p><p>Meta-analysis of randomized controlled trials is commonly used to evaluate treatments and inform policy decisions because it provides comprehensive summaries of all available evidence. However, meta-analyses are limited in their ability to draw population inferences about treatment effects because they usually do not specifically define target populations of interest, and results of the individual randomized controlled trials in those meta-analyses may not generalize to the target populations. To leverage evidence from multiple randomized controlled trials in the generalizability context, we bridge ideas from meta-analysis and causal inference. We integrate meta-analysis with causal inference approaches for estimating the target population average treatment effect. We evaluate the performance of the methods via simulation studies and apply the methods to generalize meta-analysis results from randomized controlled trials of treatments for schizophrenia to adults with schizophrenia who present to usual care settings in the United States. Our simulation results show that all methods perform comparably and well across different settings. The data analysis results show that the treatment effect in the target population is meaningful, although the effect size is smaller than the sample average treatment effect. 
We recommend applying multiple methods and comparing the results to ensure robustness, rather than relying on a single method.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"355-368"},"PeriodicalIF":1.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143012040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graphical methods to illustrate the nature of the relation between a continuous variable and the outcome when using restricted cubic splines with a Cox proportional hazards model.","authors":"Peter C Austin","doi":"10.1177/09622802241287707","DOIUrl":"10.1177/09622802241287707","url":null,"abstract":"<p><p>Restricted cubic splines (RCS) allow analysts to model nonlinear relations between continuous covariates and the outcome in a regression model. When using RCS with the Cox proportional hazards model, there is no longer a single hazard ratio for the continuous variable. Instead, the hazard ratio depends on the values of the covariate for the two individuals being compared. Thus, using age as an example, when one assumes a linear relation between age and the log-hazard of the outcome, there is a single hazard ratio comparing any two individuals whose age differs by 1 year. However, when allowing for a nonlinear relation between age and the log-hazard of the outcome, the hazard ratio comparing the hazard of the outcome between a 31- and a 30-year-old may differ from the hazard ratio comparing the hazard of the outcome between an 81- and an 80-year-old. We describe four methods for graphically depicting the relation between a continuous variable and the outcome when using RCS with a Cox model. These graphical methods are based on plots of relative hazard ratios, cumulative incidence, hazards, and cumulative hazards against the continuous variable. Using a case study of patients presenting to hospital with heart failure and a series of mathematical derivations, we illustrate that the four methods will produce qualitatively similar conclusions about the nature of the relation between a continuous variable and the outcome. 
Use of these methods will allow for an intuitive communication of the nature of the relation between the variable and the outcome.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"277-285"},"PeriodicalIF":1.6,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11874503/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142475114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}