{"title":"So Many Choices: A Guide to Selecting Among Methods to Adjust for Observed Confounders.","authors":"Luke Keele, Richard Grieve","doi":"10.1002/sim.10336","DOIUrl":"10.1002/sim.10336","url":null,"abstract":"<p><p>Non-randomised studies (NRS) typically assume that there are no differences in unobserved baseline characteristics between the treatment groups under comparison. Traditionally, regression models have been deployed to estimate treatment effects adjusting for observed confounders, but they can lead to biased estimates if the model is misspecified through incorrect functional form assumptions. A multitude of alternative methods have been developed which can reduce the risk of bias due to model misspecification. Investigators can now choose between many forms of matching, weighting, doubly robust, and machine learning methods. We review key concepts related to functional form assumptions and how those can contribute to bias from model misspecification. We then categorize the three frameworks for modeling treatment effects and the wide variety of estimation methods that can be applied to each framework. We consider why machine learning methods have been widely proposed for estimation and review the strengths and weaknesses of these approaches. We apply a range of these methods in re-analyzing a landmark case study. In the application, we examine how several widely used methods may be subject to bias from model misspecification. 
We conclude with a set of recommendations for practice.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 5","pages":"e10336"},"PeriodicalIF":1.8,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11825193/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143415320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testing Covariates Effects on Bivariate Reference Regions.","authors":"Óscar Lado-Baleato, Javier Roca-Pardiñas, Carmen Cadarso-Suárez, Francisco Gude","doi":"10.1002/sim.10308","DOIUrl":"10.1002/sim.10308","url":null,"abstract":"<p><p>Correlated clinical measurements are routinely interpreted via comparisons with univariate reference intervals examined side by side. Multivariate reference regions (MVRs), i.e., regions that characterize the distribution of multivariate results, have been proposed as a more adequate interpretation tool in such situations. However, MVR estimation methods have not yet been fully developed and are rarely used by physicians. The multivariate distribution of correlated measurements might change with certain patient characteristics (e.g., age or gender), and their effect on the shape of an MVR can be complex, involving interaction terms. For instance, the reference region shape for a given set of continuous covariates might vary across groups with respect to the value of a categorical variable. This paper examines the use of a bootstrap-based hypothesis test for examining the effect of covariates on bivariate reference regions, testing the effect of factor-by-region interactions. An estimation algorithm based on smoothing splines was used to construct the bivariate reference region for a pediatric anthropometric dataset, and the bootstrapping procedure was used to determine the effect of age and gender on the shape of the reference region. (Height, weight) bivariate distribution was shown to depend on the interaction between age and gender. The bootstrapping procedure confirmed that a bivariate growth chart is desirable over univariate age-gender body mass index (BMI) percentile curves. 
Whereas the well-known BMI criterion detects only two atypical situations (i.e., underweight, overweight), the bootstrap-tested bivariate reference region detected abnormally large or small body frames for different ages and genders.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 3-4","pages":"e10308"},"PeriodicalIF":1.8,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143034081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matching-Assisted Power Prior for Incorporating Real-World Data in Randomized Clinical Trial Analysis.","authors":"Ruoyuan Qian, Biqing Yang, Xinyi Xu, Bo Lu","doi":"10.1002/sim.10342","DOIUrl":"10.1002/sim.10342","url":null,"abstract":"<p><p>Leveraging external data to supplement randomized clinical trials has been a popular topic in recent years, especially for medical device and drug discovery. In rare diseases, it is very challenging to recruit patients and run a large-scale randomized trial. To take advantage of real-world data from historical trials on the same disease, we can run a small hybrid trial and borrow historical controls to increase the power. But the borrowing needs to be conducted in a statistically principled manner. Bayesian power prior methods and propensity score adjustments have been discussed in the literature. In this paper, we propose a matching-assisted power prior approach to better mitigate observed bias when incorporating external data. A subset of comparable external subjects is selected by groups through template matching, and different weights are assigned to these groups based on their similarity to the current study population. Power priors are then implemented to incorporate the information into Bayesian inference. Unlike conventional power prior methods, which discount all control patients similarly, matching pre-selects good controls, thereby improving the quality of the external data being borrowed. 
We compare its performance with the existing propensity score-integrated power prior approach through simulation studies and illustrate the implementation using data from a real acupuncture clinical trial.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 3-4","pages":"e10342"},"PeriodicalIF":1.8,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11754725/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Multi-Label Classification With Gene-Environment Interactions in Disease Modeling.","authors":"Jingmao Li, Qingzhao Zhang, Shuangge Ma, Kuangnan Fang, Yaqing Xu","doi":"10.1002/sim.10330","DOIUrl":"10.1002/sim.10330","url":null,"abstract":"<p><p>In biomedical studies, gene-environment (G-E) interactions have been demonstrated to have important implications for analyzing disease outcomes beyond the main G and main E effects. Many approaches have been developed for G-E interaction analysis, yielding important findings. However, hierarchical multi-label classification, which provides insightful information on disease outcomes, remains unexplored in G-E analysis literature. Moreover, unlabeled data are commonly observed in practical settings but omitted by many existing methods of hierarchical multi-label classification. In this study, we consider a semi-supervised scenario and develop a novel approach for the two-layer hierarchical response with G-E interactions. A two-step penalized estimation is then proposed using an efficient expectation-maximization (EM) algorithm. Simulation shows that it has superior performance in classification and feature selection. The analysis of The Cancer Genome Atlas (TCGA) data on lung cancer demonstrates the practical utility of the proposed method. 
Overall, this study can fill the important knowledge gap in G-E interaction analysis by providing a widely applicable framework for hierarchical multi-label classification of complex disease outcomes.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 3-4","pages":"e10330"},"PeriodicalIF":1.8,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143047884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Modeling of Cancer Outcomes Using Genetic Variables Assisted by Pathological Imaging Data.","authors":"Yunju Im, Rong Li, Shuangge Ma","doi":"10.1002/sim.10350","DOIUrl":"10.1002/sim.10350","url":null,"abstract":"<p><p>With the increasing maturity of genetic profiling, an essential and routine task in cancer research is to model disease outcomes/phenotypes using genetic variables. Many methods have been successfully developed. However, oftentimes, empirical performance is unsatisfactory because of a \"lack of information.\" In cancer research and clinical practice, a source of information that is broadly available and highly cost-effective comes from pathological images, which are routinely collected for definitive diagnosis and staging. In this article, we consider a Bayesian approach for selecting relevant genetic variables and modeling their relationships with a cancer outcome/phenotype. We propose borrowing information from (manually curated, low-dimensional) pathological imaging features via reinforcing the same selection results for the cancer outcome and imaging features. We further develop a weighting strategy to accommodate the scenario where information borrowing may not be equally effective for all subjects. Computation is carefully examined. Simulations demonstrate competitive performance of the proposed approach. We analyze TCGA (The Cancer Genome Atlas) LUAD (lung adenocarcinoma) data, with overall survival and gene expressions being the outcome and genetic variables, respectively. 
The analysis yields findings that differ from those of the alternatives and have sound properties.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 3-4","pages":"e10350"},"PeriodicalIF":1.8,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774474/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143011847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonparametric Path-Specific Effects on a Survival Outcome Through Multiple Time-to-Event Mediators.","authors":"Yen-Tsung Huang, Ju-Sheng Hong","doi":"10.1002/sim.10327","DOIUrl":"10.1002/sim.10327","url":null,"abstract":"<p><p>A causal mediation model with multiple time-to-event mediators is exemplified by the natural course of human disease marked by sequential milestones with a time-to-event nature. For example, from hepatitis B infection to death, patients may experience intermediate events such as liver cirrhosis and liver cancer. The sequential events of hepatitis, cirrhosis, cancer, and death are susceptible to right censoring; moreover, the latter events may preclude the former events. Casting the natural course of human diseases in the framework of causal mediation modeling, we establish a model with intermediate and terminal events as the mediators and outcomes, respectively. We define the interventional analog of path-specific effects (iPSEs) as the effect of an exposure on a terminal event mediated (or not mediated) by any combination of intermediate events without parametric models. The expression of a counting process-based counterfactual hazard is derived under the sequential ignorability assumption. We employ composite nonparametric likelihood estimation to obtain maximum likelihood estimators for the counterfactual hazard and iPSEs. Our proposed estimators achieve asymptotic unbiasedness, uniform consistency, and weak convergence. 
Applying the proposed method, we show that hepatitis B induced mortality is mostly mediated through liver cancer and/or cirrhosis whereas hepatitis C induced mortality may be through extrahepatic diseases.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 3-4","pages":"e10327"},"PeriodicalIF":1.8,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143034103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testing One Primary and Two Secondary Endpoints in a Two-Stage Group Sequential Trial With Extensions.","authors":"Ajit C Tamhane, Dong Xi, Cyrus R Mehta, Alexander Romanenko, Jiangtao Gou","doi":"10.1002/sim.10346","DOIUrl":"10.1002/sim.10346","url":null,"abstract":"<p><p>We study the problem of testing multiple secondary endpoints conditional on a primary endpoint being significant in a two-stage group sequential procedure, focusing on two secondary endpoints. This extends our previous work with one secondary endpoint. The test for the secondary null hypotheses is a closed procedure. Application of the Bonferroni test for testing the intersection of the secondary hypotheses results in the Holm procedure while application of the Simes test results in the Hochberg procedure. The focus of the present paper is on developing normal theory analogs of the above-mentioned $p$-value based tests that take into account (i) the gatekeeping effect of the test on the primary endpoint and (ii) correlations between the endpoints. The normal theory boundaries are determined by finding the least favorable configuration of the correlations and so their knowledge is not needed to apply these procedures. The $p$-value based procedures are easy to apply but they are less powerful than their normal theory analogs because they do not take into account the correlations between the endpoints and the gatekeeping effect referred to above. On the other hand, the normal theory procedures are restricted to two secondary endpoints and two stages mainly because of computational difficulties with more than two secondary endpoints and stages. Comparisons between the two types of procedures are given in terms of secondary powers. 
The sensitivity of the secondary type I error rate and power to unequal information times is studied. Numerical examples and a real case study illustrate the procedures.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 3-4","pages":"e10346"},"PeriodicalIF":1.8,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758265/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143034193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Analysis of Censored Linear Mixed-Effects Models for Heavy-Tailed Irregularly Observed Repeated Measures.","authors":"Kelin Zhong, Fernanda L Schumacher, Luis M Castro, Víctor H Lachos","doi":"10.1002/sim.10295","DOIUrl":"10.1002/sim.10295","url":null,"abstract":"<p><p>The use of mixed-effect models to understand the evolution of the human immunodeficiency virus (HIV) and the progression of acquired immune deficiency syndrome (AIDS) has been the cornerstone of longitudinal data analysis in recent years. However, data from HIV/AIDS clinical trials have several complexities. Among the most common are that the HIV viral load can be undetectable and that patient measurements can be recorded irregularly because of problems in data collection. Although censored mixed-effects models assuming conditionally independent normal random errors are commonly used to analyze this type of data, such a model may not be appropriate for accommodating outlying observations and responses recorded at irregular intervals. Consequently, in this paper, we propose a Bayesian analysis of censored linear mixed-effects models that replaces Gaussian assumptions with a flexible class of distributions, such as the scale mixture of normal family of distributions, and uses a damped exponential correlation structure to account for within-subject autocorrelation among irregularly observed measures. For this complex structure, Stan's default No-U-Turn sampler is utilized to obtain posterior simulations. 
The feasibility of the proposed methods was demonstrated through several simulation studies and their application to two AIDS case studies.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 3-4","pages":"e10295"},"PeriodicalIF":1.8,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143047832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Parametric Estimation for Semi-Competing Risks Data With Event Misascertainment.","authors":"Ruiqian Wu, Ying Zhang, Giorgos Bakoyannis","doi":"10.1002/sim.10332","DOIUrl":"10.1002/sim.10332","url":null,"abstract":"<p><p>The semi-competing risks data model is a special type of disease-state model that focuses on studying the association between an intermediate event and a terminal event and proves to be a useful tool in modeling disease progression. The study of the semi-competing risk data model not only allows us to evaluate whether a disease episode is related to death but also provides a toolkit to predict death, given that the episode occurred at a certain time. However, the computation of the semi-competing risk models is a numerically challenging task. The Gamma-Frailty conditional Markov model has been shown to be an efficient computation model for studying semi-competing risks data. Building on recent advances in studying semi-competing risks data, this work proposes a non-parametric pseudo-likelihood method equipped with an EM-like algorithm to study semi-competing risks data with event misascertainment under the restricted Gamma-Frailty conditional Markov model. A thorough simulation study is conducted to demonstrate the inference validity of the proposed method and its numerical stability. 
The proposed method is applied to a large HIV cohort study, EA-IeDEA, that has a severe death under-reporting issue to assess the degree of adverse impact of the interruption of ART care on HIV mortality.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 3-4","pages":"e10332"},"PeriodicalIF":1.8,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758483/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143034101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bayesian Multivariate Model With Temporal Dependence on Random Partition of Areal Data for Mosquito-Borne Diseases.","authors":"Jessica Pavani, Fernando Andrés Quintana","doi":"10.1002/sim.10325","DOIUrl":"10.1002/sim.10325","url":null,"abstract":"<p><p>More than half of the world's population is exposed to mosquito-borne diseases, leading to millions of cases and hundreds of thousands of deaths every year. Analyzing this type of data is complex and poses several interesting challenges, mainly due to the usually vast geographic area involved, the peculiar temporal behavior, and the potential correlation between infections. Motivation for this work stems from the analysis of tropical disease data, namely, the number of cases of dengue and chikungunya, for the 145 microregions in Southeast Brazil from 2018 to 2022. As a contribution to the literature on multivariate disease data, we develop a flexible Bayesian multivariate spatio-temporal model where temporal dependence is defined for areal clusters. The model features a prior distribution for the random partition of areal data that incorporates neighboring information. It also incorporates an autoregressive structure and terms related to seasonal patterns into temporal components that are disease- and cluster-specific. Furthermore, it considers a multivariate directed acyclic graph autoregressive structure to accommodate spatial and inter-disease dependence. We explore the properties of the model through simulation studies and show results that prove our proposal compares well to competing alternatives. 
Finally, we apply the model to the motivating dataset with a twofold goal: finding clusters of areas with similar temporal trends for some of the diseases and exploring the existence of correlation between two diseases transmitted by the same mosquito.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 3-4","pages":"e10325"},"PeriodicalIF":1.8,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143034067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}