Biostatistics. Pub Date: 2024-07-01. DOI: 10.1093/biostatistics/kxad015
Joshua R Nugent, Carina Marquez, Edwin D Charlebois, Rachel Abbott, Laura B Balzer
{"title":"Blurring cluster randomized trials and observational studies: Two-Stage TMLE for subsampling, missingness, and few independent units.","authors":"Joshua R Nugent, Carina Marquez, Edwin D Charlebois, Rachel Abbott, Laura B Balzer","doi":"10.1093/biostatistics/kxad015","DOIUrl":"10.1093/biostatistics/kxad015","url":null,"abstract":"<p><p>Cluster randomized trials (CRTs) often enroll large numbers of participants; yet due to resource constraints, only a subset of participants may be selected for outcome assessment, and those sampled may not be representative of all cluster members. Missing data also present a challenge: if sampled individuals with measured outcomes are dissimilar from those with missing outcomes, unadjusted estimates of arm-specific endpoints and the intervention effect may be biased. Further, CRTs often enroll and randomize few clusters, limiting statistical power and raising concerns about finite sample performance. Motivated by SEARCH-TB, a CRT aimed at reducing incident tuberculosis infection, we demonstrate interlocking methods to handle these challenges. First, we extend Two-Stage targeted minimum loss-based estimation to account for three sources of missingness: (i) subsampling; (ii) measurement of baseline status among those sampled; and (iii) measurement of final status among those in the incidence cohort (persons known to be at risk at baseline). Second, we critically evaluate the assumptions under which subunits of the cluster can be considered the conditionally independent unit, improving precision and statistical power but also causing the CRT to behave like an observational study. 
Our application to SEARCH-TB highlights the real-world impact of different assumptions on measurement and dependence; estimates relying on unrealistic assumptions suggested the intervention increased the incidence of TB infection by 18% (risk ratio [RR]=1.18, 95% confidence interval [CI]: 0.85-1.63), while estimates accounting for the sampling scheme, missingness, and within community dependence found the intervention decreased the incident TB by 27% (RR=0.73, 95% CI: 0.57-0.92).</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"599-616"},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247188/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10516286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
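The two-stage idea described in the abstract above can be illustrated with a deliberately simplified sketch. This is not the authors' Two-Stage TMLE: Stage 1 here uses plain stratified standardization in place of TMLE, and the function names `cluster_endpoint` and `risk_ratio`, as well as the `(stratum, measured, outcome)` row layout, are assumptions introduced only for illustration.

```python
from collections import defaultdict
from statistics import mean


def cluster_endpoint(rows):
    """Stage 1 (simplified): estimate one cluster's outcome proportion.

    rows: list of (stratum, measured, outcome) tuples; outcome is None
    when it was not measured. Stratum-specific means among measured
    individuals are standardized to the stratum mix of the whole cluster.
    A stratum with no measured outcome makes mean() raise, which mirrors
    a positivity violation.
    """
    counts = defaultdict(int)
    observed = defaultdict(list)
    for stratum, measured, outcome in rows:
        counts[stratum] += 1
        if measured:
            observed[stratum].append(outcome)
    n = len(rows)
    return sum(counts[s] / n * mean(observed[s]) for s in counts)


def risk_ratio(treated_endpoints, control_endpoints):
    """Stage 2 (simplified): ratio of arm-level means of cluster endpoints."""
    return mean(treated_endpoints) / mean(control_endpoints)


# Hypothetical cluster: the "hi" stratum is under-measured.
rows = [
    ("hi", True, 1), ("hi", False, None), ("hi", False, None),
    ("lo", True, 0), ("lo", True, 0), ("lo", True, 1),
]
adjusted = cluster_endpoint(rows)
unadjusted = mean(o for _, m, o in rows if m)
```

In this toy cluster the unadjusted mean among measured individuals understates the standardized estimate, because the higher-risk stratum is measured less often.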
Biostatistics. Pub Date: 2024-07-01. DOI: 10.1093/biostatistics/kxad036
Gopal Kotecha, Steffen Ventz, Sandra Fortini, Lorenzo Trippa
{"title":"Uncertainty directed factorial clinical trials.","authors":"Gopal Kotecha, Steffen Ventz, Sandra Fortini, Lorenzo Trippa","doi":"10.1093/biostatistics/kxad036","DOIUrl":"10.1093/biostatistics/kxad036","url":null,"abstract":"<p><p>The development and evaluation of novel treatment combinations is a key component of modern clinical research. The primary goals of factorial clinical trials of treatment combinations range from the estimation of intervention-specific effects, or the discovery of potential synergies, to the identification of combinations with the highest response probabilities. Most factorial studies use balanced or block randomization, with an equal number of patients assigned to each treatment combination, irrespective of the specific goals of the trial. Here, we introduce a class of Bayesian response-adaptive designs for factorial clinical trials with binary outcomes. The study design was developed using Bayesian decision-theoretic arguments and adapts the randomization probabilities to treatment combinations during the enrollment period based on the available data. Our approach enables the investigator to specify a utility function representative of the aims of the trial, and the Bayesian response-adaptive randomization algorithm aims to maximize this utility function. We considered several utility functions and factorial designs tailored to them. Then, we conducted a comparative simulation study to illustrate relevant differences of key operating characteristics across the resulting designs. We also investigated the asymptotic behavior of the proposed adaptive designs. 
We also used data summaries from three recent factorial trials in perioperative care, smoking cessation, and infectious disease prevention to define realistic simulation scenarios and illustrate advantages of the introduced trial designs compared to other study designs.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"833-851"},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247193/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139708548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
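As a rough illustration of response-adaptive randomization with binary outcomes, the sketch below allocates patients in proportion to the posterior probability that each treatment combination has the highest response rate. This is a Thompson-sampling-style rule, swapped in for simplicity; it is not the utility-based design of the paper above. The Beta(1, 1) priors, the arm labels, and the name `allocation_probs` are all assumptions.

```python
import random


def allocation_probs(successes, failures, draws=2000, rng=None):
    """Monte Carlo estimate of P(arm has the highest response rate).

    successes / failures: dicts mapping arm label -> observed counts.
    Each arm gets an independent Beta(1 + s, 1 + f) posterior.
    """
    rng = rng or random.Random(0)
    arms = list(successes)
    wins = dict.fromkeys(arms, 0)
    for _ in range(draws):
        # Draw one response rate per arm from its posterior.
        sample = {
            a: rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in arms
        }
        wins[max(sample, key=sample.get)] += 1
    return {a: wins[a] / draws for a in arms}


# Hypothetical interim data from a 2x2 factorial trial.
succ = {"none": 2, "A": 2, "B": 2, "A+B": 18}
fail = {"none": 18, "A": 18, "B": 18, "A+B": 2}
probs = allocation_probs(succ, fail)
```

The returned probabilities can serve as randomization probabilities for the next patient; a trial focused on estimating main effects or synergies would need a different utility, which is the point the abstract makes.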
Biostatistics. Pub Date: 2024-07-01. DOI: 10.1093/biostatistics/kxad030
Pantelis Samartsidis, Shaun R Seaman, Abbie Harrison, Angelos Alexopoulos, Gareth J Hughes, Christopher Rawlinson, Charlotte Anderson, André Charlett, Isabel Oliver, Daniela De Angelis
{"title":"A Bayesian multivariate factor analysis model for causal inference using time-series observational data on mixed outcomes.","authors":"Pantelis Samartsidis, Shaun R Seaman, Abbie Harrison, Angelos Alexopoulos, Gareth J Hughes, Christopher Rawlinson, Charlotte Anderson, André Charlett, Isabel Oliver, Daniela De Angelis","doi":"10.1093/biostatistics/kxad030","DOIUrl":"10.1093/biostatistics/kxad030","url":null,"abstract":"<p><p>Assessing the impact of an intervention by using time-series observational data on multiple units and outcomes is a frequent problem in many fields of scientific research. Here, we propose a novel Bayesian multivariate factor analysis model for estimating intervention effects in such settings and develop an efficient Markov chain Monte Carlo algorithm to sample from the high-dimensional and nontractable posterior of interest. The proposed method is one of the few that can simultaneously deal with outcomes of mixed type (continuous, binomial, count), increase efficiency in the estimates of the causal effects by jointly modeling multiple outcomes affected by the intervention, and easily provide uncertainty quantification for all causal estimands of interest. 
Using the proposed approach, we evaluate the impact that Local Tracing Partnerships had on the effectiveness of England's Test and Trace programme for COVID-19.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"867-884"},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247182/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138500308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics. Pub Date: 2024-07-01. DOI: 10.1093/biostatistics/kxad034
Haoyi Fu, Lu Tang, Ori Rosen, Alison E Hipwell, Theodore J Huppert, Robert T Krafty
{"title":"Covariate-guided Bayesian mixture of spline experts for the analysis of multivariate high-density longitudinal data.","authors":"Haoyi Fu, Lu Tang, Ori Rosen, Alison E Hipwell, Theodore J Huppert, Robert T Krafty","doi":"10.1093/biostatistics/kxad034","DOIUrl":"10.1093/biostatistics/kxad034","url":null,"abstract":"<p><p>With the rapid development of techniques to measure brain activity and structure, statistical methods for analyzing modern brain-imaging data play an important role in the advancement of science. Imaging data that measure brain function are usually multivariate high-density longitudinal data and are heterogeneous across both imaging sources and subjects, which leads to various statistical and computational challenges. In this article, we propose a group-based method to cluster a collection of multivariate high-density longitudinal data via a Bayesian mixture of smoothing splines. Our method assumes each multivariate high-density longitudinal trajectory is a mixture of multiple components with different mixing weights. Time-independent covariates are assumed to be associated with the mixture components and are incorporated via logistic weights of a mixture-of-experts model. We formulate this approach under a fully Bayesian framework using Gibbs sampling where the number of components is selected based on a deviance information criterion. The proposed method is compared to existing methods via simulation studies and is applied to a study on functional near-infrared spectroscopy, which aims to understand infant emotional reactivity and recovery from stress. 
The results reveal distinct patterns of brain activity, as well as associations between these patterns and selected covariates.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"666-680"},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247181/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139032905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics. Pub Date: 2024-07-01. DOI: 10.1093/biostatistics/kxac050
Yutong Wu, Erich D Jarvis, Abhra Sarkar
{"title":"Bayesian semiparametric Markov renewal mixed models for vocalization syntax.","authors":"Yutong Wu, Erich D Jarvis, Abhra Sarkar","doi":"10.1093/biostatistics/kxac050","DOIUrl":"10.1093/biostatistics/kxac050","url":null,"abstract":"<p><p>Speech and language play an important role in human vocal communication. Studies have shown that vocal disorders can result from genetic factors. In the absence of high-quality data on humans, mouse vocalization experiments in laboratory settings have proven useful in providing valuable insights into mammalian vocal development, especially the impact of certain genetic mutations. Such data sets usually consist of categorical syllable sequences along with continuous intersyllable interval (ISI) times for mice of different genotypes vocalizing under different contexts. ISIs are of particular importance as increased ISIs can be an indication of possible vocal impairment. Statistical methods for properly analyzing ISIs along with the transition probabilities have, however, been lacking. In this article, we propose a class of novel Markov renewal mixed models that capture the stochastic dynamics of both state transitions and ISI lengths. Specifically, we model the transition dynamics and the ISIs using Dirichlet and gamma mixtures, respectively, allowing the mixture probabilities in both cases to vary flexibly with fixed covariate effects as well as random individual-specific effects. We apply our model to analyze the impact of a mutation in the Foxp2 gene on mouse vocal behavior. 
We find that genotypes and social contexts significantly affect the length of ISIs but, compared to previous analyses, the influences of genotype and social context on the syllable transition dynamics are weaker.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"648-665"},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9774490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics. Pub Date: 2024-04-26. DOI: 10.1093/biostatistics/kxae009
Seoyoon Cho, Matthew A Psioda, Joseph G Ibrahim
{"title":"Bayesian joint modeling of multivariate longitudinal and survival outcomes using Gaussian copulas","authors":"Seoyoon Cho, Matthew A Psioda, Joseph G Ibrahim","doi":"10.1093/biostatistics/kxae009","DOIUrl":"https://doi.org/10.1093/biostatistics/kxae009","url":null,"abstract":"There is an increasing interest in the use of joint models for the analysis of longitudinal and survival data. While random effects models have been extensively studied, these models can be hard to implement and the fixed effect regression parameters must be interpreted conditional on the random effects. Copulas provide a useful alternative framework for joint modeling. One advantage of using copulas is that practitioners can directly specify marginal models for the outcomes of interest. We develop a joint model using a Gaussian copula to characterize the association between multivariate longitudinal and survival outcomes. Rather than using an unstructured correlation matrix in the copula model to characterize dependence structure as is common, we propose a novel decomposition that allows practitioners to impose structure (e.g., auto-regressive) which provides efficiency gains in small to moderate sample sizes and reduces computational complexity. We develop a Markov chain Monte Carlo model fitting procedure for estimation. 
We illustrate the method’s value using a simulation study and present a real data analysis of longitudinal quality of life and disease-free survival data from an International Breast Cancer Study Group trial.","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"29 1","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140800296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics. Pub Date: 2024-04-23. DOI: 10.1093/biostatistics/kxae010
Timothy Barry, Kathryn Roeder, Eugene Katsevich
{"title":"Exponential family measurement error models for single-cell CRISPR screens","authors":"Timothy Barry, Kathryn Roeder, Eugene Katsevich","doi":"10.1093/biostatistics/kxae010","DOIUrl":"https://doi.org/10.1093/biostatistics/kxae010","url":null,"abstract":"CRISPR genome engineering and single-cell RNA sequencing have accelerated biological discovery. Single-cell CRISPR screens unite these two technologies, linking genetic perturbations in individual cells to changes in gene expression and illuminating regulatory networks underlying diseases. Despite their promise, single-cell CRISPR screens present considerable statistical challenges. We demonstrate through theoretical and real data analyses that a standard method for estimation and inference in single-cell CRISPR screens, “thresholded regression,” exhibits attenuation bias and a bias-variance tradeoff as a function of an intrinsic, challenging-to-select tuning parameter. To overcome these difficulties, we introduce GLM-EIV (“GLM-based errors-in-variables”), a new method for single-cell CRISPR screen analysis. GLM-EIV extends the classical errors-in-variables model to responses and noisy predictors that are exponential family-distributed and potentially impacted by the same set of confounding variables. We develop a computational infrastructure to deploy GLM-EIV across hundreds of processors on clouds (e.g. Microsoft Azure) and high-performance clusters. 
Leveraging this infrastructure, we apply GLM-EIV to analyze two recent, large-scale, single-cell CRISPR screen datasets, yielding several new insights.","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"85 1","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140800356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics. Pub Date: 2024-04-15. DOI: 10.1093/biostatistics/kxad001
Soutrik Mandal, Do Hyun Kim, Xing Hua, Shilan Li, Jianxin Shi
{"title":"Estimating the overall fraction of phenotypic variance attributed to high-dimensional predictors measured with error.","authors":"Soutrik Mandal, Do Hyun Kim, Xing Hua, Shilan Li, Jianxin Shi","doi":"10.1093/biostatistics/kxad001","DOIUrl":"10.1093/biostatistics/kxad001","url":null,"abstract":"<p><p>In prospective genomic studies (e.g., DNA methylation, metagenomics, and transcriptomics), it is crucial to estimate the overall fraction of phenotypic variance (OFPV) attributed to the high-dimensional genomic variables, a concept similar to heritability analyses in genome-wide association studies (GWAS). Unlike genetic variants in GWAS, these genomic variables are typically measured with error due to technical limitations and temporal instability. While the existing methods developed for GWAS can be used, ignoring measurement error may lead to severe underestimation of OFPV and mislead the design of future studies. Assuming that measurement error variances are distributed similarly between causal and noncausal variables, we show that the asymptotic attenuation factor equals the average intraclass correlation coefficient of all genomic variables, which can be estimated based on a pilot study with repeated measurements. We illustrate the method by estimating the contribution of microbiome taxa to body mass index and multiple allergy traits in the American Gut Project. 
Finally, we show that measurement error does not cause meaningful bias when estimating the correlation of effect sizes for two traits.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"486-503"},"PeriodicalIF":2.1,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017132/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10728987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
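The attenuation result stated in the abstract above suggests a simple correction, sketched here under stated assumptions: the naive OFPV estimate is divided by the average intraclass correlation coefficient (ICC) estimated from a pilot study with repeated measurements. The one-way ANOVA ICC estimator, the division step as the correction, and the names `icc_oneway` and `corrected_ofpv` are illustrative choices, not taken from the paper.

```python
from statistics import mean


def icc_oneway(measurements):
    """One-way ANOVA ICC for a single variable.

    measurements: list of per-subject lists, each holding the same
    number k >= 2 of repeated values.
    """
    n = len(measurements)
    k = len(measurements[0])
    grand = mean(v for subj in measurements for v in subj)
    subj_means = [mean(subj) for subj in measurements]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    msw = sum(
        (v - m) ** 2
        for subj, m in zip(measurements, subj_means)
        for v in subj
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def corrected_ofpv(naive_ofpv, pilot):
    """Divide the naive estimate by the average ICC across variables.

    pilot: list with one `measurements` structure per genomic variable.
    """
    avg_icc = mean(icc_oneway(var) for var in pilot)
    if avg_icc <= 0:
        raise ValueError("average ICC must be positive to correct OFPV")
    return naive_ofpv / avg_icc, avg_icc


# Hypothetical pilot data: two variables, two subjects, two repeats each.
pilot = [
    [[1, 1], [3, 3]],  # perfectly repeatable variable, ICC = 1
    [[0, 2], [2, 4]],  # noisy variable, ICC = 1/3
]
estimate, avg_icc = corrected_ofpv(0.10, pilot)
```

With an average ICC of 2/3, a naive estimate of 0.10 is scaled up to 0.15, which shows how ignoring measurement error understates the variance explained.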
Biostatistics. Pub Date: 2024-04-15. DOI: 10.1093/biostatistics/kxad014
Jiabei Yang, Ann W Mwangi, Rami Kantor, Issa J Dahabreh, Monicah Nyambura, Allison Delong, Joseph W Hogan, Jon A Steingrimsson
{"title":"Tree-based subgroup discovery using electronic health record data: heterogeneity of treatment effects for DTG-containing therapies.","authors":"Jiabei Yang, Ann W Mwangi, Rami Kantor, Issa J Dahabreh, Monicah Nyambura, Allison Delong, Joseph W Hogan, Jon A Steingrimsson","doi":"10.1093/biostatistics/kxad014","DOIUrl":"10.1093/biostatistics/kxad014","url":null,"abstract":"<p><p>The rich longitudinal individual level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding, repeated and temporally non-aligned measurements of covariates, treatment assignments and outcomes, and loss-to-follow-up due to dropout. Here, we develop the subgroup discovery for longitudinal data algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data by combining the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. 
We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus who are at higher risk of weight gain when receiving dolutegravir (DTG)-containing antiretroviral therapies (ARTs) versus when receiving non-DTG-containing ARTs.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"323-335"},"PeriodicalIF":1.8,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017113/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10204527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics. Pub Date: 2024-04-15. DOI: 10.1093/biostatistics/kxad016
Tracy Q Dong, Elizabeth R Brown
{"title":"A joint Bayesian hierarchical model for estimating SARS-CoV-2 genomic and subgenomic RNA viral dynamics and seroconversion.","authors":"Tracy Q Dong, Elizabeth R Brown","doi":"10.1093/biostatistics/kxad016","DOIUrl":"10.1093/biostatistics/kxad016","url":null,"abstract":"<p><p>Understanding the viral dynamics of and natural immunity to the severe acute respiratory syndrome coronavirus 2 is crucial for devising better therapeutic and prevention strategies for coronavirus disease 2019 (COVID-19). Here, we present a Bayesian hierarchical model that jointly estimates the genomic RNA viral load, the subgenomic RNA (sgRNA) viral load (correlated to active viral replication), and the rate and timing of seroconversion (correlated to presence of antibodies). Our proposed method accounts for the dynamical relationship and correlation structure between the two types of viral load, allows for borrowing of information between viral load and antibody data, and identifies potential correlates of viral load characteristics and propensity for seroconversion. We demonstrate the features of the joint model through application to the COVID-19 post-exposure prophylaxis study and conduct a cross-validation exercise to illustrate the model's ability to impute the sgRNA viral trajectories for people who only had genomic RNA viral load data.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"336-353"},"PeriodicalIF":2.1,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10247403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}