Biostatistics | Pub Date: 2024-07-01 | DOI: 10.1093/biostatistics/kxad022
Erica E M Moodie, Zeyu Bian, Janie Coulombe, Yi Lian, Archer Y Yang, Susan M Shortreed
{"title":"Variable selection in high dimensions for discrete-outcome individualized treatment rules: Reducing severity of depression symptoms.","authors":"Erica E M Moodie, Zeyu Bian, Janie Coulombe, Yi Lian, Archer Y Yang, Susan M Shortreed","doi":"10.1093/biostatistics/kxad022","DOIUrl":"10.1093/biostatistics/kxad022","url":null,"abstract":"<p><p>Despite growing interest in estimating individualized treatment rules, little attention has been given to the binary outcome setting. Estimation is challenging with nonlinear link functions, especially when variable selection is needed. We use a new computational approach to solve a recently proposed doubly robust regularized estimating equation to accomplish this difficult task in a case study of depression treatment. We demonstrate an application of this new approach in combination with a weighted and penalized estimating equation to this challenging binary outcome setting. We demonstrate the double robustness of the method and its effectiveness for variable selection. The work is motivated by and applied to an analysis of treatment for unipolar depression using a population of patients treated at Kaiser Permanente Washington.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10201574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-07-01 | DOI: 10.1093/biostatistics/kxad020
Jeong Hoon Jang, Changgee Chang, Amita K Manatunga, Andrew T Taylor, Qi Long
{"title":"An integrative latent class model of heterogeneous data modalities for diagnosing kidney obstruction.","authors":"Jeong Hoon Jang, Changgee Chang, Amita K Manatunga, Andrew T Taylor, Qi Long","doi":"10.1093/biostatistics/kxad020","DOIUrl":"10.1093/biostatistics/kxad020","url":null,"abstract":"<p><p>Radionuclide imaging plays a critical role in the diagnosis and management of kidney obstruction. However, most practicing radiologists in US hospitals have insufficient time and resources to acquire training and experience needed to interpret radionuclide images, leading to increased diagnostic errors. To tackle this problem, Emory University embarked on a study that aims to develop a computer-assisted diagnostic (CAD) tool for kidney obstruction by mining and analyzing patient data comprised of renogram curves, ordinal expert ratings on the obstruction status, pharmacokinetic variables, and demographic information. The major challenges here are the heterogeneity in data modes and the lack of gold standard for determining kidney obstruction. In this article, we develop a statistically principled CAD tool based on an integrative latent class model that leverages heterogeneous data modalities available for each patient to provide accurate prediction of kidney obstruction. Our integrative model consists of three sub-models (multilevel functional latent factor regression model, probit scalar-on-function regression model, and Gaussian mixture model), each of which is tailored to the specific data mode and depends on the unknown obstruction status (latent class). An efficient MCMC algorithm is developed to train the model and predict kidney obstruction with associated uncertainty. Extensive simulations are conducted to evaluate the performance of the proposed method. 
An application to an Emory renal study demonstrates the usefulness of our model as a CAD tool for kidney obstruction.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247177/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10252590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-07-01 | DOI: 10.1093/biostatistics/kxad010
Albert Kuo, Kasper D Hansen, Stephanie C Hicks
{"title":"Quantification and statistical modeling of droplet-based single-nucleus RNA-sequencing data.","authors":"Albert Kuo, Kasper D Hansen, Stephanie C Hicks","doi":"10.1093/biostatistics/kxad010","DOIUrl":"10.1093/biostatistics/kxad010","url":null,"abstract":"<p><p>In complex tissues containing cells that are difficult to dissociate, single-nucleus RNA-sequencing (snRNA-seq) has become the preferred experimental technology over single-cell RNA-sequencing (scRNA-seq) to measure gene expression. To accurately model these data in downstream analyses, previous work has shown that droplet-based scRNA-seq data are not zero-inflated, but whether droplet-based snRNA-seq data follow the same probability distributions has not been systematically evaluated. Using pseudonegative control data from nuclei in mouse cortex sequenced with the 10x Genomics Chromium system and mouse kidney sequenced with the DropSeq system, we found that droplet-based snRNA-seq data follow a negative binomial distribution, suggesting that parametric statistical models applied to scRNA-seq are transferable to snRNA-seq. Furthermore, we found that the quantification choices in adapting quantification mapping strategies from scRNA-seq to snRNA-seq can play a significant role in downstream analyses and biological interpretation. In particular, reference transcriptomes that do not include intronic regions result in significantly smaller library sizes and incongruous cell type classifications. 
We also confirmed the presence of a gene length bias in snRNA-seq data, which we show is present in both exonic and intronic reads, and investigate potential causes for the bias.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247185/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9551865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-07-01 | DOI: 10.1093/biostatistics/kxad015
Joshua R Nugent, Carina Marquez, Edwin D Charlebois, Rachel Abbott, Laura B Balzer
{"title":"Blurring cluster randomized trials and observational studies: Two-Stage TMLE for subsampling, missingness, and few independent units.","authors":"Joshua R Nugent, Carina Marquez, Edwin D Charlebois, Rachel Abbott, Laura B Balzer","doi":"10.1093/biostatistics/kxad015","DOIUrl":"10.1093/biostatistics/kxad015","url":null,"abstract":"<p><p>Cluster randomized trials (CRTs) often enroll large numbers of participants; yet due to resource constraints, only a subset of participants may be selected for outcome assessment, and those sampled may not be representative of all cluster members. Missing data also present a challenge: if sampled individuals with measured outcomes are dissimilar from those with missing outcomes, unadjusted estimates of arm-specific endpoints and the intervention effect may be biased. Further, CRTs often enroll and randomize few clusters, limiting statistical power and raising concerns about finite sample performance. Motivated by SEARCH-TB, a CRT aimed at reducing incident tuberculosis infection, we demonstrate interlocking methods to handle these challenges. First, we extend Two-Stage targeted minimum loss-based estimation to account for three sources of missingness: (i) subsampling; (ii) measurement of baseline status among those sampled; and (iii) measurement of final status among those in the incidence cohort (persons known to be at risk at baseline). Second, we critically evaluate the assumptions under which subunits of the cluster can be considered the conditionally independent unit, improving precision and statistical power but also causing the CRT to behave like an observational study. 
Our application to SEARCH-TB highlights the real-world impact of different assumptions on measurement and dependence; estimates relying on unrealistic assumptions suggested the intervention increased the incidence of TB infection by 18% (risk ratio [RR]=1.18, 95% confidence interval [CI]: 0.85-1.63), while estimates accounting for the sampling scheme, missingness, and within-community dependence found the intervention decreased incident TB infection by 27% (RR=0.73, 95% CI: 0.57-0.92).</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247188/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10516286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-07-01 | DOI: 10.1093/biostatistics/kxad036
Gopal Kotecha, Steffen Ventz, Sandra Fortini, Lorenzo Trippa
{"title":"Uncertainty directed factorial clinical trials.","authors":"Gopal Kotecha, Steffen Ventz, Sandra Fortini, Lorenzo Trippa","doi":"10.1093/biostatistics/kxad036","DOIUrl":"10.1093/biostatistics/kxad036","url":null,"abstract":"<p><p>The development and evaluation of novel treatment combinations is a key component of modern clinical research. The primary goals of factorial clinical trials of treatment combinations range from the estimation of intervention-specific effects, or the discovery of potential synergies, to the identification of combinations with the highest response probabilities. Most factorial studies use balanced or block randomization, with an equal number of patients assigned to each treatment combination, irrespective of the specific goals of the trial. Here, we introduce a class of Bayesian response-adaptive designs for factorial clinical trials with binary outcomes. The study design was developed using Bayesian decision-theoretic arguments and adapts the randomization probabilities to treatment combinations during the enrollment period based on the available data. Our approach enables the investigator to specify a utility function representative of the aims of the trial, and the Bayesian response-adaptive randomization algorithm aims to maximize this utility function. We considered several utility functions and factorial designs tailored to them. Then, we conducted a comparative simulation study to illustrate relevant differences of key operating characteristics across the resulting designs. We also investigated the asymptotic behavior of the proposed adaptive designs. 
We also used data summaries from three recent factorial trials in perioperative care, smoking cessation, and infectious disease prevention to define realistic simulation scenarios and illustrate advantages of the introduced trial designs compared to other study designs.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247193/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139708548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-07-01 | DOI: 10.1093/biostatistics/kxad030
Pantelis Samartsidis, Shaun R Seaman, Abbie Harrison, Angelos Alexopoulos, Gareth J Hughes, Christopher Rawlinson, Charlotte Anderson, André Charlett, Isabel Oliver, Daniela De Angelis
{"title":"A Bayesian multivariate factor analysis model for causal inference using time-series observational data on mixed outcomes.","authors":"Pantelis Samartsidis, Shaun R Seaman, Abbie Harrison, Angelos Alexopoulos, Gareth J Hughes, Christopher Rawlinson, Charlotte Anderson, André Charlett, Isabel Oliver, Daniela De Angelis","doi":"10.1093/biostatistics/kxad030","DOIUrl":"10.1093/biostatistics/kxad030","url":null,"abstract":"<p><p>Assessing the impact of an intervention by using time-series observational data on multiple units and outcomes is a frequent problem in many fields of scientific research. Here, we propose a novel Bayesian multivariate factor analysis model for estimating intervention effects in such settings and develop an efficient Markov chain Monte Carlo algorithm to sample from the high-dimensional and nontractable posterior of interest. The proposed method is one of the few that can simultaneously deal with outcomes of mixed type (continuous, binomial, count), increase efficiency in the estimates of the causal effects by jointly modeling multiple outcomes affected by the intervention, and easily provide uncertainty quantification for all causal estimands of interest. 
Using the proposed approach, we evaluate the impact that Local Tracing Partnerships had on the effectiveness of England's Test and Trace programme for COVID-19.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247182/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138500308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-07-01 | DOI: 10.1093/biostatistics/kxad034
Haoyi Fu, Lu Tang, Ori Rosen, Alison E Hipwell, Theodore J Huppert, Robert T Krafty
{"title":"Covariate-guided Bayesian mixture of spline experts for the analysis of multivariate high-density longitudinal data.","authors":"Haoyi Fu, Lu Tang, Ori Rosen, Alison E Hipwell, Theodore J Huppert, Robert T Krafty","doi":"10.1093/biostatistics/kxad034","DOIUrl":"10.1093/biostatistics/kxad034","url":null,"abstract":"<p><p>With rapid development of techniques to measure brain activity and structure, statistical methods for analyzing modern brain-imaging data play an important role in the advancement of science. Imaging data that measure brain function are usually multivariate high-density longitudinal data and are heterogeneous across both imaging sources and subjects, which lead to various statistical and computational challenges. In this article, we propose a group-based method to cluster a collection of multivariate high-density longitudinal data via a Bayesian mixture of smoothing splines. Our method assumes each multivariate high-density longitudinal trajectory is a mixture of multiple components with different mixing weights. Time-independent covariates are assumed to be associated with the mixture components and are incorporated via logistic weights of a mixture-of-experts model. We formulate this approach under a fully Bayesian framework using Gibbs sampling where the number of components is selected based on a deviance information criterion. The proposed method is compared to existing methods via simulation studies and is applied to a study on functional near-infrared spectroscopy, which aims to understand infant emotional reactivity and recovery from stress. 
The results reveal distinct patterns of brain activity, as well as associations between these patterns and selected covariates.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247181/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139032905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-07-01 | DOI: 10.1093/biostatistics/kxac050
Yutong Wu, Erich D Jarvis, Abhra Sarkar
{"title":"Bayesian semiparametric Markov renewal mixed models for vocalization syntax.","authors":"Yutong Wu, Erich D Jarvis, Abhra Sarkar","doi":"10.1093/biostatistics/kxac050","DOIUrl":"10.1093/biostatistics/kxac050","url":null,"abstract":"<p><p>Speech and language play an important role in human vocal communication. Studies have shown that vocal disorders can result from genetic factors. In the absence of high-quality data on humans, mouse vocalization experiments in laboratory settings have proven useful in providing valuable insights into mammalian vocal development, especially the impact of certain genetic mutations. Such data sets usually consist of categorical syllable sequences along with continuous intersyllable interval (ISI) times for mice of different genotypes vocalizing under different contexts. ISIs are of particular importance as increased ISIs can be an indication of possible vocal impairment. Statistical methods for properly analyzing ISIs along with the transition probabilities have, however, been lacking. In this article, we propose a class of novel Markov renewal mixed models that capture the stochastic dynamics of both state transitions and ISI lengths. Specifically, we model the transition dynamics and the ISIs using Dirichlet and gamma mixtures, respectively, allowing the mixture probabilities in both cases to vary flexibly with fixed covariate effects as well as random individual-specific effects. We apply our model to analyze the impact of a mutation in the Foxp2 gene on mouse vocal behavior. 
We find that genotypes and social contexts significantly affect the length of ISIs but, compared to previous analyses, the influences of genotype and social context on the syllable transition dynamics are weaker.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9774490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-06-25 | DOI: 10.1093/biostatistics/kxae021
Xiaoyue Xi, Hélène Ruffieux
{"title":"A modeling framework for detecting and leveraging node-level information in Bayesian network inference.","authors":"Xiaoyue Xi, Hélène Ruffieux","doi":"10.1093/biostatistics/kxae021","DOIUrl":"https://doi.org/10.1093/biostatistics/kxae021","url":null,"abstract":"<p><p>Bayesian graphical models are powerful tools to infer complex relationships in high dimension, yet are often fraught with computational and statistical challenges. If exploited in a principled way, the increasing information collected alongside the data of primary interest constitutes an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by the use of publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modeling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximization algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. 
We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141452158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A joint normal-ordinal (probit) model for ordinal and continuous longitudinal data.","authors":"Margaux Delporte, Geert Molenberghs, Steffen Fieuws, Geert Verbeke","doi":"10.1093/biostatistics/kxae014","DOIUrl":"10.1093/biostatistics/kxae014","url":null,"abstract":"<p><p>In biomedical studies, continuous and ordinal longitudinal variables are frequently encountered. In many of these studies it is of interest to estimate the effect of one of these longitudinal variables on the other. Time-dependent covariates have, however, several limitations; they cannot, for example, be included when the data are not collected at fixed intervals. The issues can be circumvented by implementing joint models, where two or more longitudinal variables are treated as a response and modeled with a correlated random effect. Next, by conditioning on these response(s), we can study the effect of one or more longitudinal variables on another. We propose a normal-ordinal (probit) joint model. First, we derive closed-form formulas to estimate the model-based correlations between the responses on their original scale. In addition, we derive the marginal model, where the interpretation is no longer conditional on the random effects. As a consequence, we can make predictions for a subvector of one response conditional on the other response and potentially a subvector of the history of the response. Next, we extend the approach to a high-dimensional case with more than two ordinal and/or continuous longitudinal variables. 
The methodology is applied to a case study where, among others, a longitudinal ordinal response is predicted with a longitudinal continuous variable.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141312354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}