Biostatistics Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad017
{"title":"Correction to: A transformation perspective on marginal and conditional models.","authors":"","doi":"10.1093/biostatistics/kxad017","DOIUrl":"10.1093/biostatistics/kxad017","url":null,"abstract":"","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017110/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10301897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad007
Shuo Chen, Yuan Zhang, Qiong Wu, Chuan Bi, Peter Kochunov, L Elliot Hong
{"title":"Identifying covariate-related subnetworks for whole-brain connectome analysis.","authors":"Shuo Chen, Yuan Zhang, Qiong Wu, Chuan Bi, Peter Kochunov, L Elliot Hong","doi":"10.1093/biostatistics/kxad007","DOIUrl":"10.1093/biostatistics/kxad007","url":null,"abstract":"<p><p>Whole-brain connectome data characterize the connections among distributed neural populations as a set of edges in a large network, and neuroscience research aims to systematically investigate associations between the brain connectome and clinical or experimental conditions as covariates. A covariate is often related to a number of edges connecting multiple brain areas in an organized structure. However, in practice, neither the covariate-related edges nor the structure is known. Therefore, the understanding of underlying neural mechanisms relies on statistical methods that are capable of simultaneously identifying covariate-related connections and recognizing their network topological structures. The task can be challenging because of false-positive noise and almost infinite possibilities of edges combining into subnetworks. To address these challenges, we propose a new statistical approach to handle multivariate edge variables as outcomes and output covariate-related subnetworks. We first study the graph properties of covariate-related subnetworks from a graph and combinatorics perspective and accordingly bridge the inference for individual connectome edges and covariate-related subnetworks. Next, we develop efficient algorithms to extract covariate-related subnetworks from the whole-brain connectome data with an $\\ell_0$ norm penalty. We validate the proposed methods based on an extensive simulation study, and we benchmark our performance against existing methods.
Using our proposed method, we analyze two separate resting-state functional magnetic resonance imaging data sets for schizophrenia research and obtain highly replicable disease-related subnetworks.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017127/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9846712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxac048
Luisa Barbanti, Torsten Hothorn
{"title":"A transformation perspective on marginal and conditional models.","authors":"Luisa Barbanti, Torsten Hothorn","doi":"10.1093/biostatistics/kxac048","DOIUrl":"10.1093/biostatistics/kxac048","url":null,"abstract":"<p><p>Clustered observations are ubiquitous in controlled and observational studies and arise naturally in multicenter trials or longitudinal surveys. We present a novel model for the analysis of clustered observations where the marginal distributions are described by a linear transformation model and the correlations by a joint multivariate normal distribution. The joint model provides an analytic formula for the marginal distribution. Owing to the richness of transformation models, the techniques are applicable to any type of response variable, including bounded, skewed, binary, ordinal, or survival responses. We demonstrate how the common normal assumption for reaction times can be relaxed in the sleep deprivation benchmark data set and report marginal odds ratios for the notoriously difficult toenail data. We furthermore discuss the analysis of two clinical trials aiming at the estimation of marginal treatment effects. In the first trial, pain was repeatedly assessed on a bounded visual analog scale and marginal proportional-odds models are presented. The second trial reported disease-free survival in rectal cancer patients, where the marginal hazard ratio from Weibull and Cox models is of special interest. An empirical evaluation compares the performance of the novel approach to generalized estimating equations for binary responses and to conditional mixed-effects models for continuous responses.
An implementation is available in the tram add-on package to the R system and was benchmarked against established models in the literature.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11212492/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10297317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad011
Yuanzhi Yu, Roderick J Little, Matthew Perzanowski, Qixuan Chen
{"title":"Multiple imputation of more than one environmental exposure with nondifferential measurement error.","authors":"Yuanzhi Yu, Roderick J Little, Matthew Perzanowski, Qixuan Chen","doi":"10.1093/biostatistics/kxad011","DOIUrl":"10.1093/biostatistics/kxad011","url":null,"abstract":"<p><p>Measurement error is common in environmental epidemiologic studies, but methods for correcting measurement error in regression models with multiple environmental exposures as covariates have not been well investigated. We consider a multiple imputation approach, combining external or internal calibration samples that contain information on both true and error-prone exposures with the main study data of multiple exposures measured with error. We propose a constrained chained equations multiple imputation (CEMI) algorithm that places constraints on the imputation model parameters in the chained equations imputation based on the assumptions of strong nondifferential measurement error. We also extend the constrained CEMI method to accommodate nondetects in the error-prone exposures in the main study data. We estimate the variance of the regression coefficients using the bootstrap with two imputations of each bootstrapped sample. The constrained CEMI method is shown by simulations to outperform existing methods, namely the method that ignores measurement error, classical calibration, and regression prediction, yielding estimated regression coefficients with smaller bias and confidence intervals with coverage close to the nominal level. We apply the proposed method to the Neighborhood Asthma and Allergy Study to investigate the associations between the concentrations of multiple indoor allergens and the fractional exhaled nitric oxide level among asthmatic children in New York City. 
The constrained CEMI method can be implemented by imposing constraints on the imputation matrix using the mice and bootImpute packages in R.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017114/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9522828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxac052
Harrison T Reeder, Kyu Ha Lee, Sebastien Haneuse
{"title":"Characterizing quantile-varying covariate effects under the accelerated failure time model.","authors":"Harrison T Reeder, Kyu Ha Lee, Sebastien Haneuse","doi":"10.1093/biostatistics/kxac052","DOIUrl":"10.1093/biostatistics/kxac052","url":null,"abstract":"<p><p>An important task in survival analysis is choosing a structure for the relationship between covariates of interest and the time-to-event outcome. For example, the accelerated failure time (AFT) model structures each covariate effect as a constant multiplicative shift in the outcome distribution across all survival quantiles. Though parsimonious, this structure cannot detect or capture effects that differ across quantiles of the distribution, a limitation that is analogous to only permitting proportional hazards in the Cox model. To address this, we propose a general framework for quantile-varying multiplicative effects under the AFT model. Specifically, we embed flexible regression structures within the AFT model and derive a novel formula for interpretable effects on the quantile scale. A regression standardization scheme based on the g-formula is proposed to enable the estimation of both covariate-conditional and marginal effects for an exposure of interest. We implement a user-friendly Bayesian approach for the estimation and quantification of uncertainty while accounting for left truncation and complex censoring. 
We emphasize the intuitive interpretation of this model through numerical and graphical tools and illustrate its performance through simulation and application to a study of Alzheimer's disease and dementia.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11484523/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10513263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad005
Yannick Vandendijck, Oswaldo Gressani, Christel Faes, Carlo G Camarda, Niel Hens
{"title":"Cohort-based smoothing methods for age-specific contact rates.","authors":"Yannick Vandendijck, Oswaldo Gressani, Christel Faes, Carlo G Camarda, Niel Hens","doi":"10.1093/biostatistics/kxad005","DOIUrl":"10.1093/biostatistics/kxad005","url":null,"abstract":"<p><p>The use of social contact rates is widespread in infectious disease modeling since it has been shown that they are key driving forces of important epidemiological parameters. Quantification of contact patterns is crucial to parameterize dynamic transmission models and to provide insights on the (basic) reproduction number. Information on social interactions can be obtained from population-based contact surveys, such as the European Commission project POLYMOD. Estimation of age-specific contact rates from these studies is often done using a piecewise constant approach or bivariate smoothing techniques. For the latter, typically, smoothness is introduced in the dimensions of the respondent's and contact's age (i.e., the rows and columns of the social contact matrix). We propose a constrained smoothing approach, taking into account the reciprocal nature of contacts, that introduces smoothness over the diagonal (including all subdiagonals) of the social contact matrix. This modeling approach is justified by assuming that people's contact behavior changes smoothly as they age. We call this smoothing from a cohort perspective. Two approaches that allow for smoothing over social contact matrix diagonals are proposed, namely (i) reordering of the diagonal components of the contact matrix and (ii) reordering of the penalty matrix ensuring smoothness over the contact matrix diagonals. Parameter estimation is done in the likelihood framework by using constrained penalized iterative reweighted least squares. A simulation study underlines the benefits of cohort-based smoothing. Finally, the proposed methods are illustrated on the Belgian POLYMOD data of 2006.
Code to reproduce the results of the article can be downloaded from the GitHub repository https://github.com/oswaldogressani/Cohort_smoothing.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9141117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad006
Jon A Steingrimsson, David H Barker, Ruofan Bie, Issa J Dahabreh
{"title":"Systematically missing data in causally interpretable meta-analysis.","authors":"Jon A Steingrimsson, David H Barker, Ruofan Bie, Issa J Dahabreh","doi":"10.1093/biostatistics/kxad006","DOIUrl":"10.1093/biostatistics/kxad006","url":null,"abstract":"<p><p>Causally interpretable meta-analysis combines information from a collection of randomized controlled trials to estimate treatment effects in a target population in which experimentation may not be possible but from which covariate information can be obtained. In such analyses, a key practical challenge is the presence of systematically missing data when some trials have collected data on one or more baseline covariates, but other trials have not, such that the covariate information is missing for all participants in the latter. In this article, we provide identification results for potential (counterfactual) outcome means and average treatment effects in the target population when covariate data are systematically missing from some of the trials in the meta-analysis. We propose three estimators for the average treatment effect in the target population, examine their asymptotic properties, and show that they have good finite-sample performance in simulation studies. We use the estimators to analyze data from two large lung cancer screening trials and target population data from the National Health and Nutrition Examination Survey (NHANES). 
To accommodate the complex survey design of the NHANES, we modify the methods to incorporate survey sampling weights and allow for clustering.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017122/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9567977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxac051
Ruoyu He, Mingyang Liu, Zhaotong Lin, Zhong Zhuang, Xiaotong Shen, Wei Pan
{"title":"DeLIVR: a deep learning approach to IV regression for testing nonlinear causal effects in transcriptome-wide association studies.","authors":"Ruoyu He, Mingyang Liu, Zhaotong Lin, Zhong Zhuang, Xiaotong Shen, Wei Pan","doi":"10.1093/biostatistics/kxac051","DOIUrl":"10.1093/biostatistics/kxac051","url":null,"abstract":"<p><p>Transcriptome-wide association studies (TWAS) have been increasingly applied to identify (putative) causal genes for complex traits and diseases. TWAS can be regarded as a two-sample two-stage least squares method for instrumental variable (IV) regression for causal inference. The standard TWAS (called TWAS-L) only considers a linear relationship between a gene's expression and a trait in stage 2, which may lose statistical power when not true. Recently, an extension of TWAS (called TWAS-LQ) considers both the linear and quadratic effects of a gene on a trait, which however is not flexible enough due to its parametric nature and may be low powered for nonquadratic nonlinear effects. On the other hand, a deep learning (DL) approach, called DeepIV, has been proposed to nonparametrically model a nonlinear effect in IV regression. However, it is both slow and unstable due to the ill-posed inverse problem of solving an integral equation with Monte Carlo approximations. Furthermore, in the original DeepIV approach, statistical inference, that is, hypothesis testing, was not studied. Here, we propose a novel DL approach, called DeLIVR, to overcome the major drawbacks of DeepIV, by estimating a related but different target function and including a hypothesis testing framework. We show through simulations that DeLIVR was both faster and more stable than DeepIV. 
We applied both parametric and DL approaches to the GTEx and UK Biobank data, showing that DeLIVR detected an additional 8 and 7 genes nonlinearly associated with high-density lipoprotein (HDL) cholesterol and low-density lipoprotein (LDL) cholesterol, respectively, all of which would be missed by TWAS-L, TWAS-LQ, and DeepIV; these genes include BUD13 associated with HDL, and SLC44A2 and GMIP with LDL, all supported by previous studies.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017120/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10861888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad003
Justin J Slater, Aiyush Bansal, Harlan Campbell, Jeffrey S Rosenthal, Paul Gustafson, Patrick E Brown
{"title":"A Bayesian approach to estimating COVID-19 incidence and infection fatality rates.","authors":"Justin J Slater, Aiyush Bansal, Harlan Campbell, Jeffrey S Rosenthal, Paul Gustafson, Patrick E Brown","doi":"10.1093/biostatistics/kxad003","DOIUrl":"10.1093/biostatistics/kxad003","url":null,"abstract":"<p><p>Naive estimates of incidence and infection fatality rates (IFR) of coronavirus disease 2019 suffer from a variety of biases, many of which relate to preferential testing. This has motivated epidemiologists from around the globe to conduct serosurveys that measure the immunity of individuals by testing for the presence of SARS-CoV-2 antibodies in the blood. These quantitative measures (titer values) are then used as a proxy for previous or current infection. However, statistical methods that use this data to its full potential have yet to be developed. Previous researchers have discretized these continuous values, discarding potentially useful information. In this article, we demonstrate how multivariate mixture models can be used in combination with post-stratification to estimate cumulative incidence and IFR in an approximate Bayesian framework without discretization. In doing so, we account for uncertainty from both the estimated number of infections and incomplete deaths data to provide estimates of IFR. 
This method is demonstrated using data from the Action to Beat Coronavirus serosurvey in Canada.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017123/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10850020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics Pub Date: 2024-04-15 DOI: 10.1093/biostatistics/kxad008
Amber M Young, Scott Van Buren, Naim U Rashid
{"title":"Differential transcript usage analysis incorporating quantification uncertainty via compositional measurement error regression modeling.","authors":"Amber M Young, Scott Van Buren, Naim U Rashid","doi":"10.1093/biostatistics/kxad008","DOIUrl":"10.1093/biostatistics/kxad008","url":null,"abstract":"<p><p>Differential transcript usage (DTU) occurs when the relative expression of multiple transcripts arising from the same gene changes between different conditions. Existing approaches to detect DTU often rely on computational procedures that can have speed and scalability issues as the number of samples increases. Here we propose a new method, CompDTU, that uses compositional regression to model the relative abundance proportions of each transcript that are of interest in DTU analyses. This procedure leverages fast matrix-based computations that make it ideally suited for DTU analysis with larger sample sizes. This method also allows for the testing of and adjustment for multiple categorical or continuous covariates. Additionally, many existing approaches for DTU ignore quantification uncertainty in the expression estimates for each transcript in RNA-seq data. We extend our CompDTU method to incorporate quantification uncertainty, leveraging common output from RNA-seq expression quantification tools, in a novel method, CompDTUme. Through several power analyses, we show that CompDTU has excellent sensitivity and reduces false positive results relative to existing methods. Additionally, CompDTUme results in further improvements in performance over CompDTU with sufficient sample size for genes with high levels of quantification uncertainty, while also maintaining favorable speed and scalability. We motivate our methods using data from the Cancer Genome Atlas Breast Invasive Carcinoma data set, specifically using RNA-seq data from primary tumors for 740 patients with breast cancer.
We show greatly reduced computation time from our new methods as well as the ability to detect several novel genes with significant DTU across different breast cancer subtypes.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017126/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9683536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}