Biostatistics: Latest Articles (Page 2)

Distributed lag interaction model with index modification.

IF 1.8 · Zone 3 (Mathematics)

Biostatistics Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf017

Danielle Demateis, Sandra India-Aldana, Robert O Wright, Rosalind J Wright, Andrea Baccarelli, Elena Colicino, Ander Wilson, Kayleigh P Keller

Abstract: Epidemiological evidence supports an association between exposure to air pollution during pregnancy and birth and child health outcomes. Typically, such associations are estimated by regressing an outcome on daily or weekly measures of exposure during pregnancy using a distributed lag model. However, these associations may be modified by multiple factors. We propose a distributed lag interaction model with index modification that allows for effect modification of a functional predictor by a weighted average of multiple modifiers. Our model allows for simultaneous estimation of modifier index weights and the exposure-time-response function via a spline cross-basis in a Bayesian hierarchical framework. Through simulations, we showed that our model outperforms competing methods when there are multiple modifiers of unknown importance. We applied our proposed method to a Colorado birth cohort to estimate the association between birth weight and air pollution modified by a neighborhood-vulnerability index and to a Mexican birth cohort to estimate the association between birthing-parent cardio-metabolic endpoints and air pollution modified by a birthing-parent lifetime stress index.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12205949/pdf/

Citations: 0
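For orientation, the kind of model the abstract above builds on can be written in a generic form. This is a textbook-style sketch, not the authors' exact specification: the symbols $y_i$, $x_{it}$, $\beta_t$, $\mathbf{z}_i$, $\mathbf{m}_i$, and $\boldsymbol{\rho}$ are notation assumed here for illustration only.

```latex
% Standard distributed lag model: outcome regressed on exposure at T time points
y_i = \beta_0 + \sum_{t=1}^{T} x_{it}\,\beta_t + \mathbf{z}_i^{\top}\boldsymbol{\gamma} + \varepsilon_i

% Illustrative index modification: the lag effects depend on a weighted
% average m_i^* of several candidate modifiers m_i, with weights rho
m_i^{*} = \mathbf{m}_i^{\top}\boldsymbol{\rho}, \qquad
y_i = \beta_0 + \sum_{t=1}^{T} x_{it}\,\beta_t\!\left(m_i^{*}\right) + \mathbf{z}_i^{\top}\boldsymbol{\gamma} + \varepsilon_i
```

In the first line each $\beta_t$ is the effect of exposure at time $t$. In the second, the whole lag curve is allowed to vary with a single index, which is what lets several modifiers act jointly through their weights.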

Addressing the mean-variance relationship in spatially resolved transcriptomics data with spoon.

IF 1.8 · Zone 3 (Mathematics)

Biostatistics Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf012

Kinnary Shah, Boyi Guo, Stephanie C Hicks

Citations: 0

Probabilistic clustering using shared latent variable model for assessing Alzheimer's disease biomarkers.

IF 1.8 · Zone 3 (Mathematics)

Biostatistics Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf010

Yizhen Xu, Scott Zeger, Zheyu Wang

Abstract: The preclinical stage of many neurodegenerative diseases can span decades before symptoms become apparent. Understanding the sequence of preclinical biomarker changes provides a critical opportunity for early diagnosis and effective intervention prior to significant loss of patients' brain functions. The main challenge to early detection lies in the absence of direct observation of the disease state and the considerable variability in both biomarkers and disease dynamics among individuals. Recent research hypothesized the existence of subgroups with distinct biomarker patterns due to co-morbidities and degrees of brain resilience. Our ability to diagnose early and intervene during the preclinical stage of neurodegenerative diseases will be enhanced by further insights into heterogeneity in the biomarker-disease relationship. In this article, we focus on Alzheimer's disease (AD) and attempt to identify the systematic patterns within the heterogeneous AD biomarker-disease cascade. Specifically, we quantify the disease progression using a dynamic latent variable whose mixture distribution represents patient subgroups. Model estimation uses Hamiltonian Monte Carlo with the number of clusters determined by the Bayesian Information Criterion. We report simulation studies that investigate the performance of the proposed model in finite sample settings that are similar to our motivating application. We apply the proposed model to the Biomarkers of Cognitive Decline Among Normal Individuals data, a longitudinal study that was conducted over 2 decades among individuals who were initially cognitively normal. Our application yields evidence consistent with the hypothetical model of biomarker dynamics presented in Jack Jr et al. In addition, our analysis identified 2 subgroups with distinct disease-onset patterns. Finally, we develop a dynamic prediction approach to improve the precision of prognoses.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12054513/pdf/

Citations: 0

Model-based dimensionality reduction for single-cell RNA-seq using generalized bilinear models.

IF 2 · Zone 3 (Mathematics)

Biostatistics Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf024

Phillip B Nicol, Jeffrey W Miller

Abstract: Dimensionality reduction is a critical step in the analysis of single-cell RNA-seq (scRNA-seq) data. The standard approach is to apply a transformation to the count matrix followed by principal components analysis (PCA). However, this approach can induce spurious heterogeneity and mask true biological variability. An alternative approach is to directly model the counts, but existing methods tend to be computationally intractable on large datasets and do not quantify uncertainty in the low-dimensional representation. To address these problems, we develop scGBM, a novel method for model-based dimensionality reduction of scRNA-seq data using a Poisson bilinear model. We introduce a fast estimation algorithm to fit the model using iteratively reweighted singular value decompositions, enabling the method to scale to datasets with millions of cells. Furthermore, scGBM quantifies the uncertainty in each cell's latent position and leverages these uncertainties to assess the confidence associated with a given cell clustering. On real and simulated single-cell data, we find that scGBM produces low-dimensional embeddings that better capture relevant biological information while removing unwanted variation.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12342792/pdf/

Citations: 0
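The abstract above refers to a Poisson bilinear model for the count matrix. A generic form of such a model is sketched below. The index conventions (genes $i$, cells $j$, $M$ latent factors) and the symbols $\alpha_i$, $\beta_j$, $\sigma_m$, $u_{im}$, $v_{jm}$ are assumptions made here for illustration and may differ from the paper's notation.

```latex
% Counts Y_{ij} for gene i in cell j, modelled directly rather than transformed
Y_{ij} \sim \mathrm{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \alpha_i + \beta_j + \sum_{m=1}^{M} \sigma_m\, u_{im}\, v_{jm}
```

The bilinear term plays the role that the truncated singular value decomposition plays in PCA, with the columns of $v$ giving each cell's low-dimensional latent position.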

Mediation analysis with graph mediator.

IF 1.8 · Zone 3 (Mathematics)

Biostatistics Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf004

Yixi Xu, Yi Zhao

Citations: 0

Functional quantile principal component analysis.

IF 2 · Zone 3 (Mathematics)

Biostatistics Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae040

Álvaro Méndez-Civieta, Ying Wei, Keith M Diaz, Jeff Goldsmith

Abstract: This paper introduces functional quantile principal component analysis (FQPCA), a dimensionality reduction technique that extends the concept of functional principal components analysis (FPCA) to the examination of participant-specific quantile curves. Our approach borrows strength across participants to estimate patterns in quantiles, and uses participant-level data to estimate loadings on those patterns. As a result, FQPCA is able to capture shifts in the scale and distribution of data that affect participant-level quantile curves, and is also a robust methodology suitable for dealing with outliers, heteroscedastic data or skewed data. The need for such methodology is exemplified by physical activity data collected using wearable devices. Participants often differ in the timing and intensity of physical activity behaviors, and capturing information beyond the participant-level expected value curves produced by FPCA is necessary for a robust quantification of diurnal patterns of activity. We illustrate our methods using accelerometer data from the National Health and Nutrition Examination Survey, and produce participant-level 10%, 50%, and 90% quantile curves over 24 h of activity. The proposed methodology is supported by simulation results, and is available as an R package.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823270/pdf/

Citations: 0
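Quantile-based methods such as the one described above typically replace squared error with the quantile ("check" or pinball) loss. The sketch below is not the FQPCA algorithm or its R package; it only illustrates, on skewed toy data, that minimizing the check loss at level tau recovers the tau-th quantile. The function names `check_loss` and `grid_quantile` and the simulated `activity` values are hypothetical.

```python
import numpy as np

def check_loss(residuals, tau):
    """Quantile (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    residuals = np.asarray(residuals, dtype=float)
    return np.where(residuals >= 0, tau * residuals, (tau - 1.0) * residuals)

def grid_quantile(y, tau, grid_size=2001):
    """Return the grid value that minimizes the mean check loss at level tau."""
    y = np.asarray(y, dtype=float)
    grid = np.linspace(y.min(), y.max(), grid_size)
    losses = np.array([check_loss(y - q, tau).mean() for q in grid])
    return grid[np.argmin(losses)]

rng = np.random.default_rng(0)
# Skewed toy data standing in for activity counts (simulated, not NHANES).
activity = rng.gamma(shape=2.0, scale=1.5, size=5000)

# Same three levels as the quantile curves mentioned in the abstract.
estimates = {tau: grid_quantile(activity, tau) for tau in (0.1, 0.5, 0.9)}
for tau, q in estimates.items():
    print(f"tau={tau}: check-loss minimizer={q:.3f}, "
          f"np.quantile={np.quantile(activity, tau):.3f}")
```

Because the loss weights positive and negative residuals asymmetrically, it is insensitive to the size of outliers, which is consistent with the robustness to outliers and skewness that the abstract claims.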

Regression and alignment for functional data and network topology.

IF 2 · Zone 3 (Mathematics)

Biostatistics Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae026

Danni Tu, Julia Wrobel, Theodore D Satterthwaite, Jeff Goldsmith, Ruben C Gur, Raquel E Gur, Jan Gertheiss, Dani S Bassett, Russell T Shinohara

Abstract: In the brain, functional connections form a network whose topological organization can be described by graph-theoretic network diagnostics. These include characterizations of the community structure, such as modularity and participation coefficient, which have been shown to change over the course of childhood and adolescence. To investigate if such changes in the functional network are associated with changes in cognitive performance during development, network studies often rely on an arbitrary choice of preprocessing parameters, in particular the proportional threshold of network edges. Because the choice of parameter can impact the value of the network diagnostic, and therefore downstream conclusions, we propose to circumvent that choice by conceptualizing the network diagnostic as a function of the parameter. As opposed to a single value, a network diagnostic curve describes the connectome topology at multiple scales, from the sparsest group of the strongest edges to the entire edge set. To relate these curves to executive function and other covariates, we use scalar-on-function regression, which is more flexible than previous functional data-based models used in network neuroscience. We then consider how systematic differences between networks can manifest in misalignment of diagnostic curves, and consequently propose a supervised curve alignment method that incorporates auxiliary information from other variables. Our algorithm performs both functional regression and alignment via an iterative, penalized, and nonlinear likelihood optimization. The illustrated method has the potential to improve the interpretability and generalizability of neuroscience studies where the goal is to study heterogeneity among a mixture of function- and scalar-valued measures.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11822954/pdf/

Citations: 0

Scalable randomized kernel methods for multiview data integration and prediction with application to Coronavirus disease.

IF 1.8 · Zone 3 (Mathematics)

Biostatistics Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf001

Sandra E Safo, Han Lu

Abstract: There is still more to learn about the pathobiology of coronavirus disease (COVID-19) despite 4 years of the pandemic. A multiomics approach offers a comprehensive view of the disease and has the potential to yield deeper insight into the pathogenesis of the disease. Previous multiomics integrative analysis and prediction studies for COVID-19 severity and status have assumed simple relationships (i.e., linear relationships) between omics data and between omics and COVID-19 outcomes. However, these linear methods do not account for the inherent underlying nonlinear structure associated with these different types of data. The motivation behind this work is to model nonlinear relationships in multiomics and COVID-19 outcomes, and to determine key multidimensional molecules associated with the disease. Toward this goal, we develop scalable randomized kernel methods for jointly associating data from multiple sources or views and simultaneously predicting an outcome or classifying a unit into one of 2 or more classes. We also determine variables or groups of variables that best contribute to the relationships among the views. We use the idea that random Fourier bases can approximate shift-invariant kernel functions to construct nonlinear mappings of each view and we use these mappings and the outcome variable to learn view-independent low-dimensional representations. We demonstrate the effectiveness of the proposed methods through extensive simulations. When the proposed methods were applied to gene expression, metabolomics, proteomics, and lipidomics data pertaining to COVID-19, we identified several molecular signatures for COVID-19 status and severity. Our results agree with previous findings and suggest potential avenues for future research. Our algorithms are implemented in PyTorch and interfaced in R and available at: https://github.com/lasandrall/RandMVLearn.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11839864/pdf/

Citations: 0
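The abstract above relies on the fact that random Fourier bases can approximate shift-invariant kernels. The sketch below illustrates that idea for the Gaussian (RBF) kernel using NumPy. It is not the RandMVLearn implementation or its API: the function names, the choice of kernel, the bandwidth `gamma`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Exact Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

def random_fourier_features(X, n_features, gamma, rng):
    """Map X (n, d) to Z (n, n_features) so that Z @ Z.T approximates the RBF kernel."""
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phases, uniform on [0, 2 * pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))  # one toy "view" (simulated, not omics data)
gamma = 0.1

Z = random_fourier_features(X, n_features=5000, gamma=gamma, rng=rng)
max_abs_error = np.max(np.abs(Z @ Z.T - rbf_kernel(X, X, gamma)))
print(f"max |Z Z^T - K| = {max_abs_error:.4f}")
```

The practical appeal is scalability: downstream linear methods operate on the n-by-D feature matrix `Z` instead of the n-by-n kernel matrix, and the approximation error shrinks as the number of random features D grows.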

Mediation with External Summary Statistic Information.

IF 2 · Zone 3 (Mathematics)

Biostatistics Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf020

Jonathan Boss, Wei Hao, Amber Cathey, Barrett M Welch, Kelly K Ferguson, John D Meeker, Xiang Zhou, Jian Kang, Bhramar Mukherjee

Abstract: Environmental health studies are increasingly measuring endogenous omics data ($\boldsymbol{M}$) to study intermediary biological pathways by which an exogenous exposure ($\boldsymbol{A}$) affects a health outcome ($\boldsymbol{Y}$), given confounders ($\boldsymbol{C}$). Mediation analysis is frequently performed to understand such mechanisms. If intermediary pathways are of interest, then there is likely literature establishing statistical and biological significance of the total effect, defined as the effect of $\boldsymbol{A}$ on $\boldsymbol{Y}$ given $\boldsymbol{C}$. For mediation models with continuous outcomes and mediators, we show that leveraging external summary-level information on the total effect can improve estimation efficiency of the direct and indirect effects. Moreover, the efficiency gain depends on the asymptotic partial $R^{2}$ between the outcome ($\boldsymbol{Y}\mid\boldsymbol{M},\boldsymbol{A},\boldsymbol{C}$) and total effect ($\boldsymbol{Y}\mid\boldsymbol{A},\boldsymbol{C}$) models, with smaller (larger) values benefiting direct (indirect) effect estimation. We propose a robust data-adaptive estimation procedure, Mediation with External Summary Statistic Information, to improve estimation efficiency in settings with congenial external information, while simultaneously protecting against bias in settings with incongenial external information. In congenial simulation scenarios, we observe relative efficiency gains for mediation effect estimation of up to 40%. We illustrate our methodology using data from the Puerto Rico Testsite for Exploring Contamination Threats, where Cytochrome p450 metabolites are hypothesized to mediate the effect of phthalate exposure on gestational age at delivery. External summary information on the total effect comes from a recently published pooled analysis of 16 studies. The proposed framework blends mediation analysis with emerging data integration techniques.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12302958/pdf/

Citations: 0
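To make the relationship between the total, direct, and indirect effects in the abstract above concrete, the standard linear mediation decomposition is sketched below. It assumes linear models with no exposure-mediator interaction and a scalar exposure $A$ and outcome $Y$. The coefficient names ($\theta$, $\alpha$, $\beta$) are notation assumed here and are not taken from the paper.

```latex
% Outcome model: Y given mediators, exposure, and confounders
Y = \theta_0 + \theta_A A + \boldsymbol{\theta}_M^{\top}\boldsymbol{M}
    + \boldsymbol{\theta}_C^{\top}\boldsymbol{C} + \varepsilon_Y

% Mediator model: M given exposure and confounders
\boldsymbol{M} = \boldsymbol{\alpha}_0 + \boldsymbol{\alpha}_A A
    + \boldsymbol{\alpha}_C \boldsymbol{C} + \boldsymbol{\varepsilon}_M

% Total effect model: Y given exposure and confounders only
Y = \beta_0 + \beta_A A + \boldsymbol{\beta}_C^{\top}\boldsymbol{C} + \varepsilon_T

% Substituting the mediator model into the outcome model gives
% total effect = direct effect + indirect effect
\beta_A = \theta_A + \boldsymbol{\alpha}_A^{\top}\boldsymbol{\theta}_M
```

The last identity is why external information on $\beta_A$ is informative: it acts as a constraint linking the direct effect $\theta_A$ and the indirect effect $\boldsymbol{\alpha}_A^{\top}\boldsymbol{\theta}_M$ estimated from the internal data.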

Semiparametric efficient estimation of small genetic effects in large-scale population cohorts.

IF 2 · Zone 3 (Mathematics)

Biostatistics Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf030

Olivier Labayle, Breeshey Roskams-Hieter, Joshua Slaughter, Kelsey Tetley-Campbell, Mark J van der Laan, Chris P Ponting, Sjoerd V Beentjes, Ava Khamseh

Abstract: Population genetics seeks to quantify DNA variant associations with traits or diseases, as well as interactions among variants and with environmental factors. Computing millions of estimates in large cohorts in which small effect sizes and tight confidence intervals are expected necessitates minimizing model-misspecification bias to increase power and control false discoveries. We present TarGene, a unified statistical workflow for the semi-parametric efficient and double robust estimation of genetic effects including $k$-point interactions among categorical variables in the presence of confounding and weak population dependence. $k$-point interactions, or Average Interaction Effects (AIEs), are a direct generalization of the usual average treatment effect (ATE). We estimate genetic effects with cross-validated and/or weighted versions of Targeted Minimum Loss-based Estimators (TMLE) and One-Step Estimators (OSE). The effect of dependence among data units on variance estimates is corrected by using sieve plateau variance estimators based on genetic relatedness across the units. We present extensive realistic simulations to demonstrate power, coverage, and control of type I error. Our motivating application is the targeted estimation of genetic effects on traits, including two-point and higher-order gene-gene and gene-environment interactions, in large-scale genomic databases such as UK Biobank and All of Us. All cross-validated and/or weighted TMLE and OSE for the AIE $k$-point interaction, as well as ATEs, conditional ATEs and functions thereof, are implemented in the general purpose Julia package TMLE.jl. For high-throughput applications in population genomics, we provide the open-source Nextflow pipeline and software TarGene which integrates seamlessly with modern high-performance and cloud computing platforms.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12479317/pdf/

Citations: 0
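The abstract above describes $k$-point interactions as a generalization of the average treatment effect. For the simplest case of binary variables, the two estimands can be written as below. This is an illustrative sketch only: the paper treats general categorical variables, and the symbols $T$, $T_1$, $T_2$, and $W$ (confounders) are notation assumed here.

```latex
% Average treatment effect of a binary variable T, adjusting for confounders W
\mathrm{ATE} = \mathbb{E}_W\big[\,\mathbb{E}[Y \mid T=1, W] - \mathbb{E}[Y \mid T=0, W]\,\big]

% Two-point interaction of binary T_1 and T_2: a difference of differences
\mathrm{AIE} = \mathbb{E}_W\big[\,\mathbb{E}[Y \mid T_1=1, T_2=1, W]
    - \mathbb{E}[Y \mid T_1=1, T_2=0, W]
    - \mathbb{E}[Y \mid T_1=0, T_2=1, W]
    + \mathbb{E}[Y \mid T_1=0, T_2=0, W]\,\big]
```

The interaction is zero when the effect of $T_1$ on the additive scale is the same at both levels of $T_2$, so a non-zero value indicates that the two variables do not act additively on the trait.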