Biostatistics | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxaf003
Ales Kotalik, David M Vock, Nancy E Sherwood, Brian P Hobbs, Joseph S Koopmeiners
{"title":"Within-trial data borrowing for sequential multiple assignment randomized trials.","authors":"Ales Kotalik, David M Vock, Nancy E Sherwood, Brian P Hobbs, Joseph S Koopmeiners","doi":"10.1093/biostatistics/kxaf003","DOIUrl":"10.1093/biostatistics/kxaf003","url":null,"abstract":"<p><p>The Sequential Multiple Assignment Randomized Trial (SMART) is a complex trial design that involves randomizing a single participant multiple times in a sequential manner. This results in the branching nature of a SMART, which represents several distinct groups defined by different combinations of treatments, response statuses, etc. A SMART can then answer various scientific questions of interest, eg, the optimal dynamic treatment regime (DTR) for treating a chronic illness, what intervention to offer first, and what intervention to offer to nonresponders (or suboptimal responders). However, the analysis of a SMART can suffer from low precision, as the potentially widely branching structure can lead to reduced sample sizes in some groups of interest. In this paper, we propose a novel analysis method for a SMART in which dynamic borrowing is used to borrow strength across groups with similar expected outcomes, thus providing increased precision for the estimation of the expected outcomes of DTRs. 
We apply our method to a SMART evaluating various weight loss strategies using a binary endpoint of clinically significant weight loss and show by simulation that our method can improve the precision of the estimated expected outcome of a DTR, aid in the identification of the optimal DTR, and produce a clustering analysis of DTRs embedded in a SMART.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11963638/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143765923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A joint normal-ordinal (probit) model for ordinal and continuous longitudinal data.","authors":"Margaux Delporte, Geert Molenberghs, Steffen Fieuws, Geert Verbeke","doi":"10.1093/biostatistics/kxae014","DOIUrl":"10.1093/biostatistics/kxae014","url":null,"abstract":"<p><p>In biomedical studies, continuous and ordinal longitudinal variables are frequently encountered. In many of these studies it is of interest to estimate the effect of one of these longitudinal variables on the other. Time-dependent covariates have, however, several limitations; they can, for example, not be included when the data is not collected at fixed intervals. The issues can be circumvented by implementing joint models, where two or more longitudinal variables are treated as a response and modeled with a correlated random effect. Next, by conditioning on these response(s), we can study the effect of one or more longitudinal variables on another. We propose a normal-ordinal (probit) joint model. First, we derive closed-form formulas to estimate the model-based correlations between the responses on their original scale. In addition, we derive the marginal model, where the interpretation is no longer conditional on the random effects. As a consequence, we can make predictions for a subvector of one response conditional on the other response and potentially a subvector of the history of the response. Next, we extend the approach to a high-dimensional case with more than two ordinal and/or continuous longitudinal variables. 
The methodology is applied to a case study where, among others, a longitudinal ordinal response is predicted with a longitudinal continuous variable.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141312354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxae027
Yue Wang, Haoran Shi
{"title":"Direct estimation and inference of higher-level correlations from lower-level measurements with applications in gene-pathway and proteomics studies.","authors":"Yue Wang, Haoran Shi","doi":"10.1093/biostatistics/kxae027","DOIUrl":"10.1093/biostatistics/kxae027","url":null,"abstract":"<p><p>This paper tackles the challenge of estimating correlations between higher-level biological variables (e.g. proteins and gene pathways) when only lower-level measurements are directly observed (e.g. peptides and individual genes). Existing methods typically aggregate lower-level data into higher-level variables and then estimate correlations based on the aggregated data. However, different data aggregation methods can yield varying correlation estimates as they target different higher-level quantities. Our solution is a latent factor model that directly estimates these higher-level correlations from lower-level data without the need for data aggregation. We further introduce a shrinkage estimator to ensure the positive definiteness and improve the accuracy of the estimated correlation matrix. Furthermore, we establish the asymptotic normality of our estimator, enabling efficient computation of P-values for the identification of significant correlations. The effectiveness of our approach is demonstrated through comprehensive simulations and the analysis of proteomics and gene expression datasets. 
We develop the R package highcor for implementing our method.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141861746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxaf008
Ruihan Lu, Yehua Li, Weixin Yao
{"title":"Semiparametric mixture regression for asynchronous longitudinal data using multivariate functional principal component analysis.","authors":"Ruihan Lu, Yehua Li, Weixin Yao","doi":"10.1093/biostatistics/kxaf008","DOIUrl":"10.1093/biostatistics/kxaf008","url":null,"abstract":"<p><p>The transitional phase of menopause induces significant hormonal fluctuations, exerting a profound influence on the long-term well-being of women. In an extensive longitudinal investigation of women's health during mid-life and beyond, known as the Study of Women's Health Across the Nation (SWAN), hormonal biomarkers are repeatedly assessed, following an asynchronous schedule compared to other error-prone covariates, such as physical and cardiovascular measurements. We conduct a subgroup analysis of the SWAN data employing a semiparametric mixture regression model, which allows us to explore how the relationship between hormonal responses and other time-varying or time-invariant covariates varies across subgroups. To address the challenges posed by asynchronous scheduling and measurement errors, we model the time-varying covariate trajectories as functional data with reduced-rank Karhunen-Loève expansions, where splines are employed to capture the mean and eigenfunctions. Treating the latent subgroup membership and the functional principal component (FPC) scores as missing data, we propose an Expectation-Maximization algorithm to effectively fit the joint model, combining the mixture regression for the hormonal response and the FPC model for the asynchronous, time-varying covariates. In addition, we explore data-driven methods to determine the optimal number of subgroups within the population. 
Through our comprehensive analysis of the SWAN data, we unveil a crucial subgroup structure within the aging female population, shedding light on important distinctions and patterns among women undergoing menopause.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11929387/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143694532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxae023
Hyung G Park
{"title":"Bayesian estimation of covariate assisted principal regression for brain functional connectivity.","authors":"Hyung G Park","doi":"10.1093/biostatistics/kxae023","DOIUrl":"10.1093/biostatistics/kxae023","url":null,"abstract":"<p><p>This paper presents a Bayesian reformulation of covariate-assisted principal regression for covariance matrix outcomes to identify low-dimensional components in the covariance associated with covariates. By introducing a geometric approach to the covariance matrices and leveraging Euclidean geometry, we estimate dimension reduction parameters and model covariance heterogeneity based on covariates. This method enables joint estimation and uncertainty quantification of relevant model parameters associated with heteroscedasticity. We demonstrate our approach through simulation studies and apply it to analyze associations between covariates and brain functional connectivity using data from the Human Connectome Project.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823071/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxae050
{"title":"Correction to: Scalable kernel balancing weights in a nationwide observational study of hospital profit status and heart attack outcomes.","authors":"","doi":"10.1093/biostatistics/kxae050","DOIUrl":"10.1093/biostatistics/kxae050","url":null,"abstract":"","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142883691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxae053
Yingfa Xie, Haoda Fu, Yuan Huang, Vladimir Pozdnyakov, Jun Yan
{"title":"Recurrent events modeling based on a reflected Brownian motion with application to hypoglycemia.","authors":"Yingfa Xie, Haoda Fu, Yuan Huang, Vladimir Pozdnyakov, Jun Yan","doi":"10.1093/biostatistics/kxae053","DOIUrl":"10.1093/biostatistics/kxae053","url":null,"abstract":"<p><p>Patients with type 2 diabetes need to closely monitor blood sugar levels as their routine diabetes self-management. Although many treatment agents aim to tightly control blood sugar, hypoglycemia often stands as an adverse event. In practice, patients can observe hypoglycemic events more easily than hyperglycemic events due to the perception of neurogenic symptoms. We propose to model each patient's observed hypoglycemic event as a lower boundary crossing event for a reflected Brownian motion with an upper reflection barrier. The lower boundary is set by clinical standards. To capture patient heterogeneity and within-patient dependence, covariates and a patient-level frailty are incorporated into the volatility and the upper reflection barrier. This framework provides quantification for the underlying glucose level variability, patient heterogeneity, and risk factors' impact on glucose. We make inferences based on a Bayesian framework using Markov chain Monte Carlo. Two model comparison criteria, the deviance information criterion and the logarithm of the pseudo-marginal likelihood, are used for model selection. The methodology is validated in simulation studies. 
In analyzing a dataset from the diabetic patients in the DURABLE trial, our model provides adequate fit, generates data similar to the observed data, and offers insights that could be missed by other models.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxae046
Yiqun T Chen, Lucy L Gao
{"title":"Testing for a difference in means of a single feature after clustering.","authors":"Yiqun T Chen, Lucy L Gao","doi":"10.1093/biostatistics/kxae046","DOIUrl":"10.1093/biostatistics/kxae046","url":null,"abstract":"<p><p>For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common interpretation and validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or k-means clustering. The test controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation and demonstrate its use on single-cell RNA-sequencing data.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11687323/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142911253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxaf019
Yi Zhao, Xi Luo, Michael E Sobel, Martin A Lindquist, Brian S Caffo
{"title":"Causal functional mediation analysis with an application to functional magnetic resonance imaging data.","authors":"Yi Zhao, Xi Luo, Michael E Sobel, Martin A Lindquist, Brian S Caffo","doi":"10.1093/biostatistics/kxaf019","DOIUrl":"10.1093/biostatistics/kxaf019","url":null,"abstract":"<p><p>A primary goal of task-based functional magnetic resonance imaging (fMRI) studies is to quantify the effective connectivity between brain regions when stimuli are presented. Assessing the dynamics of effective connectivity has attracted increasing attention. Causal mediation analysis serves as a widely implemented tool aiming to delineate the mechanism between task stimuli and brain activations. However, the case where the treatment, mediator, and outcome are continuous functions has not been studied. Causal mediation analysis for functional data is considered. Semiparametric functional linear structural equation models are introduced and causal assumptions are discussed. The proposed models allow for the estimation of individual effect curves. The models are applied to a task-based fMRI study, providing a new perspective of studying dynamic brain connectivity. The R package cfma for implementation is available on CRAN.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206356/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144531230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biostatistics | Pub Date: 2024-12-31 | DOI: 10.1093/biostatistics/kxae020
Wei Zong, Danyang Li, Marianne L Seney, Colleen A Mcclung, George C Tseng
{"title":"Model-based multifacet clustering with high-dimensional omics applications.","authors":"Wei Zong, Danyang Li, Marianne L Seney, Colleen A Mcclung, George C Tseng","doi":"10.1093/biostatistics/kxae020","DOIUrl":"10.1093/biostatistics/kxae020","url":null,"abstract":"<p><p>High-dimensional omics data often contain intricate and multifaceted information, resulting in the coexistence of multiple plausible sample partitions based on different subsets of selected features. Conventional clustering methods typically yield only one clustering solution, limiting their capacity to fully capture all facets of cluster structures in high-dimensional data. To address this challenge, we propose a model-based multifacet clustering (MFClust) method based on a mixture of Gaussian mixture models, where the former mixture achieves facet assignment for gene features and the latter mixture determines cluster assignment of samples. We demonstrate superior facet and cluster assignment accuracy of MFClust through simulation studies. The proposed method is applied to three transcriptomic applications from postmortem brain and lung disease studies. The result captures multifacet clustering structures associated with critical clinical variables and provides intriguing biological insights for further hypothesis generation and discovery.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823124/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141604511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}