BiometricsPub Date : 2024-10-03DOI: 10.1093/biomtc/ujae127
Yidong Zhou, Hans-Georg Müller
{"title":"Wasserstein regression with empirical measures and density estimation for sparse data.","authors":"Yidong Zhou, Hans-Georg Müller","doi":"10.1093/biomtc/ujae127","DOIUrl":"https://doi.org/10.1093/biomtc/ujae127","url":null,"abstract":"<p><p>The problem of modeling the relationship between univariate distributions and one or more explanatory variables lately has found increasing interest. Existing approaches proceed by substituting proxy estimated distributions for the typically unknown response distributions. These estimates are obtained from available data but are problematic when for some of the distributions only few data are available. Such situations are common in practice and cannot be addressed with currently available approaches, especially when one aims at density estimates. We show how this and other problems associated with density estimation such as tuning parameter selection and bias issues can be side-stepped when covariates are available. We also introduce a novel version of distribution-response regression that is based on empirical measures. By avoiding the preprocessing step of recovering complete individual response distributions, the proposed approach is applicable when the sample size available for each distribution varies and especially when it is small for some of the distributions but large for others. In this case, one can still obtain consistent distribution estimates even for distributions with only few data by gaining strength across the entire sample of distributions, while traditional approaches where distributions or densities are estimated individually fail, since sparsely sampled densities cannot be consistently estimated. The proposed model is demonstrated to outperform existing approaches through simulations and Environmental Influences on Child Health Outcomes data.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142581081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometricsPub Date : 2024-10-03DOI: 10.1093/biomtc/ujae119
Eva Biswas, Andee Kaplan, Mark S Kaiser, Daniel J Nordman
{"title":"A formal goodness-of-fit test for spatial binary Markov random field models.","authors":"Eva Biswas, Andee Kaplan, Mark S Kaiser, Daniel J Nordman","doi":"10.1093/biomtc/ujae119","DOIUrl":"https://doi.org/10.1093/biomtc/ujae119","url":null,"abstract":"<p><p>Binary spatial observations arise in environmental and ecological studies, where Markov random field (MRF) models are often applied. Despite the prevalence and the long history of MRF models for spatial binary data, appropriate model diagnostics have remained an unresolved issue in practice. A complicating factor is that such models involve neighborhood specifications, which are difficult to assess for binary data. To address this, we propose a formal goodness-of-fit (GOF) test for diagnosing an MRF model for spatial binary values. The test statistic involves a type of conditional Moran's I based on the fitted conditional probabilities, which can detect departures in model form, including neighborhood structure. Numerical studies show that the GOF test can perform well in detecting deviations from a null model, with a focus on neighborhoods as a difficult issue. We illustrate the spatial test with an application to Besag's historical endive data as well as the breeding pattern of grasshopper sparrows across Iowa.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142494172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometricsPub Date : 2024-10-03DOI: 10.1093/biomtc/ujae117
Samuel Perreault, Gracia Y Dong, Alex Stringer, Hwashin Shin, Patrick E Brown
{"title":"Case-crossover designs and overdispersion with application to air pollution epidemiology.","authors":"Samuel Perreault, Gracia Y Dong, Alex Stringer, Hwashin Shin, Patrick E Brown","doi":"10.1093/biomtc/ujae117","DOIUrl":"https://doi.org/10.1093/biomtc/ujae117","url":null,"abstract":"<p><p>Over the last three decades, case-crossover designs have found many applications in health sciences, especially in air pollution epidemiology. They are typically used, in combination with partial likelihood techniques, to define a conditional logistic model for the responses, usually health outcomes, conditional on the exposures. Despite the fact that conditional logistic models have been shown equivalent, in typical air pollution epidemiology setups, to specific instances of the well-known Poisson time series model, it is often claimed that they cannot allow for overdispersion. This paper clarifies the relationship between case-crossover designs, the models that ensue from their use, and overdispersion. In particular, we propose to relax the assumption of independence between individuals traditionally made in case-crossover analyses, in order to explicitly introduce overdispersion in the conditional logistic model. As we show, the resulting overdispersed conditional logistic model coincides with the overdispersed, conditional Poisson model, in the sense that their likelihoods are simple re-expressions of one another. We further provide the technical details of a Bayesian implementation of the proposed case-crossover model, which we use to demonstrate, by means of a large simulation study, that standard case-crossover models can lead to dramatically underestimated coverage probabilities, while the proposed models do not. We also perform an illustrative analysis of the association between air pollution and morbidity in Toronto, Canada, which shows that the proposed models are more robust than standard ones to outliers such as those associated with public holidays.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142457171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometricsPub Date : 2024-10-03DOI: 10.1093/biomtc/ujae130
Xingche Guo, Bin Yang, Ji Meng Loh, Qinxia Wang, Yuanjia Wang
{"title":"A hierarchical random effects state-space model for modeling brain activities from electroencephalogram data.","authors":"Xingche Guo, Bin Yang, Ji Meng Loh, Qinxia Wang, Yuanjia Wang","doi":"10.1093/biomtc/ujae130","DOIUrl":"10.1093/biomtc/ujae130","url":null,"abstract":"<p><p>Mental disorders present challenges in diagnosis and treatment due to their complex and heterogeneous nature. Electroencephalogram (EEG) has shown promise as a source of potential biomarkers for these disorders. However, existing methods for analyzing EEG signals have limitations in addressing heterogeneity and capturing complex brain activity patterns between regions. This paper proposes a novel random effects state-space model (RESSM) for analyzing large-scale multi-channel resting-state EEG signals, accounting for the heterogeneity of brain connectivities between groups and individual subjects. We incorporate multi-level random effects for temporal dynamical and spatial mapping matrices and address non-stationarity so that the brain connectivity patterns can vary over time. The model is fitted under a Bayesian hierarchical model framework coupled with a Gibbs sampler. Compared to previous mixed-effects state-space models, we directly model high-dimensional random effects matrices of interest without structural constraints and tackle the challenge of identifiability. Through extensive simulation studies, we demonstrate that our approach yields valid estimation and inference. We apply RESSM to a multi-site clinical trial of major depressive disorder (MDD). Our analysis uncovers significant differences in resting-state brain temporal dynamics among MDD patients compared to healthy individuals. In addition, we show the subject-level EEG features derived from RESSM exhibit a superior predictive value for the heterogeneous treatment effect compared to the EEG frequency band power, suggesting the potential of EEG as a valuable biomarker for MDD.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11540184/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142590082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An exploratory penalized regression to identify combined effects of temporal variables-application to agri-environmental issues.","authors":"Bénedicte Fontez, Patrice Loisel, Thierry Simonneau, Nadine Hilgert","doi":"10.1093/biomtc/ujae134","DOIUrl":"https://doi.org/10.1093/biomtc/ujae134","url":null,"abstract":"<p><p>The development of sensors is opening new avenues in several fields of activity. Concerning agricultural crops, complex combinations of agri-environmental dynamics, such as soil and climate variables, are now commonly recorded. These new kinds of measurements are an opportunity to improve knowledge of the drivers of crop yield and crop quality at harvest. This involves renewing statistical approaches to account for the combined variations of these dynamic variables, here considered as temporal variables. The objective of the paper is to estimate an interpretable model to study the influence of the two combined inputs on a scalar output. A Sparse and Structured Procedure is proposed to Identify Combined Effects of Formatted temporal Predictors, hereafter denoted S piceFP. The method is based on the transformation of both temporal variables into categorical variables by defining joint modalities, from which a collection of multiple regression models is then derived. The regressors are the frequencies associated with joint class intervals. The class intervals and related regression coefficients are determined using a generalized fused lasso. S piceFP is a generic and exploratory approach. The simulations we performed show that it is flexible enough to select the non-null or influential modalities of values. A motivating example for grape quality is presented.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142692652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometricsPub Date : 2024-10-03DOI: 10.1093/biomtc/ujae152
Peter Norwood, Marie Davidian, Eric Laber
{"title":"Adaptive randomization methods for sequential multiple assignment randomized trials (smarts) via thompson sampling.","authors":"Peter Norwood, Marie Davidian, Eric Laber","doi":"10.1093/biomtc/ujae152","DOIUrl":"10.1093/biomtc/ujae152","url":null,"abstract":"<p><p>Response-adaptive randomization (RAR) has been studied extensively in conventional, single-stage clinical trials, where it has been shown to yield ethical and statistical benefits, especially in trials with many treatment arms. However, RAR and its potential benefits are understudied in sequential multiple assignment randomized trials (SMARTs), which are the gold-standard trial design for evaluation of multi-stage treatment regimes. We propose a suite of RAR algorithms for SMARTs based on Thompson Sampling (TS), a widely used RAR method in single-stage trials in which treatment randomization probabilities are aligned with the estimated probability that the treatment is optimal. We focus on two common objectives in SMARTs: (1) comparison of the regimes embedded in the trial and (2) estimation of an optimal embedded regime. We develop valid post-study inferential procedures for treatment regimes under the proposed algorithms. This is nontrivial, as even in single-stage settings standard estimators of an average treatment effect can have nonnormal asymptotic behavior under RAR. Our algorithms are the first for RAR in multi-stage trials that account for non-standard limiting behavior due to RAR. Empirical studies based on real-world SMARTs show that TS can improve in-trial subject outcomes without sacrificing efficiency for post-trial comparisons.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11647911/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142827259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometricsPub Date : 2024-10-03DOI: 10.1093/biomtc/ujae149
Van Tuan Nguyen, Adeline Fermanian, Antoine Barbieri, Sarah Zohar, Anne-Sophie Jannot, Simon Bussy, Agathe Guilloux
{"title":"An efficient joint model for high dimensional longitudinal and survival data via generic association features.","authors":"Van Tuan Nguyen, Adeline Fermanian, Antoine Barbieri, Sarah Zohar, Anne-Sophie Jannot, Simon Bussy, Agathe Guilloux","doi":"10.1093/biomtc/ujae149","DOIUrl":"https://doi.org/10.1093/biomtc/ujae149","url":null,"abstract":"<p><p>This paper introduces a prognostic method called FLASH that addresses the problem of joint modeling of longitudinal data and censored durations when a large number of both longitudinal and time-independent features are available. In the literature, standard joint models are either of the shared random effect or joint latent class type. Combining ideas from both worlds and using appropriate regularization techniques, we define a new model with the ability to automatically identify significant prognostic longitudinal features in a high-dimensional context, which is of increasing importance in many areas such as personalized medicine or churn prediction. We develop an estimation methodology based on the expectation-maximization algorithm and provide an efficient implementation. The statistical performance of the method is demonstrated both in extensive Monte Carlo simulation studies and on publicly available medical datasets. Our method significantly outperforms the state-of-the-art joint models in terms of C-index in a so-called \"real-time\" prediction setting, with a computational speed that is orders of magnitude faster than competing methods. In addition, our model automatically identifies significant features that are relevant from a practical point of view, making it interpretable, which is of the greatest importance for a prognostic algorithm in healthcare.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142827261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometricsPub Date : 2024-10-03DOI: 10.1093/biomtc/ujae153
Huali Zhao, Tianying Wang
{"title":"Debiased high-dimensional regression calibration for errors-in-variables log-contrast models.","authors":"Huali Zhao, Tianying Wang","doi":"10.1093/biomtc/ujae153","DOIUrl":"https://doi.org/10.1093/biomtc/ujae153","url":null,"abstract":"<p><p>Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models that involve compositional covariates. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data. We introduce a calibration approach tailored for the linear log-contrast model. Under relatively lenient conditions regarding the sparsity level of the parameter, we have established the asymptotic normality of the estimator for inference. Numerical experiments and an application in microbiome study have demonstrated the efficacy of our high-dimensional calibration strategy in minimizing bias and achieving the expected coverage rates for confidence intervals. Moreover, the potential application of our proposed methodology extends well beyond compositional data, suggesting its adaptability for a wide range of research contexts.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142827275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometricsPub Date : 2024-10-03DOI: 10.1093/biomtc/ujae154
Bonnie B Smith, Yujing Gao, Shu Yang, Ravi Varadhan, Andrea J Apter, Daniel O Scharfstein
{"title":"Semi-parametric sensitivity analysis for trials with irregular and informative assessment times.","authors":"Bonnie B Smith, Yujing Gao, Shu Yang, Ravi Varadhan, Andrea J Apter, Daniel O Scharfstein","doi":"10.1093/biomtc/ujae154","DOIUrl":"10.1093/biomtc/ujae154","url":null,"abstract":"<p><p>Many trials are designed to collect outcomes at or around pre-specified times after randomization. If there is variability in the times when participants are actually assessed, this can pose a challenge to learning the effect of treatment, since not all participants have outcome assessments at the times of interest. Furthermore, observed outcome values may not be representative of all participants' outcomes at a given time. Methods have been developed that account for some types of such irregular and informative assessment times; however, since these methods rely on untestable assumptions, sensitivity analyses are needed. We develop a sensitivity analysis methodology that is benchmarked at the explainable assessment (EA) assumption, under which assessment and outcomes at each time are related only through data collected prior to that time. Our method uses an exponential tilting assumption, governed by a sensitivity analysis parameter, that posits deviations from the EA assumption. Our inferential strategy is based on a new influence function-based, augmented inverse intensity-weighted estimator. Our approach allows for flexible semiparametric modeling of the observed data, which is separated from specification of the sensitivity parameter. We apply our method to a randomized trial of low-income individuals with uncontrolled asthma, and we illustrate implementation of our estimation procedure in detail.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 4","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11669851/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142891794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}