{"title":"Inference for the stochastic FitzHugh-Nagumo model from real action potential data via approximate Bayesian computation","authors":"Adeline Samson , Massimiliano Tamborrino , Irene Tubikanec","doi":"10.1016/j.csda.2024.108095","DOIUrl":"10.1016/j.csda.2024.108095","url":null,"abstract":"<div><div>The stochastic FitzHugh-Nagumo (FHN) model is a two-dimensional nonlinear stochastic differential equation with additive degenerate noise, whose first component, the only one observed, describes the membrane voltage evolution of a single neuron. Due to its low-dimensionality, its analytical and numerical tractability and its neuronal interpretation, it has been used as a case study to test the performance of different statistical methods in estimating the underlying model parameters. Existing methods, however, often require complete observations, non-degeneracy of the noise or a complex architecture (e.g., to estimate the transition density of the process, ‘‘recovering’’ the unobserved second component) and they may not (satisfactorily) estimate all model parameters simultaneously. Moreover, these studies lack real data applications for the stochastic FHN model. The proposed method tackles all challenges (non-globally Lipschitz drift, non-explicit solution, lack of available transition density, degeneracy of the noise and partial observations). It is an intuitive and easy-to-implement sequential Monte Carlo approximate Bayesian computation algorithm, which relies on a recent computationally efficient and structure-preserving numerical splitting scheme for synthetic data generation and on summary statistics exploiting the structural properties of the process. All model parameters are successfully estimated from simulated data and, more remarkably, real action potential data of rats. 
The presented novel real-data fit may broaden the scope and credibility of this classic and widely used neuronal model.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108095"},"PeriodicalIF":1.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-dimensional copula-based Wasserstein dependence","authors":"Steven De Keyser, Irène Gijbels","doi":"10.1016/j.csda.2024.108096","DOIUrl":"10.1016/j.csda.2024.108096","url":null,"abstract":"<div><div>The aim is to generalize 2-Wasserstein dependence coefficients to measure dependence between a finite number of random vectors. This generalization includes theoretical properties, and in particular focuses on an interpretation of maximal dependence and an asymptotic normality result for a proposed semi-parametric estimator under a Gaussian copula assumption. In addition, it is of interest to look at general axioms for dependence measures between multiple random vectors, at plausible normalizations, and at various examples. Afterwards, it is important to study plug-in estimators based on penalized empirical covariance matrices in order to deal with high dimensionality issues and taking possible marginal independencies into account by inducing (block) sparsity. The latter ideas are investigated via a simulation study, considering other dependence coefficients as well. The use of the developed methods is illustrated in two real data applications.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108096"},"PeriodicalIF":1.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Bayesian functional principal component analysis of irregularly-observed multivariate curves","authors":"Tui H. Nolan , Sylvia Richardson , Hélène Ruffieux","doi":"10.1016/j.csda.2024.108094","DOIUrl":"10.1016/j.csda.2024.108094","url":null,"abstract":"<div><div>The analysis of multivariate functional curves has the potential to yield important scientific discoveries in domains such as healthcare, medicine, economics and social sciences. However, it is common for real-world settings to present longitudinal data that are both irregularly and sparsely observed, which introduces important challenges for the current functional data methodology. A Bayesian hierarchical framework for multivariate functional principal component analysis is proposed, which accommodates the intricacies of such irregular observation settings by flexibly pooling information across subjects and correlated curves. The model represents common latent dynamics via shared functional principal component scores, thereby effectively borrowing strength across curves while circumventing the computationally challenging task of estimating covariance matrices. These scores also provide a parsimonious representation of the major modes of joint variation of the curves and constitute interpretable scalar summaries that can be employed in follow-up analyses. Estimation is conducted using variational inference, ensuring that accurate posterior approximation and robust uncertainty quantification are achieved. The algorithm also introduces a novel variational message passing fragment for multivariate functional principal component Gaussian likelihood that enables modularity and reuse across models. Detailed simulations assess the effectiveness of the approach in sharing information from sparse and irregularly sampled multivariate curves. 
The methodology is also exploited to estimate the molecular disease courses of individual patients with SARS-CoV-2 infection and characterise patient heterogeneity in recovery outcomes; this study reveals key coordinated dynamics across the immune, inflammatory and metabolic systems, which are associated with long-COVID symptoms up to one year post disease onset. The approach is implemented in the R package <span>bayesFPCA</span>.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108094"},"PeriodicalIF":1.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dirichlet process model for directional-linear data with application to bloodstain pattern analysis","authors":"Tong Zou, Hal S. Stern","doi":"10.1016/j.csda.2024.108093","DOIUrl":"10.1016/j.csda.2024.108093","url":null,"abstract":"<div><div>Directional data require specialized models because of the non-Euclidean nature of their domain. When a directional variable is observed jointly with linear variables, modeling their dependence adds an additional layer of complexity. A Bayesian nonparametric approach is introduced to analyze directional-linear data. Firstly, the projected normal distribution is extended to model the joint distribution of linear variables and a directional variable with arbitrary dimension projected from a higher-dimensional augmented multivariate normal distribution. The new distribution is called the semi-projected normal distribution (SPN) and can be used as the mixture distribution in a Dirichlet process model to obtain a more flexible class of models for directional-linear data. Then, a conditional inverse-Wishart distribution is proposed as part of the prior distribution to address an identifiability issue inherited from the projected normal and preserve conjugacy with the SPN. The SPN mixture model shows superior performance in clustering on synthetic data compared to the semi-wrapped Gaussian model. The experiments show the ability of the SPN mixture model to characterize bloodstain patterns. 
A hierarchical Dirichlet process model with the SPN distribution is built to estimate the likelihood of bloodstain patterns under a posited causal mechanism for use in a likelihood ratio approach to the analysis of forensic bloodstain pattern evidence.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108093"},"PeriodicalIF":1.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lost in the shuffle: Testing power in the presence of errorful network vertex labels","authors":"Ayushi Saxena, Vince Lyzinski","doi":"10.1016/j.csda.2024.108091","DOIUrl":"10.1016/j.csda.2024.108091","url":null,"abstract":"<div><div>Two-sample network hypothesis testing is an important inference task with applications across diverse fields such as medicine, neuroscience, and sociology. Many of these testing methodologies operate under the implicit assumption that the vertex correspondence across networks is a priori known. This assumption is often untrue, and the power of the subsequent test can degrade when there are misaligned/label-shuffled vertices across networks. This power loss due to shuffling is theoretically explored in the context of random dot product and stochastic block model networks for a pair of hypothesis tests based on Frobenius norm differences between estimated edge probability matrices or between adjacency matrices. The loss in testing power is further reinforced by numerous simulations and experiments, both in the stochastic block model and in the random dot product graph model, where the power loss across multiple recently proposed tests in the literature is considered. 
Lastly, the impact that shuffling can have in real-data testing is demonstrated in a pair of examples from neuroscience and from social network analysis.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108091"},"PeriodicalIF":1.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical modeling of Dengue transmission dynamics with environmental factors","authors":"Lengyang Wang , Mingke Zhang","doi":"10.1016/j.csda.2024.108080","DOIUrl":"10.1016/j.csda.2024.108080","url":null,"abstract":"<div><div>Dengue fever is one of the most common mosquito-borne infectious diseases in tropical regions. Understanding the dynamics of dengue transmission can help provide timely early warnings, thereby reducing mortality. However, previous studies have failed to faithfully simulate dengue dynamics and answer questions pertinent to outbreaks. A new substantive model is proposed that incorporates environmental factors into a time-series-susceptible-infectious-recovered (TSIR) model in order to analyze their impact on transmission. The newly proposed environmental-time-series-susceptible-infectious-recovered (ETSIR) model can statistically highlight the significance of these factors for dengue transmission, thus providing deeper insight into the transmission and addressing several epidemiological puzzles.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108080"},"PeriodicalIF":1.5,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of order-of-addition experiments","authors":"Xueru Zhang , Dennis K.J. Lin , Min-Qian Liu , Jianbin Chen","doi":"10.1016/j.csda.2024.108077","DOIUrl":"10.1016/j.csda.2024.108077","url":null,"abstract":"<div><div>The order-of-addition (OofA) experiment involves arranging components in a specific order to optimize a certain objective, which is attracting a great deal of attention in many disciplines, especially in the areas of biochemistry, scheduling, and engineering. Recent studies have highlighted its significance, and notable works have aimed to address NP-hard OofA problems from a statistical perspective. However, solving OofA problems presents challenges due to their complex nature and the presence of uncertainty, such as scheduling problems with uncertain processing times. These uncertainties affect processing times, which are not known with certainty in advance. They introduce heteroscedasticity into OofA experiments, where different orders result in varying dispersions. To address these challenges, a unified framework is proposed to analyze scheduling problems without making specific assumptions about the distribution of these uncertainties. It encompasses model development and optimization, encapsulating existing homoscedastic studies (where different orders produce the same dispersion value) as a specific instance. For heteroscedastic cases, a dual response optimization within an uncertainty set is proposed, aiming to minimize the dispersion of response while keeping the location of response with a predefined target value. However, solving the proposed non-linear minimax optimization is rather challenging. An equivalent optimization formulation with low computational cost is proposed for solving such a challenging problem. Theoretical supports are established to ensure the tractability of the proposed method. Simulation studies are conducted to demonstrate the effectiveness of the proposed approach. 
With its solid theoretical support, ease of implementation, and ability to find an optimal order, the proposed approach offers a practical and competitive solution to solving general order-of-addition problems.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108077"},"PeriodicalIF":1.5,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A goodness-of-fit test for functional time series with applications to Ornstein-Uhlenbeck processes","authors":"J. Álvarez-Liébana , A. López-Pérez , W. González-Manteiga , M. Febrero-Bande","doi":"10.1016/j.csda.2024.108092","DOIUrl":"10.1016/j.csda.2024.108092","url":null,"abstract":"<div><div>High-frequency financial data can be collected as a sequence of time-ordered curves, such as intraday prices. The Functional Data Analysis (FDA) framework offers a powerful approach to uncover information embedded in the shape of the daily paths, often unavailable from classical statistical methods. A novel goodness-of-fit test for autoregressive Hilbertian (ARH) models is introduced, imposing only the Hilbert-Schmidt condition on the autocorrelation operator. The test statistic is formulated in terms of a Cramér–von Mises norm, with calibration achieved via a wild bootstrap resampling procedure. A simulation study examines the test's finite-sample performance in terms of power and size. Furthermore, a new specification test for diffusion models, including Ornstein-Uhlenbeck processes, is proposed, illustrated with an application to intraday currency exchange rates. 
Specifically, a two-stage methodology is proffered: first, the relationship between functional samples and their lagged values is assessed using an ARH(1) model; second, under linearity, a functional F-test is conducted.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108092"},"PeriodicalIF":1.5,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted support vector machine for extremely imbalanced data","authors":"Jongmin Mun , Sungwan Bang , Jaeoh Kim","doi":"10.1016/j.csda.2024.108078","DOIUrl":"10.1016/j.csda.2024.108078","url":null,"abstract":"<div><div>Based on an asymptotically optimal weighted support vector machine (SVM) that introduces label shift, a systematic procedure is derived for applying oversampling and weighted SVM to extremely imbalanced datasets with a cluster-structured positive class. This method formalizes three intuitions: (i) oversampling should reflect the structure of the positive class; (ii) weights should account for both the imbalance and oversampling ratios; (iii) synthetic samples should carry less weight than the original samples. The proposed method generates synthetic samples from the estimated positive class distribution using a Gaussian mixture model. To prevent overfitting to excessive synthetic samples, different misclassification penalties are assigned to the original positive class, synthetic positive class, and negative class. The proposed method is numerically validated through simulations and an analysis of Republic of Korea Army artillery training data.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108078"},"PeriodicalIF":1.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cox regression model with doubly truncated and interval-censored data","authors":"Pao-sheng Shen","doi":"10.1016/j.csda.2024.108090","DOIUrl":"10.1016/j.csda.2024.108090","url":null,"abstract":"<div><div>Interval sampling is an efficient sampling scheme used in epidemiological studies. Doubly truncated (DT) data arise under this sampling scheme when the failure time can be observed exactly. In practice, the failure time may not be observed and might be recorded only within time intervals, leading to doubly truncated and interval censored (DTIC) data. This article considers regression analysis of DTIC data under the Cox proportional hazards (PH) model and develops the conditional maximum likelihood estimators (cMLEs) for the regression parameters and baseline cumulative hazard function of models. The cMLEs are shown to be consistent and asymptotically normal. Simulation results indicate that the cMLEs perform well for samples of moderate size.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108090"},"PeriodicalIF":1.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}