{"title":"A goodness-of-fit test for geometric Brownian motion","authors":"Daniel Gaigall , Philipp Wübbolding","doi":"10.1016/j.csda.2025.108196","DOIUrl":"10.1016/j.csda.2025.108196","url":null,"abstract":"<div><div>A new goodness-of-fit test for the composite null hypothesis that data originate from a geometric Brownian motion is studied in the functional data setting. This is equivalent to testing if the data are from a scaled Brownian motion with linear drift. Critical values for the test are obtained, ensuring that the specified significance level is achieved in finite samples. The asymptotic behavior of the test statistic under the null distribution and alternatives is studied, and it is also demonstrated that the test is consistent. Furthermore, the proposed approach offers advantages in terms of fast and simple implementation. A comprehensive simulation study shows that the power of the new test compares favorably to that of existing methods. A key application is the assessment of financial time series for the suitability of the Black-Scholes model. Examples relating to various stock and interest rate time series are presented in order to illustrate the proposed test.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"210 ","pages":"Article 108196"},"PeriodicalIF":1.5,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143868689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent-class trajectory modeling with a heterogeneous mean-variance relation","authors":"Niek G.P. Den Teuling , Francesco Ungolo , Steffen C. Pauws , Edwin R. van den Heuvel","doi":"10.1016/j.csda.2025.108199","DOIUrl":"10.1016/j.csda.2025.108199","url":null,"abstract":"<div><div>The benefit of addressing heteroskedastic residual variances across trajectories is investigated with the purpose of finding clusters of longitudinal trajectories. Models are proposed to account for class-specific heteroskedasticity through a mean-variance relation or random residual variance, thereby accounting for trajectory-specific variance. The analyzed latent-class trajectory models are an extension of growth mixture models (GMM). The estimation bias of the model parameters and the recoverability of the number of latent classes are assessed under various data-generating models and settings by means of a simulation study. Furthermore, the empirical applicability of these models is demonstrated through the analysis of the time-varying incidence rate of COVID-19 cases across counties in the United States. Overall, the class-specific mean-variance could be reliably estimated by the proposed models in datasets comprising 250 trajectories. In addition, the extended GMM accounting for the residual random variance showed improved group trajectory estimation over the standard GMM.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"210 ","pages":"Article 108199"},"PeriodicalIF":1.5,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143904339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-clustering multi-view data using the Latent Block Model","authors":"Joshua Tobin , Michaela Black , James Ng , Debbie Rankin , Jonathan Wallace , Catherine Hughes , Leane Hoey , Adrian Moore , Jinling Wang , Geraldine Horigan , Paul Carlin , Helene McNulty , Anne M. Molloy , Mimi Zhang","doi":"10.1016/j.csda.2025.108188","DOIUrl":"10.1016/j.csda.2025.108188","url":null,"abstract":"<div><div>The Latent Block Model (LBM) is a prominent model-based co-clustering method, returning parametric representations of each block-cluster and allowing the use of well-grounded model selection methods. Although the LBM has been adapted to accommodate various feature types, it cannot be applied to datasets consisting of multiple distinct sets of features, termed views, for a common set of observations. The multi-view LBM is introduced herein, extending the LBM method to multi-view data, where each view marginally follows an LBM. For any pair of two views, the dependence between them is captured by a row-cluster membership matrix. A likelihood-based approach is formulated for parameter estimation, harnessing a stochastic EM algorithm merged with a Gibbs sampler, while an ICL criterion is formulated to determine the number of row- and column-clusters in each view. To justify the application of the multi-view approach, hypothesis tests are formulated to evaluate the independence of row-clusters across views, with the testing procedure seamlessly integrated into the estimation framework. A penalty scheme is also introduced to induce sparsity in row-clusterings. The algorithm's performance is validated using synthetic and real-world datasets, accompanied by recommendations for optimal parameter selection. Finally, the multi-view co-clustering method is applied to a complex genomics dataset, and is shown to provide new insights for high-dimension multi-view problems.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"210 ","pages":"Article 108188"},"PeriodicalIF":1.5,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143868688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-parametric tests for cross-dependence based on multivariate extensions of ordinal patterns","authors":"Angelika Silbernagel , Christian H. Weiß , Alexander Schnurr","doi":"10.1016/j.csda.2025.108189","DOIUrl":"10.1016/j.csda.2025.108189","url":null,"abstract":"<div><div>Analyzing the cross-dependence within sequentially observed pairs of random variables is an interesting mathematical problem that also has several practical applications. Most of the time, classical dependence measures like Pearson's correlation are used to this end. This quantity, however, only measures linear dependence and has other drawbacks as well. Different concepts for measuring cross-dependence in sequentially observed random vectors, which are based on so-called ordinal patterns or multivariate generalizations of them, are described. In all cases, limiting distributions of the corresponding test statistics are derived. In a simulation study, the performance of these statistics is compared with three competitors, namely, classical Pearson's and Spearman's correlation as well as the rank-based Chatterjee's correlation coefficient. The applicability of the test statistics is illustrated by using them on two real-world data examples.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"210 ","pages":"Article 108189"},"PeriodicalIF":1.5,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143814833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A flexible mixed-membership model for community and enterotype detection for microbiome data","authors":"Alice Giampino, Roberto Ascari, Sonia Migliorati","doi":"10.1016/j.csda.2025.108181","DOIUrl":"10.1016/j.csda.2025.108181","url":null,"abstract":"<div><div>Understanding how the human gut microbiome affects host health is challenging due to the wide interindividual variability, sparsity, and high dimensionality of microbiome data. Mixed-membership models have been previously applied to these data to detect latent communities of bacterial taxa that are expected to co-occur. The most widely used mixed-membership model is latent Dirichlet allocation (LDA). However, LDA is limited by the rigidity of the Dirichlet distribution imposed on the community proportions, which hinders its ability to model dependencies and account for overdispersion. To address this limitation, a generalization of LDA is proposed that introduces greater flexibility into the covariance matrix by incorporating the flexible Dirichlet (FD), a specific identifiable mixture with Dirichlet components. In addition to identifying communities, the new model enables the detection of enterotypes, i.e., clusters of samples with similar microbe composition. For inferential purposes, a computationally efficient collapsed Gibbs sampler that exploits the conjugacy of the FD distribution with respect to the multinomial model is proposed. A simulation study demonstrates the model's ability to accurately recover true parameter values by minimizing appropriate compositional discrepancy measures between the true and estimated values. Additionally, the model correctly identifies the number of communities, as evidenced by perplexity scores. Moreover, an application to the COMBO dataset highlights its effectiveness in detecting biologically significant and coherent communities and enterotypes, revealing a broader range of correlations between community abundances. These results underscore the new model as a definite improvement over LDA.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"210 ","pages":"Article 108181"},"PeriodicalIF":1.5,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Eliciting prior information from clinical trials via calibrated Bayes factor","authors":"Roberto Macrì Demartino , Leonardo Egidi , Nicola Torelli , Ioannis Ntzoufras","doi":"10.1016/j.csda.2025.108180","DOIUrl":"10.1016/j.csda.2025.108180","url":null,"abstract":"<div><div>In the Bayesian framework power prior distributions are increasingly adopted in clinical trials and similar studies to incorporate external and past information, typically to inform the parameter associated with a treatment effect. Their use is particularly effective in scenarios with small sample sizes and where robust prior information is available. A crucial component of this methodology is represented by its weight parameter, which controls the volume of historical information incorporated into the current analysis. Although this parameter can be modeled as either fixed or random, eliciting its prior distribution via a full Bayesian approach remains challenging. In general, this parameter should be carefully selected to accurately reflect the available historical information without dominating the posterior inferential conclusions. A novel simulation-based calibrated Bayes factor procedure is proposed to elicit the prior distribution of the weight parameter, allowing it to be updated according to the strength of the evidence in the data. The goal is to facilitate the integration of historical data when there is agreement with current information and to limit it when discrepancies arise in terms, for instance, of prior-data conflicts. The performance of the proposed method is tested through simulation studies and applied to real data from clinical trials.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"209 ","pages":"Article 108180"},"PeriodicalIF":1.5,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143746613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hidden semi-Markov models with inhomogeneous state dwell-time distributions","authors":"Jan-Ole Koslik","doi":"10.1016/j.csda.2025.108171","DOIUrl":"10.1016/j.csda.2025.108171","url":null,"abstract":"<div><div>The well-established methodology for the estimation of hidden semi-Markov models (HSMMs) as hidden Markov models (HMMs) with extended state spaces is further developed. Covariate influences are incorporated across all aspects of the state process model, in particular regarding the distributions governing the state dwell time. The special case of periodically varying covariate effects on the state dwell-time distributions — and possibly the conditional transition probabilities — is examined in detail. Important properties of these models are derived, including the periodically varying unconditional state distribution as well as the overall state dwell-time distribution. Simulation studies are conducted to assess key properties of these models and provide recommendations for hyperparameter settings. A case study involving an HSMM with periodically varying dwell-time distributions is presented to analyse the movement trajectory of an Arctic muskox, demonstrating the practical relevance of the developed methodology.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"209 ","pages":"Article 108171"},"PeriodicalIF":1.5,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143644147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selecting time-series hyperparameters with the artificial jackknife","authors":"Filippo Pellegrino","doi":"10.1016/j.csda.2025.108173","DOIUrl":"10.1016/j.csda.2025.108173","url":null,"abstract":"<div><div>A generalisation of the delete-<em>d</em> jackknife is proposed for solving hyperparameter selection problems in time series. The method is referred to as the artificial delete-<em>d</em> jackknife, emphasizing that it replaces the classic removal step with a fictitious deletion, wherein observed data points are replaced with artificial missing values. This procedure preserves the data order, ensuring seamless compatibility with time series. The approach is asymptotically justified and its finite-sample properties are studied via simulations. In addition, an application based on foreign exchange rates illustrates its practical relevance.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"209 ","pages":"Article 108173"},"PeriodicalIF":1.5,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143680032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Manifold-valued models for analysis of EEG time series data","authors":"Tao Ding , Tom M.W. Nye , Yujiang Wang","doi":"10.1016/j.csda.2025.108168","DOIUrl":"10.1016/j.csda.2025.108168","url":null,"abstract":"<div><div>EEG (electroencephalogram) records brain electrical activity and is a vital clinical tool in the diagnosis and treatment of epilepsy. Time series of covariance matrices between EEG channels for patients suffering from epilepsy, obtained from an open-source dataset, are analysed. The aim is two-fold: to develop a model with interpretable parameters for different possible modes of EEG dynamics, and to explore the extent to which modelling results are affected by the choice of geometry imposed on the space of covariance matrices. The space of full-rank covariance matrices of fixed dimension forms a smooth manifold, and any statistical analysis inherently depends on the choice of metric or Riemannian structure on this manifold. The model specifies a distribution for the tangent direction vector at any time point, combining an autoregressive term, a mean reverting term and a form of Gaussian noise. Parameter inference is performed by maximum likelihood estimation, and we compare modelling results obtained using the standard Euclidean geometry and the affine invariant geometry on covariance matrices. The findings reveal distinct dynamics between epileptic seizures and interictal periods (between seizures), with interictal series characterized by strong mean reversion and absence of autoregression, while seizures exhibit significant autoregressive components with weaker mean reversion. The fitted models are also used to measure seizure dissimilarity within and between patients.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"209 ","pages":"Article 108168"},"PeriodicalIF":1.5,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143680033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multiple imputation approach for flexible modelling of interval-censored data with missing and censored covariates","authors":"Yichen Lou , Yuqing Ma , Liming Xiang , Jianguo Sun","doi":"10.1016/j.csda.2025.108177","DOIUrl":"10.1016/j.csda.2025.108177","url":null,"abstract":"<div><div>This paper discusses regression analysis of interval-censored failure time data that commonly occur in biomedical studies among others. For the situation, the failure event of interest is known only to occur within an interval instead of being observed exactly. In addition to interval censoring on the failure time of interest, sometimes covariates may be missing or suffer censoring, which can bring extra theoretical and computational challenges for the regression analysis. To deal with such data, we propose a novel multiple imputation approach with the use of the rejection sampling under a class of semiparametric transformation models. The proposed method is flexible and can lead to more efficient estimation than the existing methods, and the resulting estimators are shown to be consistent and asymptotically normal. An extensive simulation study is conducted and demonstrates that the proposed approach works well in practice. Finally, we apply the proposed approach to a set of real data on Alzheimer's disease that motivated this study.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"209 ","pages":"Article 108177"},"PeriodicalIF":1.5,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143714600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}