{"title":"Sequential hierarchical Bayesian model and particle filter estimation with two-step RJMCMC resampling","authors":"Yue Huan , Guoqiang Wang , Hai Xiang Lin","doi":"10.1016/j.csda.2025.108304","DOIUrl":"10.1016/j.csda.2025.108304","url":null,"abstract":"<div><div>Data assimilation (DA) combines numerical model simulations with observed data to obtain the best possible description of a dynamical system and its uncertainty. Incorrect modeling assumptions can lead to filter divergence, making model identification an important issue in the field of DA. Variations in dynamic model structures can result in differences in parameter dimensions, complicating the resampling step in particle filters (PFs). To meet this challenge, the Sequential Hierarchical Bayesian Model (SHBM) is proposed in this paper, which integrates the evolution model and the observation model from the DA scheme with the hierarchical parameter model. A two-step resampling method is also proposed to estimate the SHBM: the first step uses the resampling scheme of the bootstrap filter to resample new particles based on weights, which may produce some duplicate particles; the second step uses Reversible Jump Markov Chain Monte Carlo (RJMCMC) methods to draw new particles from the target distribution. This approach ensures particle diversity, with the first step aiming to avoid particle degeneracy and the second to prevent sample impoverishment. 
The performance in the Advection Equation example and Lorenz 96 example demonstrates the effectiveness of the proposed method.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"216 ","pages":"Article 108304"},"PeriodicalIF":1.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A semi-parametric approach to receiver operating characteristic analysis with semi-continuous biomarker","authors":"Baohao Wei , Dongsheng Tu , Chunlin Wang","doi":"10.1016/j.csda.2025.108305","DOIUrl":"10.1016/j.csda.2025.108305","url":null,"abstract":"<div><div>The receiver operating characteristic (ROC) curve and its summary measures, such as the area under the curve (AUC) and Youden index, are frequently used to evaluate the performance of a binary classifier based on data of a continuous biomarker and meanwhile identify a suitable cut-off point for classification. In clinical applications, the biomarker used for classification may be semi-continuous in the sense that the observations contain excess zero values and the distribution of the positive values is skewed. In this paper, the distribution of a semi-continuous biomarker is modeled using a mixture of a discrete mass at zero and a continuous skewed positive component. In addition, the distributions of the continuous component in subjects with true negative and positive outcomes are linked by a semi-parametric density ratio model to gain efficiency. Under this framework, unified estimation and inference procedures are proposed for the ROC curve, its important summary measures, and the associated cut-off point. The asymptotic properties of the proposed semi-parametric estimators are established and used to construct their corresponding confidence intervals. Simulation results demonstrate the desirable performance of these estimators and confidence intervals in various settings. 
The proposed semi-parametric approach is also applied to assess the semi-continuous BRCA1 biomarker as a valid prognostic biomarker for predicting cancer progression at 4 years and identifying a cut-off point to classify patients with advanced ovarian cancer into two groups with good and bad prognoses.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"216 ","pages":"Article 108305"},"PeriodicalIF":1.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145584370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parametric estimation of conditional archimedean copula generators for censored data","authors":"Marie Michaelides , Hélène Cossette , Mathieu Pigeon","doi":"10.1016/j.csda.2025.108309","DOIUrl":"10.1016/j.csda.2025.108309","url":null,"abstract":"<div><div>A novel framework is introduced for estimating Archimedean copula generators in a conditional setting by embedding endogenous variables directly within the generator function. Unlike standard copula constructions that rely on a fixed dependence structure across all covariate levels, the proposed methodology allows both the strength and the shape of dependence to evolve with the covariates. To identify the values of a continuous risk factor at which the dependence pattern undergoes substantive changes, an iterative splitting algorithm is developed to determine optimal partitioning points within the covariate range. The approach is evaluated through applications to a diabetic retinopathy study and a claims reserving analysis, illustrating that explicitly modelling covariate effects yields a more accurate representation of dependence and enhances the practical relevance of copula models in medical and actuarial settings.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"216 ","pages":"Article 108309"},"PeriodicalIF":1.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145737030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recursive nonparametric predictive for a discrete regression model","authors":"Lorenzo Cappello , Stephen G. Walker","doi":"10.1016/j.csda.2025.108275","DOIUrl":"10.1016/j.csda.2025.108275","url":null,"abstract":"<div><div>A recursive algorithm is proposed to estimate a set of distribution functions indexed by a regressor variable. The procedure is fully nonparametric and has a Bayesian motivation and interpretation. Indeed, the recursive algorithm follows a certain Bayesian update, defined by the predictive distribution of a Dirichlet process mixture of linear regression models. Consistency of the algorithm is demonstrated under mild assumptions, and numerical accuracy in finite samples is shown via simulations and real data examples. The algorithm is very fast to implement, it is parallelizable, sequential, and requires limited computing power.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"215 ","pages":"Article 108275"},"PeriodicalIF":1.6,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145227724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Change-point detection in regression models via the max-EM algorithm","authors":"Modibo Diabaté , Grégory Nuel , Olivier Bouaziz","doi":"10.1016/j.csda.2025.108278","DOIUrl":"10.1016/j.csda.2025.108278","url":null,"abstract":"<div><div>The problem of breakpoint detection is considered within a regression modeling framework. A novel method, the max-EM algorithm, is introduced, combining a constrained Hidden Markov Model with the Classification-EM algorithm. This algorithm has linear complexity and provides accurate detection of breakpoints and estimation of parameters. A theoretical result is derived, showing that the likelihood of the data, as a function of the regression parameters and the breakpoint locations, increases at each step of the algorithm. Two initialization methods for the breakpoint locations are also presented to address local maxima issues. Finally, a statistical test for the single-breakpoint situation is developed. Simulation experiments based on linear, logistic, Poisson and Accelerated Failure Time regression models show that the final method, which includes the initialization procedure and the max-EM algorithm, performs strongly in terms of both parameter estimation and breakpoint detection. The statistical test is also evaluated and exhibits a correct rejection rate under the null hypothesis and strong power under various alternatives. 
Two real datasets are analyzed, the UCI bike sharing data and the health disease data, illustrating the value of the method for detecting heterogeneity in the distribution of the data.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"215 ","pages":"Article 108278"},"PeriodicalIF":1.6,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145270817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bilateral matrix spatiotemporal autoregressive model","authors":"Lei Qin , Xiaomei Zhang , Yingqiu Zhu , Yang Chen , Ben-Chang Shia","doi":"10.1016/j.csda.2025.108291","DOIUrl":"10.1016/j.csda.2025.108291","url":null,"abstract":"<div><div>As time series with matrix structures become more and more common in finance, economics, and management, modeling matrix-valued time series has become an emerging research topic. Spatial effects arising from different locations play an important role in the analysis of time series. Although the matrix autoregressive model (MAR) provides a promising solution for modeling matrix-valued time series, it only models the dynamic effects in the temporal dimension, without capturing the spatial effects. In this paper, we propose a bilateral matrix spatiotemporal autoregressive model (BMSAR), which fully considers the pure spatial effects, pure dynamic effects, and time-delay spatial effects while maintaining and utilizing the matrix structure. To address the endogeneity problem, BMSAR is estimated iteratively using the least squares method and the Yule-Walker equation. The simulation results show that, compared with the MAR, the BMSAR model effectively reflects the impact of spatial structure on the sequence observations. The estimator for BMSAR proposed in this paper is consistent. It achieves promising performance when the sample size is relatively large. 
The proposed model and algorithm are also verified using the trade and macroeconomic indicator datasets of seven countries in the G7 summit, and the prediction accuracy is significantly improved as compared with the existing models.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"215 ","pages":"Article 108291"},"PeriodicalIF":1.6,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypothesis test in high dimensional multi-response linear models","authors":"Yuan Ke , Rongmao Zhang , Wenyang Zhang , Changliang Zou","doi":"10.1016/j.csda.2025.108303","DOIUrl":"10.1016/j.csda.2025.108303","url":null,"abstract":"<div><div>Data with multiple responses is very common in economics, engineering, finance, and social science. Analyzing each response variable separately may not be a good strategy as this approach can overlook important information and lead to suboptimal results. In some cases, it may not even provide an answer to the question of interest. Multi-response linear models serve as an important tool for joint analysis. While the methodology and theory of classic multi-response linear models are well-established, they may not be applicable to high-dimensional cases. In this paper, we propose a powerful hypothesis test for the coefficient matrix of a high-dimensional multi-response linear model. We establish asymptotic results and conduct comprehensive simulation studies to demonstrate that the proposed hypothesis test is more powerful than alternative methods. Furthermore, we apply the hypothesis test to two real datasets, illustrating its usefulness in addressing practical problems.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"215 ","pages":"Article 108303"},"PeriodicalIF":1.6,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145571267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer learning for high dimensional data with discrete responses","authors":"Zejing Zheng, Shengbing Zheng, Junlong Zhao","doi":"10.1016/j.csda.2025.108292","DOIUrl":"10.1016/j.csda.2025.108292","url":null,"abstract":"<div><div>Discrete responses are frequently encountered in applications, particularly in classification problems. However, the high cost of collecting responses or labels often leads to a scarcity of samples, which significantly diminishes the accuracy of statistical inferences, particularly in high-dimensional settings. To address this limitation, transfer learning can be utilized for high-dimensional data with discrete responses by incorporating relevant source data into the target study of interest. Within the framework of generalized linear models, the case where responses are bounded is first considered, and an importance-weighted transfer learning method, referred to as IWTL-DR, is proposed. This method selects data at the individual level, thereby utilizing the source data more efficiently. Subsequently, this approach is extended to scenarios involving unbounded responses. Theoretical properties of the IWTL-DR method are established and compared with existing techniques. Extensive simulations and analyses of real data show the advantages of our approach.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"215 ","pages":"Article 108292"},"PeriodicalIF":1.6,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145364811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resampling NANCOVA: Nonparametric analysis of covariance in small samples","authors":"Konstantin Emil Thiel , Paavo Sattler , Arne C. Bathke , Georg Zimmermann","doi":"10.1016/j.csda.2025.108290","DOIUrl":"10.1016/j.csda.2025.108290","url":null,"abstract":"<div><div>Analysis of covariance is a crucial method for improving precision of statistical tests for factor effects in randomized experiments. However, existing solutions suffer from one or more of the following limitations: (i) they are not suitable for ordinal data (as endpoints or explanatory variables); (ii) they require semiparametric model assumptions; (iii) they are inapplicable to small data scenarios due to often poor type-I error control; or (iv) they provide only approximate testing procedures and (asymptotically) exact tests are missing. A resampling approach to the NANCOVA framework is investigated. NANCOVA is a fully nonparametric model based on <em>relative effects</em> that allows for an arbitrary number of covariates and groups, where both outcome variable (endpoint) and covariates can be metric or ordinal. Novel NANCOVA tests and a nonparametric competitor test without covariate adjustment were evaluated in extensive simulations. Unlike approximate tests in the NANCOVA framework, the proposed resampling version showed good performance in small sample scenarios and maintained the nominal type-I error well. Resampling NANCOVA also provided consistently high power: up to 26 % higher than the test without covariate adjustment in a small sample scenario with 4 groups and two covariates. Moreover, it is shown that resampling NANCOVA provides an asymptotically exact testing procedure, which makes it the first one with good finite sample performance in the present NANCOVA framework. 
In summary, resampling NANCOVA can be considered a viable tool for analysis of covariance overcoming issues (i) - (iv).</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"215 ","pages":"Article 108290"},"PeriodicalIF":1.6,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145270814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bootstrap-based goodness-of-fit test for parametric families of conditional distributions","authors":"Gitte Kremling, Gerhard Dikta","doi":"10.1016/j.csda.2025.108289","DOIUrl":"10.1016/j.csda.2025.108289","url":null,"abstract":"<div><div>A consistent goodness-of-fit test for distributional regression is introduced. The test statistic is based on a process that traces the difference between a nonparametric and a semi-parametric estimate of the marginal distribution function of <span><math><mi>Y</mi></math></span>. As its asymptotic null distribution is not distribution-free, a parametric bootstrap method is used to determine critical values. Empirical results suggest that, in certain scenarios, the test outperforms existing specification tests by achieving a higher power and thereby offering greater sensitivity to deviations from the assumed parametric distribution family. Notably, the proposed test does not involve any hyperparameters and can easily be applied to individual datasets using the gofreg-package in R.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"215 ","pages":"Article 108289"},"PeriodicalIF":1.6,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145270816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}