{"title":"Boosted sliced regression for dimension reduction in binary classification","authors":"Qin Wang, Edmund Osei","doi":"10.1016/j.csda.2026.108342","DOIUrl":"10.1016/j.csda.2026.108342","url":null,"abstract":"<div><div>Sufficient dimension reduction (SDR) aims at reducing the data dimensionality without loss of the information on the conditional distribution between the response and its high dimensional predictors. Most existing SDR methods were developed under a general regression model, and may lose efficiency when the response is binary. A novel approach is proposed in this study. It combines the gradient boosting machines (GBM) and the sliced regression (SR) to effectively recover the central dimension reduction subspace in binary classification. Numerical experiments and real data applications demonstrate its superior performance and scalability in computation.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"218 ","pages":"Article 108342"},"PeriodicalIF":1.6,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146023870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Change-point detection for multivariate nonparametric regression with deep neural networks","authors":"Houlin Zhou , Hanbing Zhu , Xuejun Wang","doi":"10.1016/j.csda.2025.108334","DOIUrl":"10.1016/j.csda.2025.108334","url":null,"abstract":"<div><div>This article addresses the problem of detecting structural changes in multivariate nonparametric regression models, which commonly arise in high-dimensional and time-dependent data analysis. We propose a CUSUM-type test statistic constructed from estimators obtained via deep neural networks (DNNs). The theoretical properties of the proposed test statistic are rigorously derived under the null and alternative hypotheses. Under the assumptions of a low-dimensional manifold structure in the data support and a hierarchical model architecture, we demonstrate that the DNN-based change-point detection method can effectively mitigate the curse of dimensionality. Furthermore, we establish the asymptotic properties and derive the convergence rate of the estimator for the change-point location. Extensive comparative simulation studies confirm the effectiveness and superior performance of the proposed approach. Finally, we illustrate the practical applicability of the method through an empirical analysis using real-world regional electricity consumption data.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"218 ","pages":"Article 108334"},"PeriodicalIF":1.6,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145908905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pure error REML for analyzing data from multi-stratum designs","authors":"Steven G. Gilmour , Peter Goos , Heiko Großmann","doi":"10.1016/j.csda.2025.108322","DOIUrl":"10.1016/j.csda.2025.108322","url":null,"abstract":"<div><div>Since the dawn of response surface methodology, it has been recommended that designs include replicate points, so that pure error estimates of variance can be obtained and used to provide reliable estimated standard errors of the effects of factors. In designs with more than one stratum, such as split-plot and split-split-plot designs, it is less obvious how pure error estimates of the variance components should be obtained, and no pure error estimates are given by the popular residual maximum likelihood (REML) method of estimation. A method of pure error REML estimation of the variance components, using the full treatment model, is obtained by treating each combination of factor levels as a discrete treatment. This method is easy to implement using standard software and improved estimated standard errors of the fixed effects estimates can be obtained by applying the Kenward-Roger correction based on the pure error REML estimates. The new method is illustrated using several data sets and the performance of pure error REML is compared with the standard REML method. The results are comparable when the assumed response surface model is correct, but the new method is considerably more robust in the case of model misspecification.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"218 ","pages":"Article 108322"},"PeriodicalIF":1.6,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional independence test in factor models via projection correlation","authors":"Xilin Zhang , Hongxia Xu , Guoliang Fan , Liping Zhu","doi":"10.1016/j.csda.2026.108339","DOIUrl":"10.1016/j.csda.2026.108339","url":null,"abstract":"<div><div>Among existing methods for testing independence, projection correlation possesses several appealing properties: it is insensitive to the dimensions of the two random vectors, invariant under orthogonal transformations, and requires no tuning parameters or moment conditions for its estimation. This paper proposes a projection correlation-based approach for measuring and testing conditional dependence within a factor model framework. The proposed measure accommodates response vectors and common factors of varying dimensions while allowing the number of factors to grow to infinity with the sample size. The asymptotic properties of the projection correlation statistic are established under both the null and alternative hypotheses. In addition, a general approach is introduced for constructing dependency graphs without the Gaussian assumption, utilizing the proposed test. Numerical simulations and real data analysis demonstrate the superiority and practicality of the proposed methods.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"218 ","pages":"Article 108339"},"PeriodicalIF":1.6,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-population sufficient dimension reduction","authors":"Xuerong Meggie Wen , Yuexiao Dong , Li-Xing Zhu","doi":"10.1016/j.csda.2025.108321","DOIUrl":"10.1016/j.csda.2025.108321","url":null,"abstract":"<div><div>A novel dimension-reduction method is introduced for multi-population data. The approach conducts a joint analysis that exploits information shared across populations while accommodating population-specific effects. Unlike partial dimension reduction methods, which identify related directions across all populations, or conditional analyses conducted independently within each population, the proposed two-step procedure leverages cross-population information to enhance estimation accuracy. The methodology is demonstrated through simulations and two real-data applications.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"217 ","pages":"Article 108321"},"PeriodicalIF":1.6,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Debiased quantile significance testing with machine learning methods","authors":"Jiarong Ding , Yanmei Shi , Niwen Zhou , Mei Yao , Xu Guo","doi":"10.1016/j.csda.2025.108319","DOIUrl":"10.1016/j.csda.2025.108319","url":null,"abstract":"<div><div>Testing the significance of a subset of covariates for a response is a critical problem with broad applications. A novel nonparametric significance testing procedure is developed to test whether a set of target covariates provides incremental information about the conditional quantile of the response given the other covariates. The proposed test statistics are constructed within the framework of debiased machine learning, which enables flexible estimation of unknown functions by leveraging machine learning methods. The asymptotic properties of the proposed test statistic under the null hypothesis are established, and the power under the alternatives is analyzed, demonstrating the ability of the procedure to detect local alternatives at the optimal parametric rate. To further enhance power, an ensemble quantile significance testing procedure is introduced. Extensive numerical studies and real data applications are conducted to illustrate the finite-sample performance of the proposed testing procedures.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"217 ","pages":"Article 108319"},"PeriodicalIF":1.6,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145712512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Seasonal ARIMA models with a random period","authors":"Abdelhakim Aknouche , Stefanos Dimitrakopoulos , Nadia Rabehi","doi":"10.1016/j.csda.2025.108320","DOIUrl":"10.1016/j.csda.2025.108320","url":null,"abstract":"<div><div>A general class of seasonal autoregressive integrated moving average models (SARIMA), whose period is an independent and identically distributed random process valued in a finite set, is proposed. This class of models is named random period seasonal ARIMA (SARIMAR). Attention is focused on three subsets of them: the random period seasonal autoregressive (SARR) models, the random period seasonal moving average (SMAR) models and the random period seasonal autoregressive moving average (SARMAR) models. First, the causality, invertibility, and autocovariance shape of these models are revealed. Then, the estimation of the model components (coefficients, innovation variance, probability distribution of the period, (unobserved) sample-path of the random period) is carried out using the Expectation-Maximization algorithm. In addition, a procedure for random elimination of seasonality is developed. A simulation study is conducted to assess the estimation accuracy of the proposed algorithmic scheme. Finally, the usefulness of the proposed methodology is illustrated with two applications about the annual Wolf sunspot numbers and the Canadian lynx data.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"217 ","pages":"Article 108320"},"PeriodicalIF":1.6,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145798821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expectile periodogram","authors":"Tianbo Chen , Ta-Hsin Li , Hanbing Zhu , Wenwu Gao","doi":"10.1016/j.csda.2025.108337","DOIUrl":"10.1016/j.csda.2025.108337","url":null,"abstract":"<div><div>This paper introduces a novel periodogram-like function, called the expectile periodogram (EP), for modeling spectral features of time series and detecting hidden periodicities. The EP is constructed from trigonometric expectile regression (ER), in which a specially designed loss function is used to substitute the squared ℓ<sub>2</sub> norm that leads to the ordinary periodogram. The EP retains the key properties of the ordinary periodogram as a frequency-domain representation of serial dependence in time series, while offering a more comprehensive understanding by examining the data across the entire range of expectile levels. The asymptotic theory is established to investigate the relationship between the EP and the so-called expectile spectrum. Simulations demonstrate the efficiency of the EP in the presence of hidden periodicities. In addition, by leveraging the inherent two-dimensional nature of the EP, we train a deep learning model to classify earthquake waveform data. Notably, our approach outperforms alternative periodogram-based methods in terms of classification accuracy.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"217 ","pages":"Article 108337"},"PeriodicalIF":1.6,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145939215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random multiplication versus random sum: Autoregressive-like models with integer-valued random inputs","authors":"Abdelhakim Aknouche , Sónia Gouveia , Manuel G. Scotto","doi":"10.1016/j.csda.2025.108323","DOIUrl":"10.1016/j.csda.2025.108323","url":null,"abstract":"<div><div>A common approach to analyze time series of counts is to fit models based on random sum operators. As an alternative, this paper introduces time series models based on a random multiplication operator, which is simply the multiplication of a variable operand by an integer-valued random coefficient, whose mean is the constant operand. Such an operation is endowed into autoregressive-like models with integer-valued random inputs, addressed as RMINAR. Two special variants are studied, namely the <span><math><msub><mi>N</mi><mn>0</mn></msub></math></span>-valued random coefficient autoregressive model and the <span><math><msub><mi>N</mi><mn>0</mn></msub></math></span>-valued random coefficient multiplicative error model. Furthermore, <span><math><mi>Z</mi></math></span>-valued extensions are also considered. The dynamic structure of the proposed models is studied in detail. In particular, their corresponding solutions are everywhere strictly stationary and ergodic, which is not common in either the literature on integer-valued time series models or real-valued random coefficient autoregressive models. Therefore, RMINAR model parameters are estimated using a four-stage weighted least squares estimator, with consistency and asymptotic normality established everywhere in the parameter space. Finally, the performance of the new RMINAR models is illustrated with simulated and empirical examples.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"217 ","pages":"Article 108323"},"PeriodicalIF":1.6,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145939216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonparametric density estimation on complex domains using manifold-aware Bayesian additive tree models","authors":"Isaac Diaz-Ray , Huiyan Sang , Guanyu Hu , Ligang Lu","doi":"10.1016/j.csda.2025.108335","DOIUrl":"10.1016/j.csda.2025.108335","url":null,"abstract":"<div><div>Density or intensity function estimation for point pattern data observed on complex domains finds wide applications in spatial data analysis. However, many existing popular density estimation methods face challenges when domains have irregular boundaries, line network structures, sharp concavities, or interior holes. A nonparametric Bayesian additive ensemble of spanning trees model is developed to model the distribution of event occurrences on complex domains. This model uses a random spanning tree weak learner, which can produce flexible and contiguous domain partitions while respecting its geometry and constraints. The method has the advantage of capturing both varying smoothness and sharp changes in density functions. An efficient exact likelihood-based Bayesian inference algorithm is proposed to estimate the density function with uncertainty measures, leveraging a data thinning strategy combined with Poisson-Gamma conjugacy. Simulation studies on various complex domains demonstrate the advantages of the proposed model over competing methods. The method is further applied to the analysis of basketball shot data and crime locations on a road network.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"217 ","pages":"Article 108335"},"PeriodicalIF":1.6,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}