{"title":"GMM estimation and variable selection of semiparametric model with increasing dimension and high-order spatial dependence","authors":"Fang Lu , Hao Pan , Jing Yang","doi":"10.1016/j.csda.2024.108113","DOIUrl":"10.1016/j.csda.2024.108113","url":null,"abstract":"<div><div>To address various forms of spatial dependence and the heterogeneous effects of some regressors, this paper concentrates on the generalized method of moments (GMM) estimation and variable selection of a higher-order spatial autoregressive (SAR) model with semi-varying coefficients and a diverging number of parameters. With the varying coefficient functions approximated by basis functions, a GMM estimation procedure is first proposed; then, a novel and convenient smooth-threshold GMM procedure is constructed for variable selection based on smooth-threshold estimating equations. Under some regularity conditions, the asymptotic properties of the proposed estimation and variable selection methods are established. In particular, the asymptotic normality of the parametric estimator is derived in a novel way based on some fundamental operations on block matrices. Compared to existing estimation methods for semiparametric SAR models, the proposed series-based GMM procedure can enjoy the merits of lower computing cost, higher estimation accuracy, or broader applicability, especially under heteroscedasticity. Extensive numerical simulations are conducted to confirm the theory and to demonstrate the finite-sample advantages of the proposed method. Two real data analyses are further presented as applications.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"205 ","pages":"Article 108113"},"PeriodicalIF":1.5,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143161771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A debiasing phylogenetic tree-assisted regression model for microbiome data","authors":"Yanhui Li , Luqing Zhao , Jinjuan Wang","doi":"10.1016/j.csda.2024.108111","DOIUrl":"10.1016/j.csda.2024.108111","url":null,"abstract":"<div><div>Identifying associations between microbial taxa and sample features has long been an important problem in microbiome analysis, and various regression-based methods have been proposed. These methods can roughly be divided into two types: one accounts for the sparsity of microbiome data, and the other uses the phylogenetic tree to incorporate evolutionary information. However, none of these methods fully incorporates both sparsity and the phylogenetic tree into the regression analysis with theoretical guarantees. To fill this gap, a phylogenetic tree-assisted regression model with a Lasso-type penalty is proposed to detect feature-related microbial compositions. Specifically, under the reasonable assumption that the smaller the phylogenetic distance between two microbial species, the closer their coefficients in the regression model, the phylogenetic tree is incorporated into the regression model through a Laplacian-type penalty in the loss function. Both the linear regression model for continuous outcomes and the generalized linear regression model for categorical outcomes are analyzed in this framework. Additionally, debiasing algorithms are proposed for the coefficient estimators to provide more precise evaluation. Extensive numerical simulations and real data analyses demonstrate the higher efficiency of the proposed method.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"205 ","pages":"Article 108111"},"PeriodicalIF":1.5,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143161772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On summed nonparametric dependence measures in high dimensions, fixed or large samples","authors":"Kai Xu , Qing Cheng , Daojiang He","doi":"10.1016/j.csda.2024.108109","DOIUrl":"10.1016/j.csda.2024.108109","url":null,"abstract":"<div><div>For the mutual independence testing problem, the use of summed nonparametric dependence measures, including Hoeffding's <em>D</em>, Blum-Kiefer-Rosenblatt's <em>R</em>, and Bergsma-Dassios-Yanagimoto's <span><math><msup><mrow><mi>τ</mi></mrow><mrow><mo>⁎</mo></mrow></msup></math></span>, is considered. The asymptotic normality of this class of test statistics under the null hypothesis is established when (i) both the dimension and the sample size go to infinity simultaneously, and (ii) the dimension tends to infinity but the sample size is fixed. The new result for the asymptotic regime (ii) is applicable to HDLSS (High Dimension, Low Sample Size) data. Further, the asymptotic Pitman efficiencies of the family of considered tests are investigated with respect to two important sum-of-squares tests for the asymptotic regime (i): the distance covariance based test and the product-moment covariance based test. Formulae for the asymptotic relative efficiencies are derived. An interesting finding is that even if the population is normally distributed, the two state-of-the-art tests suffer from power loss if some components of the underlying data have different scales. Simulations are conducted to confirm the asymptotic results. A real data analysis is performed to illustrate the considered methods.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"205 ","pages":"Article 108109"},"PeriodicalIF":1.5,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143161773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive distributed smooth composite quantile regression estimation for large-scale data","authors":"Kangning Wang, Jingyu Zhang, Xiaofei Sun","doi":"10.1016/j.csda.2024.108110","DOIUrl":"10.1016/j.csda.2024.108110","url":null,"abstract":"<div><div>Composite quantile regression (CQR) is a useful statistical learning tool because of its estimation efficiency and robustness, but the growing size of modern data poses challenges for it. First, the non-smoothness of the CQR loss function imposes a heavy computational burden in large-scale problems. Second, although some distributed CQR algorithms have been proposed, they rely heavily on uniformity and randomness conditions, which are frequently violated in practice. To address these issues, this article first proposes a smooth CQR by constructing a smooth loss that converges uniformly to the original non-smooth loss. Then a distributed CQR is developed, in which the estimator can be computed conveniently by minimizing a distributed surrogate loss based on a pilot sample. In particular, the estimator can adapt when the uniformity or randomness condition is violated. The established theoretical results and numerical experiments support the proposed methods.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108110"},"PeriodicalIF":1.5,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A subspace method for large-scale trace ratio problems","authors":"Giulia Ferrandi , Michiel E. Hochstenbach , M. Rosário Oliveira","doi":"10.1016/j.csda.2024.108108","DOIUrl":"10.1016/j.csda.2024.108108","url":null,"abstract":"<div><div>A subspace method is introduced to solve large-scale trace ratio problems. This approach is matrix-free, requiring only the action of the two matrices involved in the trace ratio. At each iteration, a smaller trace ratio problem is addressed in the search subspace. Additionally, the algorithm is endowed with a restarting strategy that ensures the monotonicity of the trace ratio value throughout the iterations. The behavior of the approximate solution is investigated from a theoretical viewpoint, extending existing results on Ritz values and vectors, as the angle between the search subspace and the exact solution approaches zero. Numerical experiments in multigroup classification show that this new subspace method tends to be more efficient than iterative approaches relying on (partial) eigenvalue decompositions at each step.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"205 ","pages":"Article 108108"},"PeriodicalIF":1.5,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143161612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subgroup learning for multiple mixed-type outcomes with block-structured covariates","authors":"Xun Zhao , Lu Tang , Weijia Zhang , Ling Zhou","doi":"10.1016/j.csda.2024.108105","DOIUrl":"10.1016/j.csda.2024.108105","url":null,"abstract":"<div><div>Survey research is increasingly interested in inferring grouped association patterns between risk factors and questionnaire responses, with the grouping shared across multiple response variables that jointly capture an individual's underlying status. To identify important risk factors that are simultaneously associated with the health and well-being of senior adults, a study based on the China Health and Retirement Survey (CHRS) is conducted. Previous studies have identified several known risk factors, yet the outcome-risk factor association is heterogeneous, prompting the use of subgroup analysis. A subgroup analysis procedure is devised to model multiple mixed-type outcomes that describe an individual's general health and well-being, while tackling additional challenges including collinearity and weak signals within block-structured covariates. Computationally, an efficient algorithm that alternately updates a set of estimating equations and likelihood functions is proposed. Theoretical results establish the asymptotic consistency and normality of the proposed estimators. The validity of the proposed method is corroborated by simulation experiments. An application of the proposed method to the CHRS data identifies caring for grandchildren as a new risk factor for poor physical and mental health.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108105"},"PeriodicalIF":1.5,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A copula duration model with dependent states and spells","authors":"Simon M.S. Lo , Shuolin Shi , Ralf A. Wilke","doi":"10.1016/j.csda.2024.108104","DOIUrl":"10.1016/j.csda.2024.108104","url":null,"abstract":"<div><div>A nested Archimedean copula model for dependent states and spells is introduced and the link to a classical survival model with frailties is established. The model relaxes an important restriction of classical survival models as the distributions of unobservable heterogeneities are permitted to depend on the observable covariates. Its modular structure has practical advantages as the different components can be separately specified and estimation can be done sequentially or separately. This makes the model versatile and adaptable in empirical work. An application to labour market transitions with linked administrative data supports the need for a flexible specification of the dependence structure and the model for the marginal survivals. The conventional Markov Chain Model is shown to give sizeably biased results in the application.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108104"},"PeriodicalIF":1.5,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of a new general class of bivariate distributions based on reversed hazard rate order","authors":"Na Young Yoo , Hyunju Lee , Ji Hwan Cha","doi":"10.1016/j.csda.2024.108106","DOIUrl":"10.1016/j.csda.2024.108106","url":null,"abstract":"<div><div>Motivated by real data sets to be analyzed in this paper, we develop a new general class of bivariate distributions, based on the reversed hazard rate, that can model the effect of the so-called ‘load-sharing configuration’ in a two-component system. Under such a load-sharing configuration, after the failure of one component, the surviving component has to shoulder extra load, which eventually causes it to fail earlier than would be expected under independence. In the developed class of bivariate distributions, it is assumed that the residual lifetime of the remaining component is shortened according to the reversed hazard rate order. We derive the joint survival function, the joint probability density function, and the marginal distributions. We discuss a bivariate ageing property of the developed class of distributions. Some specific families of bivariate distributions that can be usefully applied in practice are obtained. These families are applied to some real data sets to illustrate their usefulness.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108106"},"PeriodicalIF":1.5,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142747095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-task optimization with Bayesian neural network surrogates for parameter estimation of a simulation model","authors":"Hyungjin Kim , Chuljin Park , Heeyoung Kim","doi":"10.1016/j.csda.2024.108097","DOIUrl":"10.1016/j.csda.2024.108097","url":null,"abstract":"<div><div>We propose a novel framework for efficient parameter estimation in simulation models, formulated as an optimization problem that minimizes the discrepancy between physical system observations and simulation model outputs. Our framework, called multi-task optimization with Bayesian neural network surrogates (MOBS), is designed for scenarios that require the simultaneous estimation of multiple sets of parameters, each set corresponding to a distinct set of observations, while also enabling fast parameter estimation essential for real-time process monitoring and control. MOBS integrates a heuristic search algorithm, utilizing a single-layer Bayesian neural network surrogate model trained on an initial simulation dataset. This surrogate model is shared across multiple tasks to select and evaluate candidate parameter values, facilitating efficient multi-task optimization. We provide a closed-form parameter screening rule and demonstrate that the expected number of simulation runs converges to a user-specified threshold. Our framework was applied to a numerical example and a semiconductor manufacturing case study, significantly reducing computational costs while achieving accurate parameter estimation.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108097"},"PeriodicalIF":1.5,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal sequential detection by sparsity likelihood","authors":"Jingyan Huang, Hock Peng Chan","doi":"10.1016/j.csda.2024.108089","DOIUrl":"10.1016/j.csda.2024.108089","url":null,"abstract":"<div><div>We propose a sparsity likelihood stopping rule to detect change-points when there are multiple data streams. It is optimal in the sense of asymptotically minimizing the detection delay when change-points are present in only a small fraction of the data streams. This optimality holds at all levels of change-point sparsity. A key contribution of this paper is showing optimality under extreme sparsity, where the number of data streams with change-points increases very slowly as the number of data streams goes to infinity. The theoretical results are backed by a numerical study showing that the sparsity likelihood stopping rule performs well at all levels of sparsity. Applications of the stopping rule to non-normal models are also illustrated.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108089"},"PeriodicalIF":1.5,"publicationDate":"2024-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}