{"title":"Variable selection in high-dimensional varying coefficient panel data models with fixed effects","authors":"Yiping Yang , Peixin Zhao","doi":"10.1016/j.jspi.2025.106355","DOIUrl":"10.1016/j.jspi.2025.106355","url":null,"abstract":"<div><div>To address the challenges of variable selection in panel data models with fixed effects and varying coefficients, we introduce a novel method that combines basis function approximations with group nonconcave penalty functions. By utilizing a forward orthogonal deviation transformation, we eliminate fixed effects, allowing us to select significant variables and estimate non-zero coefficient functions. Under certain regularity conditions, we demonstrate that our method consistently identifies the true model structure, and the resulting estimators exhibit oracle properties. For computational efficiency, we have developed a group gradient descent algorithm that incorporates a transformation of the penalty terms. Simulation studies reveal that nonconvex penalties (SCAD/MCP) outperform the Lasso across various performance metrics. Furthermore, compared to existing methods, our approach significantly reduces false positives (FPs). To demonstrate the practical applicability and effectiveness of our method, we present an analysis of a real dataset.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"242 ","pages":"Article 106355"},"PeriodicalIF":0.8,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal inference in early phase clinical trials: Variance decomposition and order of patient inclusion","authors":"Matthieu Clertant , Meliha Akouba , Alexia Iasonos , John O’Quigley","doi":"10.1016/j.jspi.2025.106352","DOIUrl":"10.1016/j.jspi.2025.106352","url":null,"abstract":"<div><div>Causal inference tools, in particular those of variance decomposition, hierarchical data structures and counterfactuals, are applied to the study of the methodology of dose-finding studies in oncology. A detailed variance decomposition brings into a much sharper focus the relative performance of different designs. We develop and present new results on the role played by the order of patient inclusions into a sequential dose-finding study. These results make it clear why, previously, authors could easily be misled into a conclusion that different designs enjoy similar performances. This is not so and we show how to avoid making that mistake. We highlight our findings via both theoretical and numerical studies.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"242 ","pages":"Article 106352"},"PeriodicalIF":0.8,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145267234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The k-sample Behrens-Fisher problem for high-dimensional data with model free assumption","authors":"Yanbo Pei, Xiaoxiao Ren, Baoxue Zhang","doi":"10.1016/j.jspi.2025.106354","DOIUrl":"10.1016/j.jspi.2025.106354","url":null,"abstract":"<div><div>The problem of testing the equality of <em>k</em>-sample mean vectors with different covariance matrices, known as the Behrens-Fisher (BF) problem for <em>k</em>-sample, is a significant issue in statistics. Hu and Bai (2017) proposed a test statistic that operates under a factor-like model structure assumption and demonstrated its normal limit. Building on this work, we further explore the asymptotic properties of the test statistic. We prove that the asymptotic null distribution of the test statistic is a Chi-square-type mixture distribution under a model-free assumption and establish its asymptotic power under a full alternative hypothesis. Moreover, we show that the asymptotic null distribution of the test statistic is either normal or a weighted sum of normal and Chi-square random variables, depending on the convergence rate of the eigenvalues of the covariance matrix with model free assumption. To address practical challenges in high-dimensional data, we propose a new weighted bootstrap procedure that is simple to implement. Simulation studies demonstrate that our proposed test procedure outperforms existing methods in terms of size control under various settings. Furthermore, real data applications illustrate the applicability of our test procedure to a variety of high-dimensional data analysis problems.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"242 ","pages":"Article 106354"},"PeriodicalIF":0.8,"publicationDate":"2025-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint distribution of numbers of occurrences of countably many runs of specified lengths in a sequence of discrete random variables","authors":"Kiyoshi Inoue","doi":"10.1016/j.jspi.2025.106353","DOIUrl":"10.1016/j.jspi.2025.106353","url":null,"abstract":"<div><div>In this paper, we consider the joint distribution of numbers of occurrences of countably many runs of several lengths in a sequence of nonnegative integer valued independent and identically distributed random variables through the generating functions. We propose a generalization of the potential partition polynomials, which gives effective computational tools for the derivation of probability functions. The waiting time problems associated with infinitely many runs are investigated and formulae for the evaluation of the generating functions are given. The results presented here provide a wide framework for developing the multivariate distribution theory of runs. Finally, we discuss several applications and numerical examples to show how our theoretical results are applied to the investigation of runs, as well as parameter estimation problems.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"242 ","pages":"Article 106353"},"PeriodicalIF":0.8,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145158354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Orthogonal Latin hypercube designs with hidden low-dimensional projection","authors":"Tian-fang Zhang , Yue-ru Yan , Fasheng Sun","doi":"10.1016/j.jspi.2025.106349","DOIUrl":"10.1016/j.jspi.2025.106349","url":null,"abstract":"<div><div>Orthogonal Latin hypercube designs are widely used in computer experiments because of their attractive properties. In this article, we develop a new grouping method to construct such designs. Compared to the existing results, the new constructed designs can accommodate more factors with the same runsize, which means they are more cost-effective. Moreover, the resulting designs possess not only orthogonality, but also appealing space-filling properties in low dimensions, which make them very suitable for computer experiments.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"242 ","pages":"Article 106349"},"PeriodicalIF":0.8,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust and computationally efficient gradient-based estimation","authors":"Yibo Yan , Xiaozhou Wang , Riquan Zhang","doi":"10.1016/j.jspi.2025.106351","DOIUrl":"10.1016/j.jspi.2025.106351","url":null,"abstract":"<div><div>In this paper, we propose a class of estimators based on the robust and computationally efficient gradient estimation for both low- and high-dimensional risk minimization framework. The gradient estimation in this work is constructed using a series of newly proposed univariate robust and efficient mean estimators. Our proposed estimators are obtained iteratively using a variant of the gradient descent method, where the update direction is determined by a robust and computationally efficient gradient. These estimators not only have explicit expressions and can be obtained through arithmetic operations but are also robust to arbitrary outliers in common statistical models. Theoretically, we establish the convergence of the algorithms and derive non-asymptotic error bounds for these iterative estimators. Specifically, we apply our methods to linear and logistic regression models, achieving robust parameter estimates and corresponding excess risk bounds. Unlike previous work, our theoretical results rely on a magnitude function of the outliers, which captures the extent of their deviation from the inliers. Finally, we present extensive simulation experiments on both low- and high-dimensional linear models to demonstrate the superior performance of our proposed estimators compared to several baseline methods.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"242 ","pages":"Article 106351"},"PeriodicalIF":0.8,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145158353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing goodness-of-fit for sparse categories using Rényi divergence","authors":"Raul Matsushita , Gabriel Gomes , Regina Da Fonseca , Eduardo Nakano , Roberto Vila","doi":"10.1016/j.jspi.2025.106350","DOIUrl":"10.1016/j.jspi.2025.106350","url":null,"abstract":"<div><div>We present the Rényi divergence as a statistic for assessing goodness-of-fit in sparse frequency tables, where small expected counts can undermine the reliability of the traditional chi-square test. The Rényi divergence with index in (0,1) is a natural choice because it circumvents division-related issues by small frequencies. Our main result demonstrates that the Rényi statistic asymptotically follows a chi-square distribution. Through theoretical insights and Monte Carlo simulations, we evaluate the performance of the Rényi statistic across various values of the divergence index. We find that smaller index values improve the alignment of the Rényi statistic with the chi-square distribution and enhance its performance in sparse data settings. Additionally, the Rényi statistic exhibits good power properties in detecting deviations from the null hypothesis under these conditions. To illustrate its practical applicability, we present two real-world data analyses, highlighting the robustness of the Rényi divergence in scenarios involving sparse categories.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"242 ","pages":"Article 106350"},"PeriodicalIF":0.8,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145105738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consistent community detection approach in the nonparametric weighted stochastic blockmodel with unspecified number of communities","authors":"Fei Ye , Jingsong Xiao , Weidong Ma , Yulai Miao , Ying Yang","doi":"10.1016/j.jspi.2025.106339","DOIUrl":"10.1016/j.jspi.2025.106339","url":null,"abstract":"<div><div>The stochastic blockmodel (SBM) is a widely used model for representing graphs. Numerous approaches have been applied to the SBM to detect latent community structures in graphs, typically using two types of consistency (strong and weak) to evaluate their performance. Most of these methods have been studied and shown to be consistent under the SBM framework. However, the consistency of the weighted SBM, an important extension of the SBM, has been largely overlooked. Moreover, few approaches are capable of detecting communities when the number of communities is unknown. In this paper, we propose a nonparametric method for effective community detection under the assortative, nonparametric weighted SBM with an unknown number of communities, and we establish the consistency of our approach. We introduce a novel concept, “consistency in relationship”, as a more practical criterion to assess the performance of community detection algorithms. Since solving the optimization problem in our approach becomes intractable for large sample sizes, we propose an efficient algorithm to approximate it. Simulations demonstrate that our community detection method is both efficient and robust, particularly for unbalanced networks. We illustrate the effectiveness of our approach on three real-world networks.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"242 ","pages":"Article 106339"},"PeriodicalIF":0.8,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145105737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"D-criterion based optimal subsampling in Poisson regression with one covariate","authors":"Torsten Glemser, Rainer Schwabe","doi":"10.1016/j.jspi.2025.106340","DOIUrl":"10.1016/j.jspi.2025.106340","url":null,"abstract":"<div><div>The goal of subsampling is to select an informative subset of all observations, when using the full data for statistical analysis is not viable. We construct locally <span><math><mi>D</mi></math></span>-optimal subsampling designs under a Poisson regression model with a log link in one covariate. A representation of the support of locally <span><math><mi>D</mi></math></span>-optimal subsampling designs is established. We make statements on scale-location transformations of the covariate that require a simultaneous transformation of the regression parameter. The performance of the methods is demonstrated by illustrating examples. To show the advantage of the optimal subsampling designs, we examine the efficiency of uniform random subsampling as well as of two heuristic designs. Further, the efficiency of locally <span><math><mi>D</mi></math></span>-optimal subsampling designs is studied when the parameter is misspecified.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"242 ","pages":"Article 106340"},"PeriodicalIF":0.8,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145118484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structured regularization covariance estimation in tensor-valued data analysis","authors":"Jiangyan Wang, Yang Ren, Jinguan Lin","doi":"10.1016/j.jspi.2025.106337","DOIUrl":"10.1016/j.jspi.2025.106337","url":null,"abstract":"<div><div>Covariance estimation poses a crucial challenge in high-dimensional data analysis, especially when traditional methods (e.g., sample covariance) are inaccurate, particularly with small sample sizes. A promising solution is to exploit inherent data structures such as low-rankness, sparsity, or smoothness. For tensor data (multi-dimensional arrays), structured regularization aids in dimensionality reduction. This paper introduces novel regularization methods for tensor covariance estimation, specifically applying banded and tapering structures to the covariance matrix. We use Kronecker Product Canonical Polyadic (KPCP) decomposition to approximate large matrices via the Kronecker product of smaller matrices. A split resampling scheme is employed to select parameters for the KPCP decomposition from noisy data. This leads to two methods: KPCP-TB-R (Triply Banded-Resampling) and KPCP-TT-R (Triply Tapering-Resampling). Additionally, sparse (thresholding) and multi-structured regularization approaches are introduced for comparison. The effectiveness and robustness of the proposed methods are validated through extensive simulations and applied to monthly export trade volume data.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"242 ","pages":"Article 106337"},"PeriodicalIF":0.8,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145049542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}