{"title":"Maximum likelihood estimation of short panel autoregressive models with flexible form of fixed effects","authors":"Kazuhiko Hayakawa, Boyan Yin","doi":"10.1016/j.jspi.2024.106252","DOIUrl":"10.1016/j.jspi.2024.106252","url":null,"abstract":"<div><div>This paper proposes the maximum likelihood (ML) estimator for a short panel autoregressive model with a flexible form of observed factors as well as unknown interactive fixed effects. We show that the ML estimator is consistent and asymptotically normally distributed as the number of cross-sectional units increases with the number of time periods being fixed. It should be noted that this asymptotic result holds uniformly for the autoregressive coefficient less than, equal to, or greater than one, in sharp contrast to existing estimators. Monte Carlo simulation results show that the ML estimator has desirable finite sample properties.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"237 ","pages":"Article 106252"},"PeriodicalIF":0.8,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outcome dependent subsampling divide and conquer in generalized linear models for massive data","authors":"Jie Yin, Jieli Ding, Changming Yang","doi":"10.1016/j.jspi.2024.106253","DOIUrl":"10.1016/j.jspi.2024.106253","url":null,"abstract":"<div><div>In order to break the constraints and barriers caused by limited computing power in processing massive datasets, we propose an outcome dependent subsampling divide and conquer strategy in this paper. The proposed strategy can process data on multiple blocks in parallel and concentrate the computing resources of each block on regions with the most information. We develop a distributed statistical inference method and propose a computation-efficient algorithm in the generalized linear models for massive data. The proposed method only needs to preserve some summary statistics from each data block and then use them to directly construct the proposed estimator. The asymptotic properties of the proposed method are established. Simulation studies and real data analysis are conducted to illustrate the merits of the proposed method.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"237 ","pages":"Article 106253"},"PeriodicalIF":0.8,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonparametric estimators of inequality curves and inequality measures","authors":"Alicja Jokiel-Rokita, Sylwester Piątek","doi":"10.1016/j.jspi.2024.106251","DOIUrl":"10.1016/j.jspi.2024.106251","url":null,"abstract":"<div><div>Classical inequality curves and inequality measures are defined for distributions with finite mean value. Moreover, their empirical counterparts are not resistant to outliers. For these reasons, quantile versions of known inequality curves such as the Lorenz, Bonferroni, Zenga and <span><math><mi>D</mi></math></span> curves, and quantile versions of inequality measures such as the Gini, Bonferroni, Zenga and <span><math><mi>D</mi></math></span> indices have been proposed in the literature. We propose various nonparametric estimators of quantile versions of inequality curves and inequality measures, prove their consistency, and compare their accuracy in a simulation study. We also give examples of the use of quantile versions of inequality measures in real data analysis.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"237 ","pages":"Article 106251"},"PeriodicalIF":0.8,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation and group-feature selection in sparse mixture-of-experts with diverging number of parameters","authors":"Abbas Khalili, Archer Yi Yang, Xiaonan Da","doi":"10.1016/j.jspi.2024.106250","DOIUrl":"10.1016/j.jspi.2024.106250","url":null,"abstract":"<div><div>Mixture-of-experts provide flexible statistical models for a wide range of regression (supervised learning) problems. Often a large number of covariates (features) are available in many modern applications yet only a small subset of them is useful in explaining a response variable of interest. This calls for a feature selection device. In this paper, we present new group-feature selection and estimation methods for sparse mixture-of-experts models when the number of features can be nearly comparable to the sample size. We prove the consistency of the methods in both parameter estimation and feature selection. We implement the methods using a modified EM algorithm combined with a proximal gradient method, which results in a convenient closed-form parameter update in the M-step of the algorithm. We examine the finite-sample performance of the methods through simulations, and demonstrate their applications in a real data example on exploring relationships in body measurements.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"237 ","pages":"Article 106250"},"PeriodicalIF":0.8,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling and testing for endpoint-inflated count time series with bounded support","authors":"Yao Kang, Xiaojing Fan, Jie Zhang, Ying Tang","doi":"10.1016/j.jspi.2024.106248","DOIUrl":"10.1016/j.jspi.2024.106248","url":null,"abstract":"<div><div>Count time series with bounded support frequently exhibit binomial overdispersion, zero inflation and right-endpoint inflation in practical scenarios. Numerous models have been proposed for the analysis of bounded count time series with binomial overdispersion and zero inflation, yet right-endpoint inflation has received comparatively less attention. To better capture these features, this article introduces three versions of extended first-order binomial autoregressive (BAR(1)) models with endpoint inflation. Corresponding stochastic properties of the new models are investigated and model parameters are estimated by the conditional maximum likelihood and quasi-maximum likelihood methods. A binomial right-endpoint inflation index is also constructed and further used to test whether the data set has an endpoint-inflated characteristic with respect to a BAR(1) process. Finally, the proposed models are applied to two real data examples. Firstly, we illustrate the usefulness of the proposed models through an application to the voting data on supporting interest rate changes during consecutive monthly meetings of the Monetary Policy Council at the National Bank of Poland. Then, we apply the proposed models to the number of police stations that received at least one drunk driving report per month. The results of the two real data examples indicate that the new models have significant advantages in terms of fitting performance for the bounded count time series with endpoint inflation.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"237 ","pages":"Article 106248"},"PeriodicalIF":0.8,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142759599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-parametric empirical likelihood inference on quantile difference between two samples with length-biased and right-censored data","authors":"Li Xun, Xin Guan, Yong Zhou","doi":"10.1016/j.jspi.2024.106249","DOIUrl":"10.1016/j.jspi.2024.106249","url":null,"abstract":"<div><div>Exploring quantile differences between two populations at various probability levels offers valuable insights into their distinctions, which are essential for practical applications such as assessing treatment effects. However, estimating these differences can be challenging due to the complex data often encountered in clinical trials. This paper assumes that right-censored data and length-biased right-censored data originate from two populations of interest. We propose an adjusted smoothed empirical likelihood (EL) method for inferring quantile differences and establish the asymptotic properties of the proposed estimators. Under mild conditions, we demonstrate that the adjusted log-EL ratio statistics asymptotically follow the standard chi-squared distribution. We construct confidence intervals for the quantile differences using both normal and chi-squared approximations and develop a likelihood ratio test for these differences. The performance of our proposed methods is illustrated through simulation studies. Finally, we present a case study utilizing Oscar award nomination data to demonstrate the application of our method.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"237 ","pages":"Article 106249"},"PeriodicalIF":0.8,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sieve estimation of the accelerated mean model based on panel count data","authors":"Xiaoyang Li, Zhi-Sheng Ye, Xingqiu Zhao","doi":"10.1016/j.jspi.2024.106247","DOIUrl":"10.1016/j.jspi.2024.106247","url":null,"abstract":"<div><div>Panel count data are gathered when subjects are examined at discrete times during a study, and only the number of recurrent events occurring before each examination time is recorded. We consider a semiparametric accelerated mean model for panel count data in which the effect of the covariates is to transform the time scale of the baseline mean function. Semiparametric inference for the model is inherently challenging because the finite-dimensional regression parameters appear in the argument of the (infinite-dimensional) functional parameter, i.e., the baseline mean function, leading to the phenomenon of bundled parameters. We propose sieve pseudolikelihood and likelihood methods to construct the random criterion function for estimating the model parameters. An inexact block coordinate ascent algorithm is used to obtain these estimators. We establish the consistency and rate of convergence of the proposed estimators, as well as the asymptotic normality of the estimators of the regression parameters. Novel consistent estimators of the asymptotic covariances of the estimated regression parameters are derived by leveraging the counting process associated with the examination times. Comprehensive simulation studies demonstrate that the optimization algorithm is much less sensitive to the initial values than the Newton–Raphson method. The proposed estimators perform well for practical sample sizes, and are more efficient than existing methods. An example based on real data shows that due to this efficiency gain, the proposed method is better able to detect the significance of practically meaningful covariates than an existing method.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"237 ","pages":"Article 106247"},"PeriodicalIF":0.8,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The proximal bootstrap for constrained estimators","authors":"Jessie Li","doi":"10.1016/j.jspi.2024.106245","DOIUrl":"10.1016/j.jspi.2024.106245","url":null,"abstract":"<div><div>We demonstrate how to conduct uniformly asymptotically valid inference for <span><math><msqrt><mrow><mi>n</mi></mrow></msqrt></math></span>-consistent estimators defined as the solution to a constrained optimization problem with a possibly nonsmooth or nonconvex sample objective function and a possibly nonconvex constraint set. We allow for the solution to the problem to be on the boundary of the constraint set or to drift towards the boundary of the constraint set as the sample size goes to infinity. We construct a confidence set by benchmarking a test statistic against critical values that can be obtained from a simple unconstrained quadratic programming problem. Monte Carlo simulations illustrate the uniformly correct coverage of our method in a boundary constrained maximum likelihood model, a boundary constrained nonsmooth GMM model, and a conditional logit model with capacity constraints.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"236 ","pages":"Article 106245"},"PeriodicalIF":0.8,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142571397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testing the equality of distributions using integrated maximum mean discrepancy","authors":"Tianxuan Ding, Zhimei Li, Yaowu Zhang","doi":"10.1016/j.jspi.2024.106246","DOIUrl":"10.1016/j.jspi.2024.106246","url":null,"abstract":"<div><div>Comparing and testing for the homogeneity of two independent random samples is a fundamental statistical problem with many applications across various fields. However, existing methods may not be effective when the data is complex or high-dimensional. We propose a new method that integrates the maximum mean discrepancy (MMD) with a Gaussian kernel over all one-dimensional projections of the data. We derive the closed-form expression of the integrated MMD and prove its validity as a distributional similarity metric. We estimate the integrated MMD with the <span><math><mi>U</mi></math></span>-statistic theory and study its asymptotic behaviors under the null and two kinds of alternative hypotheses. We demonstrate that our method has the benefits of the MMD, and outperforms existing methods on both synthetic and real datasets, especially when the data is complex and high-dimensional.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"236 ","pages":"Article 106246"},"PeriodicalIF":0.8,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142553626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semiparametric estimation of a principal functional coefficient panel data model with cross-sectional dependence and its application to cigarette demand","authors":"Yan-Yong Zhao, Ling-Ling Ge, Kong-Sheng Zhang","doi":"10.1016/j.jspi.2024.106244","DOIUrl":"10.1016/j.jspi.2024.106244","url":null,"abstract":"<div><div>In this paper, we consider the estimation of functional coefficient panel data models with cross-sectional dependence. Borrowing the principal component structure, the functional coefficient panel data models can be transformed into a semiparametric panel data model. Combining the local linear dummy variable technique and profile least squares method, we develop a semiparametric profile method to estimate the coefficient functions. A gradient-descent iterative algorithm is employed to enhance computation speed and estimation accuracy. The main results show that the resulting parameter estimator enjoys asymptotic normality with a <span><math><msqrt><mrow><mi>N</mi><mi>T</mi></mrow></msqrt></math></span> convergence rate and the nonparametric estimator is asymptotically normal with a nonparametric convergence rate <span><math><msqrt><mrow><mi>N</mi><mi>T</mi><mi>h</mi></mrow></msqrt></math></span> when both the number of cross-sectional units <span><math><mi>N</mi></math></span> and the length of time series <span><math><mi>T</mi></math></span> go to infinity, under some regularity conditions. Monte Carlo simulations are carried out to evaluate the proposed methods, and an application to cigarette demand is investigated for illustration.</div></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"236 ","pages":"Article 106244"},"PeriodicalIF":0.8,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142416590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}