{"title":"Statistical inference using regularized M-estimation in the reproducing kernel Hilbert space for handling missing data","authors":"Hengfang Wang, Jae Kwang Kim","doi":"10.1007/s10463-023-00872-8","DOIUrl":"10.1007/s10463-023-00872-8","url":null,"abstract":"<div><p>Imputation is a popular technique for handling missing data. We address a nonparametric imputation using the regularized M-estimation techniques in the reproducing kernel Hilbert space. Specifically, we first use kernel ridge regression to develop imputation for handling item nonresponse. Although this nonparametric approach is potentially promising for imputation, its statistical properties are not investigated in the literature. Under some conditions on the order of the tuning parameter, we first establish the root-<i>n</i> consistency of the kernel ridge regression imputation estimator and show that it achieves the lower bound of the semiparametric asymptotic variance. A nonparametric propensity score estimator using the reproducing kernel Hilbert space is also developed by the linear expression of the projection estimator. We show that the resulting propensity score estimator is asymptotically equivalent to the kernel ridge regression imputation estimator. Results from a limited simulation study are also presented to confirm our theory. The proposed method is applied to analyze air pollution data measured in Beijing, China.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10463-023-00872-8.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48637382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A goodness-of-fit test on the number of biclusters in a relational data matrix","authors":"Chihiro Watanabe, Taiji Suzuki","doi":"10.1007/s10463-023-00869-3","DOIUrl":"10.1007/s10463-023-00869-3","url":null,"abstract":"<div><p>Biclustering is a method for detecting homogeneous submatrices in a given matrix. Although there are many studies that estimate the underlying bicluster structure of a matrix, few have enabled us to determine the appropriate number of biclusters. Recently, a statistical test on the number of biclusters has been proposed for a regular-grid bicluster structure. However, when the latent bicluster structure does not satisfy such regular-grid assumption, the previous test requires a larger number of biclusters than necessary for the null hypothesis to be accepted, which is not desirable in terms of interpreting the accepted structure. In this study, we propose a new statistical test on the number of biclusters that does not require the regular-grid assumption and derive the asymptotic behavior of the proposed test statistic in both null and alternative cases. We illustrate the effectiveness of the proposed method by applying it to both synthetic and practical data matrices.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42497102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gene–environment interaction analysis under the Cox model","authors":"Kuangnan Fang, Jingmao Li, Yaqing Xu, Shuangge Ma, Qingzhao Zhang","doi":"10.1007/s10463-023-00871-9","DOIUrl":"10.1007/s10463-023-00871-9","url":null,"abstract":"<div><p>For the survival of cancer and many other complex diseases, gene–environment (G-E) interactions have been established as having essential importance. G-E interaction analysis can be roughly classified as marginal and joint, depending on the number of G variables analyzed at a time. In this study, we focus on joint analysis, which can better reflect disease biology and is statistically more challenging. Many approaches have been developed for joint G-E interaction analysis for survival outcomes and led to important findings. However, without rigorous statistical development, quite a few methods have a weak theoretical ground. To fill this knowledge gap, in this article, we consider joint G-E interaction analysis under the Cox model. Sparse group penalization is adopted for regularizing estimation and selecting important main effects and interactions. The “main effects, interactions” variable selection hierarchy, which has been strongly advocated in recent literature, is satisfied. Significantly advancing from some published studies, we rigorously establish the consistency properties under high dimensionality. An effective computational algorithm is developed, simulation demonstrates competitive performance of the proposed approach, and analysis of The Cancer Genome Atlas (TCGA) data on stomach adenocarcinoma (STAD) further demonstrates its practical utility.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10463-023-00871-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42111901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parametric estimation of spatial–temporal point processes using the Stoyan–Grabarnik statistic","authors":"Conor Kresin, Frederic Schoenberg","doi":"10.1007/s10463-023-00866-6","DOIUrl":"10.1007/s10463-023-00866-6","url":null,"abstract":"<div><p>A novel estimator for the parameters governing spatial–temporal point processes is proposed. Unlike the maximum likelihood estimator, the proposed estimator is fast and easy to compute, and does not require the computation or approximation of a computationally expensive integral. This parametric estimator is based on the Stoyan–Grabarnik (sum of inverse intensity) statistic and is shown to be consistent, under quite general conditions. Simulations are presented demonstrating the performance of the estimator.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41313716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic data-based bin width selection for rose diagram","authors":"Yasuhito Tsuruta, Masahiko Sagae","doi":"10.1007/s10463-023-00868-4","DOIUrl":"10.1007/s10463-023-00868-4","url":null,"abstract":"<div><p>A rose diagram is a representation that circularly organizes data with the bin width as the central angle. This diagram is widely used to display and summarize circular data. Some studies have proposed the selector of bin width based on data. However, only a few papers have discussed the property of these selectors from a statistical perspective. Thus, this study aims to provide a data-based bin width selector for rose diagrams using a statistical approach. We consider that the radius of the rose diagram is a nonparametric estimator of the square root of two times the circular density. We derive the mean integrated square error of the rose diagram and its optimal bin width and propose two new selectors: normal reference rule and biased cross-validation. We show that biased cross-validation converges to its optimizer. Additionally, we propose a polygon rose diagram to enhance the rose diagram.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10463-023-00868-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47513957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mixture of shifted binomial distributions for rating data","authors":"Shaoting Li, Jiahua Chen","doi":"10.1007/s10463-023-00865-7","DOIUrl":"10.1007/s10463-023-00865-7","url":null,"abstract":"<div><p>Rating data are a kind of ordinal categorical data routinely collected in survey sampling. The response value in such applications is confined to a finite number of ordered categories. Due to population heterogeneity, the respondents may have several different rating styles. A finite mixture model is thus most suitable to fit datasets of this nature. In this paper, we propose a two-component mixture of shifted binomial distributions for rating data. We show that this model is identifiable and propose a numerically stable penalized likelihood approach for parameter estimation. We adapt an expectation-maximization algorithm for the penalized maximum likelihood estimation. Our simulation results show that the penalized maximum likelihood estimator is consistent and effective. We fit the proposed model and other models in the literature to some real-world datasets and find the proposed model can have much better fits.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43469802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Least absolute deviation estimation for AR(1) processes with roots close to unity","authors":"Nannan Ma, Hailin Sang, Guangyu Yang","doi":"10.1007/s10463-022-00864-0","DOIUrl":"10.1007/s10463-022-00864-0","url":null,"abstract":"<div><p>We establish the asymptotic theory of least absolute deviation estimators for AR(1) processes with autoregressive parameter satisfying <span>\\(n(\\rho_n-1)\\rightarrow \\gamma\\)</span> for some fixed <span>\\(\\gamma\\)</span> as <span>\\(n\\rightarrow \\infty\\)</span>, which is parallel to the results of ordinary least squares estimators developed by Andrews and Guggenberger (Journal of Time Series Analysis, 29, 203–212, 2008) in the case <span>\\(\\gamma = 0\\)</span> or Chan and Wei (Annals of Statistics, 15, 1050–1063, 1987) and Phillips (Biometrika, 74, 535–574, 1987) in the case <span>\\(\\gamma \\ne 0\\)</span>. Simulation experiments are conducted to confirm the theoretical results and to demonstrate the robustness of the least absolute deviation estimation.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46897034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonparametric multiple regression by projection on non-compactly supported bases","authors":"Florian Dussap","doi":"10.1007/s10463-022-00863-1","DOIUrl":"10.1007/s10463-022-00863-1","url":null,"abstract":"<div><p>We study the nonparametric regression estimation problem with a random design in <span>\\({\\mathbb{R}}^{p}\\)</span> with <span>\\(p\\ge 2\\)</span>. We do so by using a projection estimator obtained by least squares minimization. Our contribution is to consider non-compact estimation domains in <span>\\({\\mathbb{R}}^{p}\\)</span>, on which we recover the function, and to provide a theoretical study of the risk of the estimator relative to a norm weighted by the distribution of the design. We propose a model selection procedure in which the model collection is random and takes into account the discrepancy between the empirical norm and the norm associated with the distribution of design. We prove that the resulting estimator automatically optimizes the bias-variance trade-off in both norms, and we illustrate the numerical performance of our procedure on simulated data.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47801701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust density power divergence estimates for panel data models","authors":"Abhijit Mandal, Beste Hamiye Beyaztas, Soutir Bandyopadhyay","doi":"10.1007/s10463-022-00862-2","DOIUrl":"10.1007/s10463-022-00862-2","url":null,"abstract":"<div><p>The panel data regression models have become one of the most widely applied statistical approaches in different fields of research, including social, behavioral, environmental sciences, and econometrics. However, traditional least-squares-based techniques frequently used for panel data models are vulnerable to the adverse effects of data contamination or outlying observations that may result in biased and inefficient estimates and misleading statistical inference. In this study, we propose a <i>minimum density power divergence</i> estimation procedure for panel data regression models with random effects to achieve robustness against outliers. The robustness, as well as the asymptotic properties of the proposed estimator, are rigorously established. The finite-sample properties of the proposed method are investigated through an extensive simulation study and an application to climate data in Oman. Our results demonstrate that the proposed estimator exhibits improved performance over some traditional and robust methods in the presence of data contamination.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43954223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: Group least squares regression for linear models with strongly correlated predictor variables","authors":"Min Tsao","doi":"10.1007/s10463-022-00861-3","DOIUrl":"10.1007/s10463-022-00861-3","url":null,"abstract":"","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42184841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}