{"title":"Conditional selective inference for robust regression and outlier detection using piecewise-linear homotopy continuation","authors":"Toshiaki Tsukurimichi, Yu Inatsu, Vo Nguyen Le Duy, Ichiro Takeuchi","doi":"10.1007/s10463-022-00846-2","DOIUrl":"10.1007/s10463-022-00846-2","url":null,"abstract":"<div><p>In this paper, we consider conditional selective inference (SI) for a linear model estimated after outliers are removed from the data. To apply the conditional SI framework, it is necessary to characterize the events of how the robust method identifies outliers. Unfortunately, the existing conditional SIs cannot be directly applied to our problem because they are applicable to the case where the selection events can be represented by linear or quadratic constraints. We propose a conditional SI method for popular robust regressions such as least-absolute-deviation regression and Huber regression by introducing a new computational method using a convex optimization technique called homotopy method. We show that the proposed conditional SI method is applicable to a wide class of robust regression and outlier detection methods and has good empirical performance on both synthetic data and real data experiments.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":"74 6","pages":"1197 - 1228"},"PeriodicalIF":1.0,"publicationDate":"2022-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46257738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible asymmetric multivariate distributions based on two-piece univariate distributions","authors":"Jonas Baillien, Irène Gijbels, Anneleen Verhasselt","doi":"10.1007/s10463-022-00842-6","DOIUrl":"10.1007/s10463-022-00842-6","url":null,"abstract":"<div><p>Classical symmetric distributions like the Gaussian are widely used. However, in reality data often display a lack of symmetry. Multiple distributions, grouped under the name “skewed distributions”, have been developed to specifically cope with asymmetric data. In this paper, we present a broad family of flexible multivariate skewed distributions for which statistical inference is a feasible task. The studied family of multivariate skewed distributions is derived by taking affine combinations of independent univariate distributions. These are members of a flexible family of univariate asymmetric distributions and are an important basis for achieving statistical inference. Besides basic properties of the proposed distributions, also statistical inference based on a maximum likelihood approach is presented. We show that under mild conditions, weak consistency and asymptotic normality of the maximum likelihood estimators hold. These results are supported by a simulation study confirming the developed theoretical results, and some data examples to illustrate practical applicability.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":"75 1","pages":"159 - 200"},"PeriodicalIF":1.0,"publicationDate":"2022-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48406413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the choice of the optimal single order statistic in quantile estimation","authors":"Mariusz Bieniek, Luiza Pańczyk","doi":"10.1007/s10463-022-00845-3","DOIUrl":"10.1007/s10463-022-00845-3","url":null,"abstract":"<div><p>We study the classical statistical problem of the estimation of quantiles by order statistics of the random sample. For fixed sample size, we determine the single order statistic which is the optimal estimator of a quantile of given order. We propose a totally new approach to the problem, since our optimality criterion is based on the use of nonparametric sharp upper and lower bounds on the bias of the estimation. First, we determine the explicit analytic expressions for the bounds, and then, we choose the order statistic for which the upper and lower bound are simultaneously as close to 0 as possible. The paper contains rigorously proved theoretical results which can be easily implemented in practise. This is also illustrated with numerical examples.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":"75 2","pages":"303 - 333"},"PeriodicalIF":1.0,"publicationDate":"2022-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42659546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective inference after feature selection via multiscale bootstrap","authors":"Yoshikazu Terada, Hidetoshi Shimodaira","doi":"10.1007/s10463-022-00838-2","DOIUrl":"10.1007/s10463-022-00838-2","url":null,"abstract":"<div><p>It is common to show the confidence intervals or <i>p</i>-values of selected features, or predictor variables in regression, but they often involve selection bias. The selective inference approach solves this bias by conditioning on the selection event. Most existing studies of selective inference consider a specific algorithm, such as Lasso, for feature selection, and thus they have difficulties in handling more complicated algorithms. Moreover, existing studies often consider unnecessarily restrictive events, leading to over-conditioning and lower statistical power. Our novel and widely applicable resampling method via multiscale bootstrap addresses these issues to compute an approximately unbiased selective <i>p</i>-value for the selected features. As a simplification of the proposed method, we also develop a simpler method via the classical bootstrap. We prove that the <i>p</i>-value computed by our multiscale bootstrap method is more accurate than the classical bootstrap method. Furthermore, numerical experiments demonstrate that our algorithm works well even for more complicated feature selection methods such as non-convex regularization.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":"75 1","pages":"99 - 125"},"PeriodicalIF":1.0,"publicationDate":"2022-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43509814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference using an exact distribution of test statistic for random-effects meta-analysis","authors":"Keisuke Hanada, Tomoyuki Sugimoto","doi":"10.1007/s10463-022-00844-4","DOIUrl":"10.1007/s10463-022-00844-4","url":null,"abstract":"<div><p>Random-effects meta-analysis serves to integrate the results of multiple studies with methods such as moment estimation and likelihood estimation duly proposed. These existing methods are based on asymptotic normality with respect to the number of studies. However, the test and interval estimation deviate from the nominal significance level when integrating a small number of studies. Although a method for constructing more conservative intervals has been recently proposed, the exact distribution of test statistic for the overall treatment effect is not well known. In this paper, we provide an almost-exact distribution of the test statistic in random-effects meta-analysis and propose the test and interval estimation using the almost-exact distribution. Simulations demonstrate the accuracy of estimation and application to existing meta-analysis using the method proposed here. With known variance parameters, the estimation performance using the almost-exact distribution always achieves the nominal significance level regardless of the number of studies and heterogeneity. We also propose some methods to construct a conservative interval estimation, even when the variance parameters are unknown, and present their performances via simulation and an application to Alzheimer’s disease meta-analysis.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":"75 2","pages":"281 - 302"},"PeriodicalIF":1.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41358458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group least squares regression for linear models with strongly correlated predictor variables","authors":"Min Tsao","doi":"10.1007/s10463-022-00841-7","DOIUrl":"10.1007/s10463-022-00841-7","url":null,"abstract":"<div><p>Traditionally, the main focus of the least squares regression is to study the effects of individual predictor variables, but strongly correlated variables generate multicollinearity which makes it difficult to study their effects. To resolve the multicollinearity issue without abandoning the least squares regression, for situations where predictor variables are in groups with strong within-group correlations but weak between-group correlations, we propose to study the effects of the groups with a group approach to the least squares regression. Using an all positive correlations arrangement of the strongly correlated variables, we first characterize group effects that are meaningful and can be accurately estimated. We then discuss the group approach to the least squares regression through a simulation study and demonstrate that it is an effective method for handling multicollinearity. We also address a common misconception about prediction accuracy of the least squares estimated model.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":"75 2","pages":"233 - 250"},"PeriodicalIF":1.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46133684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonparametric inference for additive models estimated via simplified smooth backfitting","authors":"Suneel Babu Chatla","doi":"10.1007/s10463-022-00840-8","DOIUrl":"10.1007/s10463-022-00840-8","url":null,"abstract":"<div><p>We investigate hypothesis testing in nonparametric additive models estimated using simplified smooth backfitting (Huang and Yu, Journal of Computational and Graphical Statistics, 28(2), 386–400, 2019). Simplified smooth backfitting achieves oracle properties under regularity conditions and provides closed-form expressions of the estimators that are useful for deriving asymptotic properties. We develop a generalized likelihood ratio (GLR) (Fan, Zhang and Zhang, Annals of Statistics, 29(1), 153–193, 2001) and a loss function (LF) (Hong and Lee, Annals of Statistics, 41(3), 1166–1203, 2013)-based testing framework for inference. Under the null hypothesis, both the GLR and LF tests have asymptotically rescaled chi-squared distributions, and both exhibit the Wilks phenomenon, which means the scaling constants and degrees of freedom are independent of nuisance parameters. These tests are asymptotically optimal in terms of rates of convergence for nonparametric hypothesis testing. Additionally, the bandwidths that are well suited for model estimation may be useful for testing. We show that in additive models, the LF test is asymptotically more powerful than the GLR test. We use simulations to demonstrate the Wilks phenomenon and the power of these proposed GLR and LF tests, and a real example to illustrate their usefulness.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":"75 1","pages":"71 - 97"},"PeriodicalIF":1.0,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45543788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact statistical inference for the Wasserstein distance by selective inference","authors":"Vo Nguyen Le Duy, Ichiro Takeuchi","doi":"10.1007/s10463-022-00837-3","DOIUrl":"10.1007/s10463-022-00837-3","url":null,"abstract":"<div><p>In this paper, we study statistical inference for the Wasserstein distance, which has attracted much attention and has been applied to various machine learning tasks. Several studies have been proposed in the literature, but almost all of them are based on <i>asymptotic</i> approximation and do <i>not</i> have finite-sample validity. In this study, we propose an <i>exact (non-asymptotic)</i> inference method for the Wasserstein distance inspired by the concept of conditional selective inference (SI). To our knowledge, this is the first method that can provide a valid confidence interval (CI) for the Wasserstein distance with finite-sample coverage guarantee, which can be applied not only to one-dimensional problems but also to multi-dimensional problems. We evaluate the performance of the proposed method on both synthetic and real-world datasets.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":"75 1","pages":"127 - 157"},"PeriodicalIF":1.0,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43291261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust estimation of the conditional stable tail dependence function","authors":"Yuri Goegebeur, Armelle Guillou, Jing Qin","doi":"10.1007/s10463-022-00839-1","DOIUrl":"10.1007/s10463-022-00839-1","url":null,"abstract":"<div><p>We propose a robust estimator of the stable tail dependence function in the case where random covariates are recorded. Under suitable assumptions, we derive the finite-dimensional weak convergence of the estimator properly normalized. The performance of our estimator in terms of efficiency and robustness is illustrated through a simulation study. Our methodology is applied on a real dataset of sale prices of residential properties.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":"75 2","pages":"201 - 231"},"PeriodicalIF":1.0,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10463-022-00839-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48094376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation with multivariate outcomes having nonignorable item nonresponse","authors":"Lyu Ni, Jun Shao","doi":"10.1007/s10463-022-00836-4","DOIUrl":"10.1007/s10463-022-00836-4","url":null,"abstract":"<div><p>To estimate unknown population parameters based on <span>\\(\\boldsymbol{y}\\)</span>, a vector of multivariate outcomes having nonignorable item nonresponse that directly depends on <span>\\(\\boldsymbol{y}\\)</span>, we propose an innovative inverse propensity weighting approach when the joint distribution of <span>\\(\\boldsymbol{y}\\)</span> and associated covariate <span>\\(\\boldsymbol{x}\\)</span> is nonparametric and the nonresponse probability conditional on <span>\\(\\boldsymbol{y}\\)</span> and <span>\\(\\boldsymbol{x}\\)</span> has a parametric form. To deal with the identifiability issue, we utilize a nonresponse instrument <span>\\(\\boldsymbol{z}\\)</span>, an auxiliary variable related to <span>\\(\\boldsymbol{y}\\)</span> but not related to the nonresponse probability conditional on <span>\\(\\boldsymbol{y}\\)</span> and <span>\\(\\boldsymbol{x}\\)</span>. We utilize a modified generalized method of moments to obtain estimators of the parameters in the nonresponse probability. Simulation results are presented and an application is illustrated in a real data set.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":"75 1","pages":"1 - 15"},"PeriodicalIF":1.0,"publicationDate":"2022-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46136467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}