{"title":"Inhomogeneous hidden semi-Markov models for incompletely observed point processes","authors":"Amina Shahzadi, Ting Wang, Mark Bebbington, Matthew Parry","doi":"10.1007/s10463-022-00843-5","DOIUrl":"10.1007/s10463-022-00843-5","url":null,"abstract":"<div><p>A general class of inhomogeneous hidden semi-Markov models (IHSMMs) is proposed for modelling partially observed processes that do not necessarily behave in a stationary and memoryless manner. The key feature of the proposed model is that the sojourn times of the states in the semi-Markov chain are time-dependent, making it an inhomogeneous semi-Markov chain. Conjectured consistency of the parameter estimators is checked by simulation study using direct numerical optimization of the log-likelihood function. The proposed models are applied to a global volcanic eruption catalogue to investigate the time-dependent incompleteness of the record by introducing a particular case of IHSMMs with time-dependent shifted Poisson state durations and a renewal process as the observed process. The Akaike Information Criterion and residual analysis are used to choose the best model. The selected IHSMM provides useful insights into the completeness of the global record of volcanic eruptions, demonstrating the effectiveness of this method.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44617410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regression analysis for exponential family data in a finite population setup using two-stage cluster sample","authors":"Brajendra C. Sutradhar","doi":"10.1007/s10463-022-00850-6","DOIUrl":"10.1007/s10463-022-00850-6","url":null,"abstract":"<div><p>Over the last four decades, the cluster regression analysis in a finite population (FP) setup for an exponential family such as linear or binary data was done by using a two-stage cluster sample chosen from the FP but by treating the sample as though it is a single-stage cluster sample from a super-population (SP) which contains the FP as a hypothetical sample. Because the responses within a cluster in the FP are correlated, the aforementioned sample mis-specification makes the sample-based so-called GLS (generalized least square) estimators design biased and inconsistent. In this paper, we demonstrate for the exponential family data how to avoid the sampling mis-specification and accommodate the cluster correlations to obtain unbiased and consistent estimates for the FP parameters. The asymptotic normality of the regression estimators is also given for the construction of confidence intervals when needed.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46263827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymptotic theory in network models with covariates and a growing number of node parameters","authors":"Qiuping Wang, Yuan Zhang, Ting Yan","doi":"10.1007/s10463-022-00848-0","DOIUrl":"10.1007/s10463-022-00848-0","url":null,"abstract":"<div><p>We propose a general model that jointly characterizes degree heterogeneity and homophily in weighted, undirected networks. We present a moment estimation method using node degrees and homophily statistics. We establish consistency and asymptotic normality of our estimator using novel analysis. We apply our general framework to three applications, including both exponential family and non-exponential family models. Comprehensive numerical studies and a data example also demonstrate the usefulness of our method.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48524750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative robustness of instance ranking problems","authors":"Tino Werner","doi":"10.1007/s10463-022-00847-1","DOIUrl":"10.1007/s10463-022-00847-1","url":null,"abstract":"<div><p>Instance ranking problems intend to recover the ordering of the instances in a data set with applications in scientific, social and financial contexts. In this work, we concentrate on the global robustness of parametric instance ranking problems in terms of the breakdown point which measures the fraction of samples that need to be perturbed in order to let the estimator take unreasonable values. Existing breakdown point notions do not cover ranking problems so far. We propose to define a breakdown of the estimator as a sign-reversal of all components which causes the predicted ranking to be potentially completely inverted; therefore, we call it the order-inversal breakdown point (OIBDP). We will study the OIBDP, based on a linear model, for several different carefully distinguished ranking problems and provide least favorable outlier configurations, characterizations of the order-inversal breakdown point and sharp asymptotic upper bounds. We also compute empirical OIBDPs.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10463-022-00847-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42157643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forward variable selection for ultra-high dimensional quantile regression models","authors":"Toshio Honda, Chien-Tong Lin","doi":"10.1007/s10463-022-00849-z","DOIUrl":"10.1007/s10463-022-00849-z","url":null,"abstract":"<div><p>We propose forward variable selection procedures with a stopping rule for feature screening in ultra-high-dimensional quantile regression models. For such very large models, penalized methods do not work and some preliminary feature screening is necessary. We demonstrate the desirable theoretical properties of our forward procedures by taking care of uniformity w.r.t. subsets of covariates properly. The necessity of such uniformity is often overlooked in the literature. Our stopping rule suitably incorporates the model size at each stage. We also present the results of simulation studies and a real data application to show their good finite sample performances.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10463-022-00849-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41794639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional selective inference for robust regression and outlier detection using piecewise-linear homotopy continuation","authors":"Toshiaki Tsukurimichi, Yu Inatsu, Vo Nguyen Le Duy, Ichiro Takeuchi","doi":"10.1007/s10463-022-00846-2","DOIUrl":"10.1007/s10463-022-00846-2","url":null,"abstract":"<div><p>In this paper, we consider conditional selective inference (SI) for a linear model estimated after outliers are removed from the data. To apply the conditional SI framework, it is necessary to characterize the events of how the robust method identifies outliers. Unfortunately, the existing conditional SIs cannot be directly applied to our problem because they are applicable to the case where the selection events can be represented by linear or quadratic constraints. We propose a conditional SI method for popular robust regressions such as least-absolute-deviation regression and Huber regression by introducing a new computational method using a convex optimization technique called homotopy method. We show that the proposed conditional SI method is applicable to a wide class of robust regression and outlier detection methods and has good empirical performance on both synthetic data and real data experiments.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46257738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible asymmetric multivariate distributions based on two-piece univariate distributions","authors":"Jonas Baillien, Irène Gijbels, Anneleen Verhasselt","doi":"10.1007/s10463-022-00842-6","DOIUrl":"10.1007/s10463-022-00842-6","url":null,"abstract":"<div><p>Classical symmetric distributions like the Gaussian are widely used. However, in reality data often display a lack of symmetry. Multiple distributions, grouped under the name “skewed distributions”, have been developed to specifically cope with asymmetric data. In this paper, we present a broad family of flexible multivariate skewed distributions for which statistical inference is a feasible task. The studied family of multivariate skewed distributions is derived by taking affine combinations of independent univariate distributions. These are members of a flexible family of univariate asymmetric distributions and are an important basis for achieving statistical inference. Besides basic properties of the proposed distributions, also statistical inference based on a maximum likelihood approach is presented. We show that under mild conditions, weak consistency and asymptotic normality of the maximum likelihood estimators hold. These results are supported by a simulation study confirming the developed theoretical results, and some data examples to illustrate practical applicability.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48406413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the choice of the optimal single order statistic in quantile estimation","authors":"Mariusz Bieniek, Luiza Pańczyk","doi":"10.1007/s10463-022-00845-3","DOIUrl":"10.1007/s10463-022-00845-3","url":null,"abstract":"<div><p>We study the classical statistical problem of the estimation of quantiles by order statistics of the random sample. For fixed sample size, we determine the single order statistic which is the optimal estimator of a quantile of given order. We propose a totally new approach to the problem, since our optimality criterion is based on the use of nonparametric sharp upper and lower bounds on the bias of the estimation. First, we determine the explicit analytic expressions for the bounds, and then, we choose the order statistic for which the upper and lower bound are simultaneously as close to 0 as possible. The paper contains rigorously proved theoretical results which can be easily implemented in practice. This is also illustrated with numerical examples.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42659546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective inference after feature selection via multiscale bootstrap","authors":"Yoshikazu Terada, Hidetoshi Shimodaira","doi":"10.1007/s10463-022-00838-2","DOIUrl":"10.1007/s10463-022-00838-2","url":null,"abstract":"<div><p>It is common to show the confidence intervals or <i>p</i>-values of selected features, or predictor variables in regression, but they often involve selection bias. The selective inference approach solves this bias by conditioning on the selection event. Most existing studies of selective inference consider a specific algorithm, such as Lasso, for feature selection, and thus they have difficulties in handling more complicated algorithms. Moreover, existing studies often consider unnecessarily restrictive events, leading to over-conditioning and lower statistical power. Our novel and widely applicable resampling method via multiscale bootstrap addresses these issues to compute an approximately unbiased selective <i>p</i>-value for the selected features. As a simplification of the proposed method, we also develop a simpler method via the classical bootstrap. We prove that the <i>p</i>-value computed by our multiscale bootstrap method is more accurate than the classical bootstrap method. Furthermore, numerical experiments demonstrate that our algorithm works well even for more complicated feature selection methods such as non-convex regularization.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43509814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference using an exact distribution of test statistic for random-effects meta-analysis","authors":"Keisuke Hanada, Tomoyuki Sugimoto","doi":"10.1007/s10463-022-00844-4","DOIUrl":"10.1007/s10463-022-00844-4","url":null,"abstract":"<div><p>Random-effects meta-analysis serves to integrate the results of multiple studies with methods such as moment estimation and likelihood estimation duly proposed. These existing methods are based on asymptotic normality with respect to the number of studies. However, the test and interval estimation deviate from the nominal significance level when integrating a small number of studies. Although a method for constructing more conservative intervals has been recently proposed, the exact distribution of test statistic for the overall treatment effect is not well known. In this paper, we provide an almost-exact distribution of the test statistic in random-effects meta-analysis and propose the test and interval estimation using the almost-exact distribution. Simulations demonstrate the accuracy of estimation and application to existing meta-analysis using the method proposed here. With known variance parameters, the estimation performance using the almost-exact distribution always achieves the nominal significance level regardless of the number of studies and heterogeneity. We also propose some methods to construct a conservative interval estimation, even when the variance parameters are unknown, and present their performances via simulation and an application to Alzheimer’s disease meta-analysis.</p></div>","PeriodicalId":55511,"journal":{"name":"Annals of the Institute of Statistical Mathematics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41358458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}