J. C. W. Rayner, Paul Rippon, Thomas Suesse, Olivier Thas
{"title":"Smooth tests of goodness of fit for the distributional assumption of regression models","authors":"J. C. W. Rayner, Paul Rippon, Thomas Suesse, Olivier Thas","doi":"10.1111/anzs.12361","DOIUrl":"10.1111/anzs.12361","url":null,"abstract":"<p>We focus on regression models that consist of (i) a model for the conditional mean of the outcome and (ii) an assumption about the distribution of the outcome, both conditional on the regressors. Generalised linear models form a well-known example. The choice of the outcome distribution is often motivated by prior or background knowledge of the researcher, or is simply made for convenience. We propose smooth goodness of fit tests for testing the distributional assumption in regression models. The tests arise from embedding the regression model in a smooth family of alternatives, and constructing appropriate score tests that correctly account for nuisance parameter estimation. The tests are customised, focussed and comprehensive. We present several examples to illustrate the wide applicability of our method. A small simulation study demonstrates that our tests have power to detect important deviations from the hypothesised model.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79975652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
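The smooth tests above build on Neyman's classical smooth test. As a minimal sketch only, the snippet below computes the basic Neyman smooth statistic (order 1 to 4) on probability integral transform (PIT) values u_i = F(y_i | x_i) from a hypothesised conditional distribution, using orthonormal shifted Legendre polynomials on [0, 1]. It assumes a fully specified null model: it does not include the nuisance-parameter correction that the authors' score tests provide, so with estimated parameters its chi-squared reference distribution is only a rough guide. The function names are hypothetical.

```python
import math


def legendre_orthonormal(u):
    """First four orthonormal shifted Legendre polynomials evaluated at u in [0, 1]."""
    return [
        math.sqrt(3) * (2 * u - 1),
        math.sqrt(5) * (6 * u**2 - 6 * u + 1),
        math.sqrt(7) * (20 * u**3 - 30 * u**2 + 12 * u - 1),
        3 * (70 * u**4 - 140 * u**3 + 90 * u**2 - 20 * u + 1),
    ]


def neyman_smooth_statistic(pit_values, order=4):
    """Neyman smooth statistic for uniformity of PIT values (fully specified null).

    Each component V_k = n^(-1/2) * sum_i h_k(u_i) is approximately N(0, 1)
    under the null, so the total is approximately chi-squared with `order` df.
    """
    if not 1 <= order <= 4:
        raise ValueError("this sketch supports orders 1 to 4")
    n = len(pit_values)
    sums = [0.0] * order
    for u in pit_values:
        h = legendre_orthonormal(u)
        for k in range(order):
            sums[k] += h[k]
    components = [s / math.sqrt(n) for s in sums]
    return sum(v * v for v in components), components
```

Inspecting the individual components, not just their sum, shows which kind of departure (location, dispersion, skewness, kurtosis-like) drives a rejection.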
{"title":"Modal clustering on PPGMMGA projection subspace","authors":"Luca Scrucca","doi":"10.1111/anzs.12360","DOIUrl":"10.1111/anzs.12360","url":null,"abstract":"<p>PPGMMGA is a projection pursuit (PP) algorithm aimed at detecting and visualising clustering structures in multivariate data. The algorithm uses negentropy as the PP index, obtained by fitting Gaussian mixture models (GMMs) for density estimation, and then exploits genetic algorithms (GAs) for its optimisation. Since PPGMMGA is a dimension reduction technique introduced specifically for visualisation, cluster memberships are not explicitly provided. In this paper, a modal clustering approach is proposed for estimating clusters of projected data points. In particular, a modal EM algorithm is employed to estimate the modes corresponding to the local maxima, in the projection subspace, of the underlying density estimated using parsimonious GMMs. Data points are then clustered according to the domain of attraction of the identified modes. Simulated and real data are discussed to illustrate the proposed method and evaluate its clustering performance.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12360","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81884427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
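The modal EM idea can be illustrated in one dimension. The sketch below assumes a univariate Gaussian mixture with known weights, means and standard deviations (in PPGMMGA these would instead come from a GMM fitted in the projection subspace). From each starting point it alternates posterior component probabilities (E-step) with a precision-weighted average of the component means (M-step), climbing to a local mode; points whose paths reach the same mode form one cluster. It is not the implementation used in the paper, and the function names are hypothetical.

```python
import math


def modal_em(x, weights, means, sds, tol=1e-10, max_iter=1000):
    """Climb from x to a local mode of a univariate Gaussian mixture density (modal EM)."""
    for _ in range(max_iter):
        # E-step: posterior component probabilities at x, computed on the log scale
        # (the common 1/sqrt(2*pi) factor cancels and is omitted).
        logs = [math.log(w) - math.log(s) - 0.5 * ((x - m) / s) ** 2
                for w, m, s in zip(weights, means, sds)]
        top = max(logs)
        unnorm = [math.exp(lg - top) for lg in logs]
        total = sum(unnorm)
        post = [p / total for p in unnorm]
        # M-step: maximise sum_k post_k * log N(x; m_k, s_k^2), which gives a
        # precision-weighted average of the component means.
        num = sum(p * m / s**2 for p, m, s in zip(post, means, sds))
        den = sum(p / s**2 for p, s in zip(post, sds))
        x_new = num / den
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def modal_clusters(points, weights, means, sds, digits=4):
    """Label each point by the (rounded) mode that its modal EM path reaches."""
    modes, labels = [], []
    for x in points:
        mode = round(modal_em(x, weights, means, sds), digits)
        if mode not in modes:
            modes.append(mode)
        labels.append(modes.index(mode))
    return modes, labels
```

Rounding the converged modes is a simple way to merge paths that end at the same maximum up to numerical tolerance; with overlapping components, fewer distinct modes than components may be found.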
{"title":"MPS: An R package for modelling shifted families of distributions","authors":"Mahdi Teimouri, Saralees Nadarajah","doi":"10.1111/anzs.12359","DOIUrl":"10.1111/anzs.12359","url":null,"abstract":"<p>Generalised statistical distributions have been widely used over recent decades to model phenomena in many fields. These generalisations produce more flexible distributions and lead to more accurate modelling in practice. Statistical analysis of the generalised distributions requires new statistical packages. The <span>Newdistns</span> package due to Nadarajah and Rocha provides <span>R</span> routines to compute the probability density function (PDF), cumulative distribution function (CDF), quantile function, random numbers and parameter estimates of 19 families of distributions with applications in survival analysis. Here, we introduce an <span>R</span> package, called <span>MPS</span>, for computing the PDF, CDF, quantile function, random numbers, Q–Q plots and parameter estimates for 24 shifted new families of distributions. By including an extra location parameter, each family is defined on the whole real line and so has a broader range of applicability. We adopt the well-known maximum product spacing approach to estimate the parameters of the families because in some situations the maximum likelihood (ML) estimators fail to exist. We demonstrate <span>MPS</span> by analysing two well-known real data sets. For the first data set, the ML estimators break down, but <span>MPS</span> works well. For the second, adding a location parameter results in a reasonable model, whereas omitting it makes the model quite inappropriate. The <span>MPS</span> package is available from CRAN at https://cran.r-project.org/package=MPS.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84200205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
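Maximum product spacing chooses the parameters that maximise the mean log of the spacings F(x_(i)) - F(x_(i-1)) between consecutive ordered observations, with F(x_(0)) = 0 and F(x_(n+1)) = 1. As a minimal sketch, assuming a one-parameter exponential distribution with no location shift (unlike the 24 shifted families handled by the MPS package, whose API is not used here), the snippet below computes the log spacings in a numerically stable way and maximises the objective, which is concave in the rate for this family, by golden-section search. Function names and search bounds are hypothetical.

```python
import math


def mps_objective(rate, data):
    """Mean log spacing for an exponential(rate) model at the ordered data.

    With survival function S(x) = exp(-rate * x) and x_(0) = 0, spacing i is
    S(x_(i-1)) - S(x_(i)) for i = 1..n, and the last spacing is S(x_(n)).
    Logs are computed via expm1 for numerical stability.
    """
    xs = sorted(data)
    total, prev = 0.0, 0.0
    for x in xs:
        gap = x - prev
        if gap <= 0:
            return -math.inf  # ties or non-positive values give a zero spacing
        total += -rate * prev + math.log(-math.expm1(-rate * gap))
        prev = x
    total += -rate * prev  # log of the final spacing 1 - F(x_(n))
    return total / (len(xs) + 1)


def mps_exponential_rate(data, lo=1e-6, hi=100.0, tol=1e-8):
    """Maximise the (concave) MPS objective over the rate by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if mps_objective(c, data) > mps_objective(d, data):
            b, d = d, c  # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d  # maximum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2
```

For data = [0.5, 1.0, 1.5, 2.0, 3.0] the estimate falls between 0.5 and 0.56, somewhat below the ML estimate 1/mean = 0.625. Tied observations produce a zero spacing, which is one of the known practical issues of the method; this sketch simply returns minus infinity in that case.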
{"title":"Fast and efficient algorithms for sparse semiparametric bifunctional regression","authors":"Silvia Novo, Philippe Vieu, Germán Aneiros","doi":"10.1111/anzs.12355","DOIUrl":"10.1111/anzs.12355","url":null,"abstract":"<p>A new sparse semiparametric model is proposed, which incorporates the influence of two functional random variables on a scalar response in a flexible and interpretable manner. One of the functional covariates is included through a single-index structure, while the other is included linearly through the high-dimensional vector formed by its discretised observations. For this model, two new algorithms are presented for selecting relevant variables in the linear part and estimating the model. Both procedures exploit the functional origin of the linear covariates. Finite sample experiments demonstrate the scope of application of both algorithms: the first is a fast algorithm that overcomes (without loss of predictive ability) the substantial computational time required by standard variable selection methods to estimate this model, and the second completes the set of relevant linear covariates provided by the first, thus improving its predictive efficiency. Some asymptotic results provide theoretical support for both procedures. A real data application demonstrates the applicability of the presented methodology from a predictive perspective, as well as in terms of the interpretability of its outputs and its low computational cost.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83590071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wagner Barreto-Souza, Vinícius D. Mayrink, Alexandre B. Simas
{"title":"Bessel regression and bbreg package to analyse bounded data","authors":"Wagner Barreto-Souza, Vinícius D. Mayrink, Alexandre B. Simas","doi":"10.1111/anzs.12354","DOIUrl":"10.1111/anzs.12354","url":null,"abstract":"<p>Beta regression has been used extensively by statisticians and practitioners to model bounded continuous data, and has had no strong competitor with the same main features. A class of normalised inverse-Gaussian (N-IG) processes was introduced in the literature and has been explored in the Bayesian context as a powerful alternative to the Dirichlet process. Until now, no attention has been paid to the univariate N-IG distribution in classical inference. In this paper, we propose Bessel regression based on the univariate N-IG distribution, as an alternative to the beta model. The parameters are estimated through an expectation–maximisation (EM) algorithm, and the paper discusses how to perform inference. A useful and practical discrimination procedure is proposed for model selection between Bessel and beta regressions. A new <span>R</span> package called <span>bbreg</span> is developed for fitting both Bessel and beta regression models based on the EM algorithm, and it also provides graphical tools for model adequacy and model selection. Proper documentation for this package is available. The performance of the models is evaluated under misspecification in a simulation study. An empirical illustration compares the results of Bessel and beta regressions using the new <span>R</span> package <span>bbreg</span>.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81092039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Salvatore D. Tomarchio, Salvatore Ingrassia, Volodymyr Melnykov
{"title":"Modelling students’ career indicators via mixtures of parsimonious matrix-normal distributions","authors":"Salvatore D. Tomarchio, Salvatore Ingrassia, Volodymyr Melnykov","doi":"10.1111/anzs.12351","DOIUrl":"10.1111/anzs.12351","url":null,"abstract":"<p>Evaluating teaching efficiency from different points of view is important for the university system, because it helps managers continually improve the quality of education and helps students acquire strong professional skills. In this framework, indicators of students’ careers and of teachers’ qualification and quantity adequacy are analysed with a mixture model approach, based on data sets provided by the Italian National Agency for the Evaluation of Universities and Research Institutes (ANVUR). In particular, parsimonious mixtures of matrix-normal distributions are used to detect underlying grouping structures. The results show that the data present an underlying group structure of courses with different traits, thus providing useful information for university policy makers.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82270077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jan Greve, Bettina Grün, Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter
{"title":"Spying on the prior of the number of data clusters and the partition distribution in Bayesian cluster analysis","authors":"Jan Greve, Bettina Grün, Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter","doi":"10.1111/anzs.12350","DOIUrl":"10.1111/anzs.12350","url":null,"abstract":"<p>Cluster analysis aims at partitioning data into groups or clusters. In applications, it is common to deal with problems where the number of clusters is unknown. Bayesian mixture models employed in such applications usually specify a flexible prior that takes into account the uncertainty with respect to the number of clusters. However, a major empirical challenge in using these models lies in characterising the induced prior on the partitions. This work introduces an approach to compute descriptive statistics of the prior on the partitions for three selected Bayesian mixture models developed in the areas of Bayesian finite mixtures and Bayesian nonparametrics. The proposed methodology involves computationally efficient enumeration of the prior on the number of clusters in-sample (termed ‘data clusters’) and determination of the first two prior moments of symmetric additive statistics characterising the partitions. The accompanying reference implementation is made available in the <span>R</span> package <span>fipp</span>. Finally, we illustrate the proposed methodology through comparisons and also discuss the implications for prior elicitation in applications.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12350","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72830218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
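For the Dirichlet process, one of the standard Bayesian nonparametric priors, the prior on the number of data clusters K among n observations has a well-known closed form: P(K = k) = α^k |s(n, k)| / (α(α + 1)...(α + n - 1)), where |s(n, k)| are unsigned Stirling numbers of the first kind. The sketch below evaluates it exactly with rational arithmetic. It covers only this one case, not the fipp package or the other models treated in the paper; the function names are hypothetical.

```python
from fractions import Fraction


def unsigned_stirling_first(n):
    """Row n of unsigned Stirling numbers of the first kind: |s(n, k)| for k = 0..n."""
    row = [1]  # n = 0
    for m in range(n):
        # Recurrence: |s(m+1, k)| = m * |s(m, k)| + |s(m, k-1)|
        new = [0] * (m + 2)
        for k in range(m + 2):
            stay = m * row[k] if k <= m else 0
            open_new = row[k - 1] if k >= 1 else 0
            new[k] = stay + open_new
        row = new
    return row


def dp_prior_num_clusters(n, alpha):
    """Prior pmf of the number of data clusters K among n observations under DP(alpha)."""
    alpha = Fraction(alpha)
    row = unsigned_stirling_first(n)
    rising = Fraction(1)  # rising factorial alpha * (alpha + 1) * ... * (alpha + n - 1)
    for i in range(n):
        rising *= alpha + i
    return {k: alpha**k * row[k] / rising for k in range(1, n + 1)}
```

For example, dp_prior_num_clusters(3, 1) returns {1: Fraction(1, 3), 2: Fraction(1, 2), 3: Fraction(1, 6)}. Exact fractions avoid overflow and rounding in the Stirling numbers for moderate n, at the cost of speed for large n.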
{"title":"Variable selection and debiased estimation for single-index expectile model","authors":"Rong Jiang, Yexun Peng, Yufei Deng","doi":"10.1111/anzs.12348","DOIUrl":"10.1111/anzs.12348","url":null,"abstract":"<p>This article develops a penalised asymmetric least squares estimator for the single-index expectile model. The oracle property of the proposed estimator is established. Moreover, a debiasing technique is used to construct an asymptotically normal estimator, which enables the construction of valid confidence intervals and hypothesis tests. Simulation studies and a real data application illustrate the finite sample performance of the proposed methods.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79758873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
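Expectile models replace the squared loss with an asymmetric version, weighting squared residuals by τ above the fitted value and by 1 - τ below it. The sketch below computes a sample τ-expectile by iteratively reweighted averaging; it shows only this asymmetric least squares loss, not the penalised single-index estimator or the debiasing step of the paper. The function name is hypothetical.

```python
def expectile(y, tau, tol=1e-12, max_iter=100):
    """Sample tau-expectile: the m minimising sum_i |tau - 1{y_i < m}| * (y_i - m)**2.

    Solved by iteratively reweighted averaging: weight tau for observations
    above the current m and 1 - tau for those at or below it.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must lie in (0, 1)")
    m = sum(y) / len(y)  # the 0.5-expectile is the mean, a natural start
    for _ in range(max_iter):
        w = [tau if yi > m else 1 - tau for yi in y]
        m_new = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

For example, expectile([0, 10], 0.8) returns 8.0, and τ = 0.5 recovers the sample mean. In a regression setting the same weights would enter a weighted least squares fit instead of a weighted mean.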
{"title":"Efficient estimation of partially linear tail index models using B-splines","authors":"Yaolan Ma, Bo Wei","doi":"10.1111/anzs.12357","DOIUrl":"10.1111/anzs.12357","url":null,"abstract":"<p>The tail index is an important parameter in extreme value theory. In this paper, we consider a simple yet flexible spline estimation method for partially linear tail index models. We approximate the unknown function by B-splines and construct an approximate log-likelihood function to estimate the coefficients of the linear covariates and of the B-spline basis functions. Consistency and asymptotic normality of the estimators are established. The proposed method is illustrated by simulations and by applications to the Fremantle annual maximum sea level data and the Chicago air pollution data.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84190109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Properties of the affine-invariant ensemble sampler's ‘stretch move’ in high dimensions","authors":"David Huijser, Jesse Goodman, Brendon J. Brewer","doi":"10.1111/anzs.12358","DOIUrl":"10.1111/anzs.12358","url":null,"abstract":"<p>We present theoretical and practical properties of the affine-invariant ensemble sampler Markov chain Monte Carlo method. In high dimensions, the sampler's ‘stretch move’ has unusual and undesirable properties. We demonstrate this with an <i>n</i>-dimensional correlated Gaussian toy problem with a known mean and covariance structure, and a multivariate version of the Rosenbrock problem. Visual inspection of trace plots suggests the burn-in period is short. Upon closer inspection, we discover that the estimated mean and variance of the target distribution do not match the known values, and the chain takes a very long time to converge. This problem becomes severe as <i>n</i> increases beyond 50. We also applied several diagnostics, adapted to ensemble methods, to detect any lack of convergence: the Gelman–Rubin method, the Heidelberger–Welch test, the integrated autocorrelation and the acceptance rate. The trace plot of individual walkers also appears to be useful. We therefore conclude that the stretch move should be used with caution in moderate to high dimensions. We also present some heuristic results explaining this behaviour.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88726225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
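For reference, a serial version of the Goodman and Weare stretch move discussed here can be sketched as follows. Each walker X_k is moved along the line through a randomly chosen complementary walker X_j, Y = X_j + z(X_k - X_j), with z drawn from g(z) ∝ 1/sqrt(z) on [1/a, a], and the move is accepted with probability min(1, z^(n-1) p(Y)/p(X_k)). The z^(n-1) factor is where the dimension n enters the acceptance step. This is a minimal pure-Python sketch with the common default a = 2, not the code used by the authors or by any particular package; the function name is hypothetical.

```python
import math
import random


def stretch_move_sweep(walkers, log_prob, a=2.0, rng=random):
    """One serial sweep of the affine-invariant stretch move, updating walkers in place.

    walkers: list of positions (lists of floats), all of the same dimension n.
    log_prob: function returning the log target density (up to a constant).
    Returns the number of accepted proposals in this sweep.
    """
    n_walkers = len(walkers)
    n_dim = len(walkers[0])
    accepted = 0
    for k in range(n_walkers):
        # Pick a complementary walker j != k uniformly at random.
        j = rng.randrange(n_walkers - 1)
        if j >= k:
            j += 1
        # Inverse-CDF draw of z from g(z) proportional to 1/sqrt(z) on [1/a, a].
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        xk, xj = walkers[k], walkers[j]
        proposal = [xj_i + z * (xk_i - xj_i) for xk_i, xj_i in zip(xk, xj)]
        # Acceptance ratio z^(n-1) * p(Y) / p(X_k), on the log scale.
        log_accept = (n_dim - 1) * math.log(z) + log_prob(proposal) - log_prob(xk)
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            walkers[k] = proposal
            accepted += 1
    return accepted
```

Running many sweeps on a Gaussian target and comparing the sample mean and variance with their known values, for increasing n, is one simple way to explore the behaviour the authors report.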