{"title":"Variable selection and debiased estimation for single-index expectile model","authors":"Rong Jiang, Yexun Peng, Yufei Deng","doi":"10.1111/anzs.12348","DOIUrl":"10.1111/anzs.12348","url":null,"abstract":"This article develops a penalised asymmetric least squares estimator for the single-index expectile model. The oracle property of the proposed estimator is established. Moreover, a debiasing technique is used to construct an asymptotically normal estimator, which enables the construction of valid confidence intervals and hypothesis tests. Simulation studies and a real data application illustrate the finite sample performance of the proposed methods.","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"63 4","pages":"658-673"},"PeriodicalIF":1.1,"publicationDate":"2022-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79758873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient estimation of partially linear tail index models using B-splines","authors":"Yaolan Ma, Bo Wei","doi":"10.1111/anzs.12357","DOIUrl":"10.1111/anzs.12357","url":null,"abstract":"The tail index is an important parameter in extreme value theory. In this paper, we consider a simple yet flexible spline estimation method for partially linear tail index models. We approximate the unknown function by B-splines and construct an approximate log-likelihood function to estimate the coefficients of the linear covariates and the B-spline basis functions. Consistency and asymptotic normality of the estimators are established. Subsequently, the proposed method is illustrated by using simulations and applications to the Fremantle annual maximum sea levels data and Chicago air pollution data.","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"64 1","pages":"27-44"},"PeriodicalIF":1.1,"publicationDate":"2022-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84190109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Properties of the affine-invariant ensemble sampler's ‘stretch move’ in high dimensions","authors":"David Huijser, Jesse Goodman, Brendon J. Brewer","doi":"10.1111/anzs.12358","DOIUrl":"10.1111/anzs.12358","url":null,"abstract":"We present theoretical and practical properties of the affine-invariant ensemble sampler Markov chain Monte Carlo method. In high dimensions, the sampler's ‘stretch move’ has unusual and undesirable properties. We demonstrate this with an n-dimensional correlated Gaussian toy problem with a known mean and covariance structure, and with a multivariate version of the Rosenbrock problem. Visual inspection of trace plots suggests that the burn-in period is short. Upon closer inspection, however, we find that the estimated mean and variance of the target distribution do not match the known values, and that the chain takes a very long time to converge. This problem becomes severe as n increases beyond 50. We also applied several diagnostics, adapted for ensemble methods, to detect any lack of convergence: the Gelman–Rubin method, the Heidelberger–Welch test, the integrated autocorrelation and the acceptance rate. Trace plots of individual walkers also appear to be useful. We therefore conclude that the stretch move should be used with caution in moderate to high dimensions. We also present some heuristic results explaining this behaviour.","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"64 1","pages":"1-26"},"PeriodicalIF":1.1,"publicationDate":"2022-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88726225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global implicit function theorems and the online expectation–maximisation algorithm","authors":"Hien Duy Nguyen, Florence Forbes","doi":"10.1111/anzs.12356","DOIUrl":"10.1111/anzs.12356","url":null,"abstract":"The expectation–maximisation (EM) algorithm framework is an important tool for statistical computation. Due to the changing nature of data, online and mini‐batch variants of EM and EM‐like algorithms have become increasingly popular. The consistency of the estimator sequences produced by these EM variants often relies on an assumption regarding the continuous differentiability of a parameter update function. In many cases, the parameter update function is not in closed form and may only be defined implicitly, which makes the continuous differentiability property difficult to verify. We demonstrate how a global implicit function theorem can be used to verify such properties in the cases of finite mixtures of distributions in the exponential family and, more generally, when the component‐specific distributions admit data augmentation schemes within the exponential family. We then illustrate the use of such a theorem in the cases of mixtures of beta distributions, gamma distributions, fully visible Boltzmann machines and Student distributions. Via numerical simulations, we provide empirical evidence for the consistency of the online EM algorithm parameter estimates in such cases.","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"64 2","pages":"255-281"},"PeriodicalIF":1.1,"publicationDate":"2022-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83582209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sufficient dimension reduction for clustered data via finite mixture modelling","authors":"F.K.C. Hui, L.H. Nghiem","doi":"10.1111/anzs.12349","DOIUrl":"10.1111/anzs.12349","url":null,"abstract":"Sufficient dimension reduction (SDR) is an attractive approach to regression modelling. However, despite its rich literature and growing popularity in applications, surprisingly little research has been done on how to perform SDR for clustered data, such as commonly arise in longitudinal studies. Indeed, popular SDR methods have mostly been based on a marginal estimating equation approach. In this article, we propose a new approach to SDR for clustered data based on a combination of finite mixture modelling and mixed effects regression. Finite mixture models offer a flexible means of estimating the fixed effects central subspace, based on slicing the space and probabilistically clustering observations into each slice (mixture component). Dimension reduction is achieved by having the mixing proportions vary only through the sufficient fixed effect predictors. We then incorporate random effects as a natural means of accounting for correlations within clusters. We employ a Monte Carlo expectation–maximisation algorithm to estimate the model parameters and fixed effects central subspace, and discuss methods for associated uncertainty quantification and prediction. Simulation studies demonstrate that our approach performs strongly compared with both estimating equation methods for estimating the fixed effects central subspace and SDR methods that do not account for within-cluster correlation. Finally, we apply the proposed approach to a data set on air pollutant monitoring across 13 stations in the Eastern United States.","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"64 2","pages":"133-157"},"PeriodicalIF":1.1,"publicationDate":"2022-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73971724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian credible intervals for population attributable risk from case–control, cohort and cross-sectional studies","authors":"Sarah Pirikahu, Geoffrey Jones, Martin L. Hazelton","doi":"10.1111/anzs.12352","DOIUrl":"10.1111/anzs.12352","url":null,"abstract":"Population attributable risk (PAR) and population attributable fraction (PAF) are used in epidemiology to predict the impact of removing a risk factor from the population. Until recently, no standard approach for calculating confidence intervals or the variance for the PAR in particular was available in the literature. Previously we outlined a fully Bayesian approach to provide credible intervals for the PAR and PAF from a cross-sectional study, where the data were presented in the form of a 2×2 table. However, extensions to other frequently used study designs were not provided. In this paper we provide methodology to calculate credible intervals for the PAR and PAF for case–control and cohort studies. Additionally, we extend the cross-sectional example to incorporate the uncertainty that arises when an imperfect diagnostic test is used. In all these situations the model becomes over-parameterised, or non-identifiable, which can cause standard ‘off-the-shelf’ Markov chain Monte Carlo (MCMC) updaters to take a long time to converge or even to fail altogether. We adapt an importance sampling methodology to overcome this problem, and propose some novel MCMC samplers that take into consideration the shape of the posterior ridge to aid the convergence of the Markov chain.","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"63 4","pages":"639-657"},"PeriodicalIF":1.1,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79829223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring the values of cricket players","authors":"Pranjal Chandrakar, Shubhabrata Das","doi":"10.1111/anzs.12353","DOIUrl":"10.1111/anzs.12353","url":null,"abstract":"Sports franchises that participate in team sports can make better decisions about their players’ financial compensation, contract renewals, bidding strategies during auctions, etc., if they can adequately assess the value or worth of their players. Evaluating the value of a player in a team sport is difficult because team members play different roles. In this study, we address this by measuring a player's value in terms of how his inclusion in the team affects the team's probability of winning. With this notion of value, we develop a technique to measure the worth of a cricket player to his franchise. To illustrate this technique, we evaluate the values of cricket players who play in the Indian Premier League. We also study the relationship between players’ values and their salaries. We find that a few popular players earn disproportionately more than others. This disproportionality in the income of popular players cannot be justified by their performance alone, as adjudged by their values in this work. We attribute this disproportionality to factors not captured by conventional yardsticks, such as leadership or brand value.","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"63 4","pages":"565-578"},"PeriodicalIF":1.1,"publicationDate":"2022-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74128017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection boundary for a sparse gamma scale mixture model","authors":"Michael I. Stewart","doi":"10.1111/anzs.12347","DOIUrl":"10.1111/anzs.12347","url":null,"abstract":"We derive the detection boundary for the one-sided version of the gamma scale mixture model where the contaminating component has a larger mean than the known reference distribution. We also derive an adaptive test which is able to almost uniformly attain the best possible performance in terms of detection of local alternatives.","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"64 2","pages":"282-296"},"PeriodicalIF":1.1,"publicationDate":"2022-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77949170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Odds-symmetry model for cumulative probabilities and decomposition of a conditional symmetry model in square contingency tables","authors":"Shuji Ando","doi":"10.1111/anzs.12346","DOIUrl":"10.1111/anzs.12346","url":null,"abstract":"For the analysis of square contingency tables, it is necessary to estimate an unknown distribution with high confidence from the observed data. For that purpose, we need a statistical model that fits the data well and is parsimonious. This study proposes asymmetry models based on cumulative probabilities for square contingency tables with the same row and column ordinal classifications. In the proposed models, for all i < j, the odds that an observation will fall in row category i or below and column category j or above, instead of row category j or above and column category i or below, depend only on row category i or column category j. By contrast, under the conditional symmetry (CS) model these odds are constant regardless of the row and column categories. The proposed models always hold when the CS model holds; however, the converse is not necessarily true. This study also shows that, for the CS model to hold, the extended marginal homogeneity model must hold in addition to the proposed models. These decomposition theorems explain why the CS model does not hold. The proposed models provide a better fit when applied to a real-world data set of occupational data for father-and-son dyads.","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"63 4","pages":"674-684"},"PeriodicalIF":1.1,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77244237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proportional inverse Gaussian distribution: A new tool for analysing continuous proportional data","authors":"Pengyi Liu, Guo-Liang Tian, Kam Chuen Yuen, Chi Zhang, Man-Lai Tang","doi":"10.1111/anzs.12345","DOIUrl":"10.1111/anzs.12345","url":null,"abstract":"Outcomes in the form of rates, fractions, proportions and percentages often appear in various fields. The existing beta and simplex distributions frequently fit such continuous data unsatisfactorily. This paper develops the normalised inverse Gaussian (N-IG) distribution proposed by Lijoi, Mena & Prünster (2005, Journal of the American Statistical Association, 100, 1278–1291) as a new tool for analysing continuous proportional data in (0,1), and renames the N-IG the proportional inverse Gaussian (PIG) distribution. Our main contributions include: (i) To overcome the difficulty posed by an integral in the PIG density function, we propose a novel minorisation–maximisation (MM) algorithm via the continuous version of Jensen's inequality to calculate the maximum likelihood estimates of the parameters in the PIG distribution; (ii) We also develop an MM algorithm aided by the gradient descent algorithm for the PIG regression model, which allows us to explore the relationship between a set of covariates and the mean parameter; (iii) Both the comparative studies and the real data analyses show that the PIG distribution outperforms the beta and simplex distributions in terms of the AIC and the Cramér–von Mises and Kolmogorov–Smirnov tests. In addition, bootstrap confidence intervals and hypothesis tests on the symmetry of the PIG density are presented. Simulation studies are conducted, and the hospital stay data of Barcelona in 1988 and 1990 are analysed to illustrate the proposed methods.","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"63 4","pages":"579-605"},"PeriodicalIF":1.1,"publicationDate":"2021-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87974708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}