Suman Rakshit, Greg McSwiggan, Gopalan Nair, Adrian Baddeley
{"title":"Variable selection using penalised likelihoods for point patterns on a linear network","authors":"Suman Rakshit, Greg McSwiggan, Gopalan Nair, Adrian Baddeley","doi":"10.1111/anzs.12341","DOIUrl":"10.1111/anzs.12341","url":null,"abstract":"<div><p>Motivated by the analysis of a comprehensive database of road traffic accidents, we investigate methods of variable selection for spatial point process models on a linear network. The original data may include explanatory spatial covariates, such as road curvature, and ‘mark’ variables attributed to individual accidents, such as accident severity. The treatment of mark variables is new. Variable selection is applied to the canonical covariates, which may include spatial covariate effects, mark effects and mark-covariate interactions. We approximate the likelihood of the point process model by that of a generalised linear model, in such a way that spatial covariates and marks are both associated with canonical covariates. We impose a convex penalty on the log likelihood, principally the elastic-net penalty, and maximise the penalised loglikelihood by cyclic coordinate ascent. A simulation study compares the performances of the lasso, ridge regression and elastic-net methods of variable selection on their ability to select variables correctly, and on their bias and standard error. Standard techniques for selecting the regularisation parameter <i>γ</i> often yielded unsatisfactory results. We propose two new rules for selecting <i>γ</i> which are designed to have better performance. 
The methods are tested on a small dataset on crimes in a Chicago neighbourhood, and applied to a large dataset of road traffic accidents in Western Australia.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90533201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ECM algorithm for estimating vector ARMA model with variance gamma distribution and possible unbounded density","authors":"Thanakorn Nitithumbundit, Jennifer S.K. Chan","doi":"10.1111/anzs.12340","DOIUrl":"https://doi.org/10.1111/anzs.12340","url":null,"abstract":"<div><p>The simultaneous analysis of several financial time series is salient in portfolio setting and risk management. This paper proposes a novel alternating expectation conditional maximisation (AECM) algorithm to estimate the vector autoregressive moving average (VARMA) model with variance gamma (VG) error distribution in the multivariate skewed setting. We explain why the VARMA-VG model is suitable for high-frequency returns (HFRs) because VG distribution provides thick tails to capture the high kurtosis in the data and unbounded central density further captures the majority of near-zero HFRs. The distribution can also be expressed in normal-mean-variance mixtures to facilitate model implementation using the Bayesian or expectation maximisation (EM) approach. We adopt the EM approach to avoid the time-consuming Markov chain Monte Carlo sampling and solve the unbounded density problem in the classical maximum likelihood estimation. 
We conduct extensive simulation studies to evaluate the accuracy of the proposed AECM estimator and apply the models to analyse the dependency between two HFR series from the time zones that only differ by one hour.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"137538704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Inverse G-Wishart distribution and variational message passing","authors":"Luca Maestrini, Matt P. Wand","doi":"10.1111/anzs.12339","DOIUrl":"10.1111/anzs.12339","url":null,"abstract":"<div><p>Message passing on a factor graph is a powerful paradigm for the coding of approximate inference algorithms for arbitrarily large graphical models. The notion of a factor graph fragment allows for compartmentalisation of algebra and computer code. We show that the Inverse G-Wishart family of distributions enables fundamental variational message passing factor graph fragments to be expressed elegantly and succinctly. Such fragments arise in models for which approximate inference concerning covariance matrix or variance parameters is made, and are ubiquitous in contemporary statistics and machine learning.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2021-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81925035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An adequacy approach for deciding the number of clusters for OTRIMLE robust Gaussian mixture-based clustering","authors":"Christian Hennig, Pietro Coretto","doi":"10.1111/anzs.12338","DOIUrl":"10.1111/anzs.12338","url":null,"abstract":"<p>We introduce a new approach to deciding the number of clusters. The approach is applied to Optimally Tuned Robust Improper Maximum Likelihood Estimation (OTRIMLE; Coretto & Hennig, <i>Journal of the American Statistical Association</i> <b>111</b>, 1648–1659) of a Gaussian mixture model allowing for observations to be classified as ‘noise’, but it can be applied to other clustering methods as well. The quality of a clustering is assessed by a statistic <i>Q</i> that measures how close the within-cluster distributions are to elliptical unimodal distributions that have the only mode in the mean. This non-parametric measure allows for non-Gaussian clusters as long as they have a good quality according to <i>Q</i>. The simplicity of a model is assessed by a measure <i>S</i> that prefers a smaller number of clusters unless additional clusters can reduce the estimated noise proportion substantially. The simplest model is then chosen that is adequate for the data in the sense that its observed value of <i>Q</i> is not significantly larger than what is expected for data truly generated from the fitted model, as can be assessed by parametric bootstrap. 
The approach is compared with model-based clustering using the Bayesian information criterion (BIC) and the integrated complete likelihood (ICL) in a simulation study and on two real data sets.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2021-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1111/anzs.12338","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75692546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What is the effective sample size of a spatial point process?","authors":"Ian W. Renner, David I. Warton, Francis K.C. Hui","doi":"10.1111/anzs.12337","DOIUrl":"10.1111/anzs.12337","url":null,"abstract":"<div><p>Point process models are a natural approach for modelling data that arise as point events. In the case of Poisson counts, these may be fitted easily as a weighted Poisson regression. Point processes lack the notion of sample size. This is problematic for model selection, because various classical criteria such as the Bayesian information criterion (BIC) are a function of the sample size, <i>n</i>, and are derived in an asymptotic framework where <i>n</i> tends to infinity. In this paper, we develop an asymptotic result for Poisson point process models in which the observed number of point events, <i>m</i>, plays the role that sample size does in the classical regression context. Following from this result, we derive a version of BIC for point process models, and when fitted via penalised likelihood, conditions for the LASSO penalty that ensure consistency in estimation and the oracle property. We discuss challenges extending these results to the wider class of Gibbs models, of which the Poisson point process model is a special case.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1111/anzs.12337","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81154600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anna Karenina and the two envelopes problem","authors":"R. D. Gill","doi":"10.1111/anzs.12329","DOIUrl":"10.1111/anzs.12329","url":null,"abstract":"<div><p>The Anna Karenina principle is named after the opening sentence in the eponymous novel: Happy families are all alike; every unhappy family is unhappy in its own way. The two envelopes problem (TEP) is a much-studied paradox in probability theory, mathematical economics, logic and philosophy. Time and again a new analysis is published in which an author claims finally to explain what actually goes wrong in this paradox. Each author (the present author included) emphasises what is new in their approach and concludes that earlier approaches did not get to the root of the matter. We observe that though a logical argument is only correct if every step is correct, an apparently logical argument which goes astray can be thought of as going astray at different places. This leads to a comparison between the literature on TEP and a successful movie franchise: it generates a succession of sequels, and even prequels, each with a different director who approaches the same basic premise in a personal way. We survey resolutions in the literature with a view to synthesis, correct common errors, and give a new theorem on order properties of an exchangeable pair of random variables, at the heart of most TEP variants and interpretations. 
A theorem on asymptotic independence between the amount in your envelope and the question whether it is smaller or larger shows that the pathological situation of improper priors or infinite expectation values has consequences as we merely approach such a situation.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1111/anzs.12329","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80260052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Festschrift for Adrian Baddeley","authors":"Martin L. Hazelton, R. Turner","doi":"10.1111/anzs.12322","DOIUrl":"10.1111/anzs.12322","url":null,"abstract":"<div><p>This article introduces a special issue of the Australian and New Zealand Journal of Statistics, being a Festschrift for Adrian Baddeley on the occasion of his 65th birthday.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1111/anzs.12322","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73814182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dependent radius marks of Laguerre tessellations: a case study","authors":"Dietrich Stoyan, Viktor Beneš, Filip Seitl","doi":"10.1111/anzs.12314","DOIUrl":"10.1111/anzs.12314","url":null,"abstract":"<div><p>We study a particular marked three-dimensional point process sample that represents a Laguerre tessellation. It comes from a polycrystalline sample of aluminium alloy material. The ‘points’ are the cell generators while the ‘marks’ are radius marks that control the size and shape of the tessellation cells. Our statistical mark correlation analyses show that the marks of the sample are in clear and plausible spatial correlation: the marks of generators close together tend to be small and similar and the form of the correlation functions does not justify geostatistical marking. We show that a simplified modelling of tessellations by Laguerre tessellations with independent radius marks may lead to wrong results. When we started from the aluminium alloy data and generated random marks by random permutation we obtained tessellations with characteristics quite different from the original ones. We observed similar behaviour for simulated Laguerre tessellations. This fact, which seems to be natural for the given data type, makes fitting of models to empirical Laguerre tessellations quite difficult: the generator points and radius marks have to be modelled simultaneously. This may imply that the reconstruction methods are more efficient than point-process modelling if only samples of similar Laguerre tessellations are needed. 
We also found that literature recipes for bandwidth choice for estimating correlation functions should be used with care.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1111/anzs.12314","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80348739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. P. Verbyla, J. De Faveri, D. M. Deery, G. J. Rebetzke
{"title":"Modelling temporal genetic and spatio-temporal residual effects for high-throughput phenotyping data*","authors":"A. P. Verbyla, J. De Faveri, D. M. Deery, G. J. Rebetzke","doi":"10.1111/anzs.12336","DOIUrl":"10.1111/anzs.12336","url":null,"abstract":"<div><p>High-throughput phenomics data are being collected in both the laboratory and the field. The data are often collected at many time points and there may be spatial variation in the laboratory or field that impacts on the growth of the plants, and that may influence the traits of interest. Modelling the genetic effects is of primary interest in such studies, but these effects might be biased if non-genetic effects present in the experiment are ignored. With data that are collected both in time and space, there may be a need to jointly model these multi-dimensional non-genetic effects. Thus both modelling of genetic effects over time and non-genetic effects over time and space in a one-stage analysis is considered. An experiment that involves field phenomics data with four dimensions, two in space and two in time, provides the vehicle to examine the models. Factor analytic (FA) models are often used for genetic effects for different environments to provide reliable estimates of genetic variances and correlations. As the time dimension defines the environments, FA models are examined for the phenomics data. Reduced rank tensor smoothing splines are presented as a possible approach for modelling the spatio-temporal effects, although an additional term is included for heterogeneity over the two time dimensions. This approach is feasible, although very time-consuming. The process of model selection for the genetic effects is presented including tests, information criteria and diagnostics. Comparisons of more simplistic models are made with the reduced rank tensor spline. 
This also shows the interplay between the genetic and residual models in model selection.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2021-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1111/anzs.12336","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89156281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional intensity: A powerful tool for modelling and analysing point process data","authors":"Peter J. Diggle","doi":"10.1111/anzs.12331","DOIUrl":"10.1111/anzs.12331","url":null,"abstract":"<div><p>The conditional intensity function of a spatial point process describes how the probability that a point of the process occurs ‘at’ a particular point in its carrier space depends on the realisation of the process in the remainder of the carrier space. Provided that the point process is simple, the conditional intensity determines all of the properties of the process, in particular its likelihood function. In this paper, we review the use of the conditional intensity function in the formulation of point process models and in making inferences from point process data, giving separate consideration to temporal, spatial and spatiotemporal settings. We argue that the conditional intensity function should take centre-stage in spatiotemporal point process modelling and analysis.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2021-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1111/anzs.12331","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86952633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}