Qin Wu, Guo-Liang Tian, Tao Li, Man-Lai Tang, Chi Zhang
{"title":"The multivariate component zero-inflated Poisson model for correlated count data analysis","authors":"Qin Wu, Guo-Liang Tian, Tao Li, Man-Lai Tang, Chi Zhang","doi":"10.1111/anzs.12395","DOIUrl":"https://doi.org/10.1111/anzs.12395","url":null,"abstract":"<p>Multivariate zero-inflated Poisson (ZIP) distributions are important tools for modelling and analysing correlated count data with extra zeros. Unfortunately, existing multivariate ZIP distributions consider only the overall zero-inflation while the component zero-inflation is not well addressed. This paper proposes a flexible multivariate ZIP distribution, called the multivariate component ZIP distribution, in which both the overall and component zero-inflations are taken into account. Likelihood-based inference procedures including the calculation of maximum likelihood estimates of parameters in the model without and with covariates are provided. Simulation studies indicate that the performance of the proposed methods on the multivariate component ZIP model is satisfactory. The Australia health care utilisation data set is analysed to demonstrate that the new distribution is more appropriate than the existing multivariate ZIP distributions.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"65 3","pages":"234-261"},"PeriodicalIF":1.1,"publicationDate":"2023-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50145271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Short-term forecasting with a computationally efficient nonparametric transfer function model","authors":"Jun. M. Liu","doi":"10.1111/anzs.12394","DOIUrl":"https://doi.org/10.1111/anzs.12394","url":null,"abstract":"<p>In this paper a semi-parametric approach is developed to model non-linear relationships in time series data using polynomial splines. Polynomial splines require very little assumption about the functional form of the underlying relationship, so they are very flexible and can be used to model highly non-linear relationships. Polynomial splines are also computationally very efficient. The serial correlation in the data is accounted for by modelling the noise as an autoregressive integrated moving average (ARIMA) process; by doing so, the efficiency in nonparametric estimation is improved and correct inferences can be obtained. The explicit structure of the ARIMA model allows the correlation information to be used to improve forecasting performance. An algorithm is developed to automatically select and estimate the polynomial spline model and the ARIMA model through backfitting. This method is applied to a real-life data set to forecast hourly electricity usage. The non-linear effect of temperature on hourly electricity usage is allowed to be different at different hours of the day and days of the week. The forecasting performance of the developed method is evaluated in post-sample forecasting and compared with several well-accepted models. The results show the performance of the proposed model is comparable with a long short-term memory deep learning model.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"65 3","pages":"187-212"},"PeriodicalIF":1.1,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50114984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymptotics of M-estimator in multivariate linear regression models for a class of random errors","authors":"Yi Wu, Wei Yu, Xuejun Wang","doi":"10.1111/anzs.12393","DOIUrl":"https://doi.org/10.1111/anzs.12393","url":null,"abstract":"<p>It is known that linear regression models have immense applications in various areas such as engineering technology, economics and social sciences. In this paper, we investigate the asymptotic properties of the <i>M</i>-estimator in multivariate linear regression models based on a class of random errors satisfying a generalised Bernstein-type inequality. By using the generalised Bernstein-type inequality, we obtain a general result on almost sure convergence for a class of random variables and then obtain the strong consistency for the <i>M</i>-estimator in multivariate linear regression models under some mild conditions. The result extends or improves some existing ones in the literature. Moreover, we also consider the case when the dimension $p$ tends to infinity by establishing the rate of almost sure convergence for a class of random variables satisfying the generalised Bernstein-type inequality. Some numerical simulations are also provided to verify the validity of the theoretical results.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"65 3","pages":"262-285"},"PeriodicalIF":1.1,"publicationDate":"2023-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50148711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fangyao Li, Christopher M. Triggs, Ciprian Doru Giurcăneanu
{"title":"On the selection of predictors by using greedy algorithms and information theoretic criteria","authors":"Fangyao Li, Christopher M. Triggs, Ciprian Doru Giurcăneanu","doi":"10.1111/anzs.12387","DOIUrl":"https://doi.org/10.1111/anzs.12387","url":null,"abstract":"<p>We discuss the use of the following greedy algorithms in the prediction of multivariate time series: Matching Pursuit Algorithm (MPA), Orthogonal Matching Pursuit (OMP), Relaxed Matching Pursuit (RMP), Frank–Wolfe Algorithm (FWA) and Constrained Matching Pursuit (CMP). The last two are known to be solvers for the lasso problem. Some of the algorithms are well-known (e.g. OMP), while others are less popular (e.g. RMP). We provide a unified presentation of all the algorithms, and evaluate their computational complexity for the high-dimensional case and for the big data case. We show how 12 information theoretic (IT) criteria can be used jointly with the greedy algorithms. As part of this effort, we derive new theoretical results that allow the IT criteria to be modified so that they are compatible with RMP. The prediction capabilities are tested in experiments with two data sets. The first one involves air pollution data measured in Auckland (New Zealand) and the second one concerns the House Price Index in England (the United Kingdom).</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"65 2","pages":"77-100"},"PeriodicalIF":1.1,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12387","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50155532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nikola Počuča, Michael P.B. Gallaugher, Katharine M. Clark, Paul D. McNicholas
{"title":"Visual assessment of matrix-variate normality","authors":"Nikola Počuča, Michael P.B. Gallaugher, Katharine M. Clark, Paul D. McNicholas","doi":"10.1111/anzs.12388","DOIUrl":"https://doi.org/10.1111/anzs.12388","url":null,"abstract":"<p>In recent years, the analysis of three-way data has become ever more prevalent in the literature. It is becoming increasingly common to analyse such data by means of matrix-variate distributions, the most prevalent of which is the matrix-variate normal distribution. Although many methods exist for assessing multivariate normality, there is a relative paucity of approaches for assessing matrix-variate normality. Herein, a new visual method is proposed for assessing matrix-variate normality by means of a distance–distance plot. In addition, a testing procedure is discussed to be used in tandem with the proposed visual method. The proposed approach is illustrated via simulated data as well as an application on analysing handwritten digits.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"65 2","pages":"152-165"},"PeriodicalIF":1.1,"publicationDate":"2023-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50151748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust PCA for high-dimensional data based on characteristic transformation","authors":"Lingyu He, Yanrong Yang, Bo Zhang","doi":"10.1111/anzs.12385","DOIUrl":"https://doi.org/10.1111/anzs.12385","url":null,"abstract":"<p>In this paper, we propose a novel robust principal component analysis (PCA) for high-dimensional data in the presence of various heterogeneities, in particular strong tailing and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of the classical PCA. The suggested method has the distinct advantage of dealing with heavy-tail-distributed data, whose covariances may be non-existent (positively infinite, for instance), in addition to the usual outliers. The proposed approach is also a case of kernel principal component analysis (KPCA) and employs the robust and non-linear properties via a bounded and non-linear kernel function. The merits of the new method are illustrated by some statistical properties, including the upper bound of the excess error and the behaviour of the large eigenvalues under a spiked covariance model. Additionally, using a variety of simulations, we demonstrate the benefits of our approach over the classical PCA. Finally, using data on protein expression in mice of various genotypes in a biological study, we apply the novel robust PCA to categorise the mice and find that our approach is more effective at identifying abnormal mice than the classical PCA.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"65 2","pages":"127-151"},"PeriodicalIF":1.1,"publicationDate":"2023-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50150434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian neural tree models for nonparametric regression","authors":"Tanujit Chakraborty, Gauri Kamat, Ashis Kumar Chakraborty","doi":"10.1111/anzs.12386","DOIUrl":"https://doi.org/10.1111/anzs.12386","url":null,"abstract":"<p>Frequentist and Bayesian methods differ in many aspects but share some basic optimal properties. In real-life prediction problems, situations exist in which a model based on one of the above paradigms is preferable depending on some subjective criteria. Nonparametric classification and regression techniques, such as decision trees and neural networks, have both frequentist (classification and regression trees (CARTs) and artificial neural networks) as well as Bayesian counterparts (Bayesian CART and Bayesian neural networks) for learning from data. In this paper, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call the Bayesian neural tree (BNT) models. BNT models can simultaneously perform feature selection and prediction, are highly flexible, and generalise well in settings with limited training observations. We study the statistical consistency of the proposed approaches and derive the optimal value of a vital model parameter. The excellent performance of the newly proposed BNT models is shown using simulation studies. We also provide some illustrative examples using a wide variety of standard regression datasets from a publicly available machine learning repository to show the superiority of the proposed models in comparison to popularly used Bayesian CART and Bayesian neural network models.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"65 2","pages":"101-126"},"PeriodicalIF":1.1,"publicationDate":"2023-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50139340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A nonparametric mixture approach to density and null proportion estimation in large-scale multiple comparison problems","authors":"Xiangjie Xue, Yong Wang","doi":"10.1111/anzs.12383","DOIUrl":"https://doi.org/10.1111/anzs.12383","url":null,"abstract":"<p>A new method for estimating the proportion of null effects is proposed for solving large-scale multiple comparison problems. It utilises maximum likelihood estimation of nonparametric mixtures, which also provides a density estimate of the test statistics. It overcomes the problem of the usual nonparametric maximum likelihood estimator that cannot produce a positive probability at the location of null effects in the process of estimating nonparametrically a mixing distribution. The profile likelihood is further used to help produce a range of null proportion values, corresponding to which the density estimates are all consistent. With a proper choice of a threshold function on the profile likelihood ratio, the upper endpoint of this range can be shown to be a consistent estimator of the null proportion. Numerical studies show that the proposed method has an apparently convergent trend in all cases studied and performs favourably when compared with existing methods in the literature.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"65 1","pages":"49-75"},"PeriodicalIF":1.1,"publicationDate":"2023-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12383","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50119875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A method to reduce the width of confidence intervals by using a normal scores transformation","authors":"T. W. O’Gorman","doi":"10.1111/anzs.12384","DOIUrl":"https://doi.org/10.1111/anzs.12384","url":null,"abstract":"<p>In stating the results of their research, scientists usually want to publish narrow confidence intervals because they give precise estimates of the effects of interest. In many cases, the researcher would want to use the narrowest interval that maintains the desired coverage probability. In this manuscript, we propose a new method of finding confidence intervals that are often narrower than traditional confidence intervals for any individual parameter in a linear model if the errors are from a skewed distribution or from a long-tailed symmetric distribution. If the errors are normally distributed, we show that the width of the proposed normal scores confidence interval will not be much greater than the width of the traditional interval. If the dataset includes predictor variables that are uncorrelated or moderately correlated then the confidence intervals will maintain their coverage probability. However, if the covariates are highly correlated, then the coverage probability of the proposed confidence interval may be slightly lower than the nominal value. The procedure is not computationally intensive and an R program is available to determine the normal scores 95% confidence interval. Whenever the covariates are not highly correlated, the normal scores confidence interval is recommended for the analysis of datasets having 50 or more observations.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"65 1","pages":"35-48"},"PeriodicalIF":1.1,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50136144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variable selection in heterogeneous panel data models with cross-sectional dependence","authors":"Xiaoling Mei, Bin Peng, Huanjun Zhu","doi":"10.1111/anzs.12381","DOIUrl":"https://doi.org/10.1111/anzs.12381","url":null,"abstract":"<p>This paper studies the Bridge estimator for a high-dimensional panel data model with heterogeneous varying coefficients, where the random errors are assumed to be serially correlated and cross-sectionally dependent. We establish oracle efficiency and the asymptotic distribution of the Bridge estimator, when the number of covariates increases to infinity with the sample size in both dimensions. A BIC-type criterion is also provided for tuning parameter selection. We further generalise the marginal Bridge estimator for our model to asymptotically correctly identify the covariates with zero coefficients even when the number of covariates is greater than the sample size under a partial orthogonality condition. The finite sample performance of the proposed estimator is demonstrated by simulated data examples, and an empirical application with the US stock dataset is also provided.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"65 1","pages":"14-34"},"PeriodicalIF":1.1,"publicationDate":"2023-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50150998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}