{"title":"Robust PCA for high-dimensional data based on characteristic transformation","authors":"Lingyu He, Yanrong Yang, Bo Zhang","doi":"10.1111/anzs.12385","DOIUrl":"https://doi.org/10.1111/anzs.12385","url":null,"abstract":"<div>\u0000 \u0000 <p>In this paper, we propose a novel robust principal component analysis (PCA) for high-dimensional data in the presence of various heterogeneities, in particular strong tailing and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of the classical PCA. The suggested method has the distinct advantage of dealing with heavy-tail-distributed data, whose covariances may be non-existent (positively infinite, for instance), in addition to the usual outliers. The proposed approach is also a case of kernel principal component analysis (KPCA) and employs the robust and non-linear properties via a bounded and non-linear kernel function. The merits of the new method are illustrated by some statistical properties, including the upper bound of the excess error and the behaviour of the large eigenvalues under a spiked covariance model. Additionally, using a variety of simulations, we demonstrate the benefits of our approach over the classical PCA. Finally, using data on protein expression in mice of various genotypes in a biological study, we apply the novel robust PCA to categorise the mice and find that our approach is more effective at identifying abnormal mice than the classical PCA.</p>\u0000 </div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50150434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian neural tree models for nonparametric regression","authors":"Tanujit Chakraborty, Gauri Kamat, Ashis Kumar Chakraborty","doi":"10.1111/anzs.12386","DOIUrl":"https://doi.org/10.1111/anzs.12386","url":null,"abstract":"<div>\u0000 \u0000 <p>Frequentist and Bayesian methods differ in many aspects but share some basic optimal properties. In real-life prediction problems, situations exist in which a model based on one of the above paradigms is preferable depending on some subjective criteria. Nonparametric classification and regression techniques, such as decision trees and neural networks, have both frequentist (classification and regression trees (CARTs) and artificial neural networks) as well as Bayesian counterparts (Bayesian CART and Bayesian neural networks) to learning from data. In this paper, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call the Bayesian neural tree (BNT) models. BNT models can simultaneously perform feature selection and prediction, are highly flexible, and generalise well in settings with limited training observations. We study the statistical consistency of the proposed approaches and derive the optimal value of a vital model parameter. The excellent performance of the newly proposed BNT models is shown using simulation studies. We also provide some illustrative examples using a wide variety of standard regression datasets from a public available machine learning repository to show the superiority of the proposed models in comparison to popularly used Bayesian CART and Bayesian neural network models.</p>\u0000 </div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50139340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A nonparametric mixture approach to density and null proportion estimation in large-scale multiple comparison problems","authors":"Xiangjie Xue, Yong Wang","doi":"10.1111/anzs.12383","DOIUrl":"https://doi.org/10.1111/anzs.12383","url":null,"abstract":"<p>A new method for estimating the proportion of null effects is proposed for solving large-scale multiple comparison problems. It utilises maximum likelihood estimation of nonparametric mixtures, which also provides a density estimate of the test statistics. It overcomes the problem of the usual nonparametric maximum likelihood estimator that cannot produce a positive probability at the location of null effects in the process of estimating nonparametrically a mixing distribution. The profile likelihood is further used to help produce a range of null proportion values, corresponding to which the density estimates are all consistent. With a proper choice of a threshold function on the profile likelihood ratio, the upper endpoint of this range can be shown to be a consistent estimator of the null proportion. Numerical studies show that the proposed method has an apparently convergent trend in all cases studied and performs favourably when compared with existing methods in the literature.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12383","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50119875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A method to reduce the width of confidence intervals by using a normal scores transformation","authors":"T. W. O’Gorman","doi":"10.1111/anzs.12384","DOIUrl":"https://doi.org/10.1111/anzs.12384","url":null,"abstract":"<div>\u0000 \u0000 <p>In stating the results of their research, scientists usually want to publish narrow confidence intervals because they give precise estimates of the effects of interest. In many cases, the researcher would want to use the narrowest interval that maintains the desired coverage probability. In this manuscript, we propose a new method of finding confidence intervals that are often narrower than traditional confidence intervals for any individual parameter in a linear model if the errors are from a skewed distribution or from a long-tailed symmetric distribution. If the errors are normally distributed, we show that the width of the proposed normal scores confidence interval will not be much greater than the width of the traditional interval. If the dataset includes predictor variables that are uncorrelated or moderately correlated then the confidence intervals will maintain their coverage probability. However, if the covariates are highly correlated, then the coverage probability of the proposed confidence interval may be slightly lower than the nominal value. The procedure is not computationally intensive and an R program is available to determine the normal scores 95% confidence interval. Whenever the covariates are not highly correlated, the normal scores confidence interval is recommended for the analysis of datasets having 50 or more observations.</p>\u0000 </div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50136144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variable selection in heterogeneous panel data models with cross-sectional dependence","authors":"Xiaoling Mei, Bin Peng, Huanjun Zhu","doi":"10.1111/anzs.12381","DOIUrl":"https://doi.org/10.1111/anzs.12381","url":null,"abstract":"<p>This paper studies the Bridge estimator for a high-dimensional panel data model with heterogeneous varying coefficients, where the random errors are assumed to be serially correlated and cross-sectionally dependent. We establish oracle efficiency and the asymptotic distribution of the Bridge estimator, when the number of covariates increases to infinity with the sample size in both dimensions. A BIC-type criterion is also provided for tuning parameter selection. We further generalise the marginal Bridge estimator for our model to asymptotically correctly identify the covariates with zero coefficients even when the number of covariates is greater than the sample size under a partial orthogonality condition. The finite sample performance of the proposed estimator is demonstrated by simulated data examples, and an empirical application with the US stock dataset is also provided.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50150998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On two conjectures about perturbations of the stochastic growth rate","authors":"Stefano Giaimo","doi":"10.1111/anzs.12382","DOIUrl":"https://doi.org/10.1111/anzs.12382","url":null,"abstract":"<p>The stochastic growth rate describes long-run growth of a population that lives in a fluctuating environment. Perturbation analysis of the stochastic growth rate provides crucial information for population managers, ecologists and evolutionary biologists. This analysis quantifies the response of the stochastic growth rate to changes in demographic parameters. A form of this analysis deals with changes that only occur in some environmental states. Caswell put forth two conjectures about environment-specific perturbations of the stochastic growth rate. The conjectures link the stationary distribution of the stochastic environmental process with the magnitude of some environment-specific perturbations. This note disproves one conjecture and proves the other.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12382","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50150997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Richards growth model to predict fruit weight","authors":"Daniel Gerhard, Elena Moltchanova","doi":"10.1111/anzs.12380","DOIUrl":"10.1111/anzs.12380","url":null,"abstract":"<p>The Richards model comprises several popular sigmoidal and monomolecular growth curves. We illustrate fitting of a Bayesian Richards model by splitting the full growth model into several submodels, followed by a model selection procedure. The performance of the methodology is evaluated by Monte Carlo simulations. A double-sigmoidal version of the Richards model is applied to model grape bunch weight based on data from a New Zealand vineyard over a single growing period.</p><p>A Bayesian Richards growth model applied to grape size data. Representations of phenological processes are selected through multi-model inference.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12380","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77550644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimum cost-compression risk in principal component analysis","authors":"Bhargab Chattopadhyay, Swarnali Banerjee","doi":"10.1111/anzs.12378","DOIUrl":"10.1111/anzs.12378","url":null,"abstract":"<div>\u0000 \u0000 <p>Principal Component Analysis (PCA) is a popular multivariate analytic tool which can be used for dimension reduction without losing much information. Data vectors containing a large number of features arriving sequentially may be correlated with each other. An effective algorithm for such situations is online PCA. Existing Online PCA research works revolve around proposing efficient scalable updating algorithms focusing on compression loss only. They do not take into account the size of the dataset at which further arrival of data vectors can be terminated and dimension reduction can be applied. It is well known that the dataset size contributes to reducing the compression loss – the smaller the dataset size, the larger the compression loss while larger the dataset size, the lesser the compression loss. However, the reduction in compression loss by increasing dataset size will increase the total data collection cost. In this paper, we move beyond the scalability and updation problems related to Online PCA and focus on optimising a cost-compression loss which considers the compression loss and data collection cost. We minimise the corresponding risk using a two-stage PCA algorithm. The resulting two-stage algorithm is a fast and an efficient alternative to Online PCA and is shown to exhibit attractive convergence properties with no assumption on specific data distributions. Experimental studies demonstrate similar results and further illustrations are provided using real data. As an extension, a multi-stage PCA algorithm is discussed as well. Given the time complexity, the two-stage PCA algorithm is emphasised over the multi-stage PCA algorithm for online data.</p>\u0000 </div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82020722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new minification integer-valued autoregressive process driven by explanatory variables","authors":"Lianyong Qian, Fukang Zhu","doi":"10.1111/anzs.12379","DOIUrl":"10.1111/anzs.12379","url":null,"abstract":"<div>\u0000 \u0000 <p>The discrete minification model based on the modified negative binomial operator, as an extension to the continuous minification model, can be used to describe an extreme value after few increasing values. To make this model more practical and flexible, a new minification integer-valued autoregressive process driven by explanatory variables is proposed. Ergodicity of the new process is discussed. The estimators of the unknown parameters are obtained via the conditional least squares and conditional maximum likelihood methods, and the asymptotic properties are also established. A testing procedure for checking existence of the explanatory variables is developed. Some Monte Carlo simulations are given to illustrate the finite-sample performances of the estimators under specification and misspecification and the test, respectively. A real example is applied to illustrate the performance of our model.</p>\u0000 </div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82225959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small area estimation under a semi-parametric covariate measured with error","authors":"Reyhane Sefidkar, Mahmoud Torabi, Amir Kavousi","doi":"10.1111/anzs.12377","DOIUrl":"10.1111/anzs.12377","url":null,"abstract":"<div>\u0000 \u0000 <p>In recent years, small area estimation has played an important role in statistics as it deals with the problem of obtaining reliable estimates for parameters of interest in areas with small or even zero sample sizes corresponding to population sizes. Nested error linear regression models are often used in small area estimation assuming that the covariates are measured without error and also the relationship between covariates and response variable is linear. Small area models have also been extended to the case in which a linear relationship may not hold, using penalised spline (P-spline) regression, but assuming that the covariates are measured without error. Recently, a nested error regression model using a P-spline regression model, for the fixed part of the model, has been studied assuming the presence of measurement error in covariate, in the Bayesian framework. In this paper, we propose a frequentist approach to study a semi-parametric nested error regression model using P-splines with a covariate measured with error. In particular, the pseudo-empirical best predictors of small area means and their corresponding mean squared prediction error estimates are studied. Performance of the proposed approach is evaluated through a simulation and also by a real data application. We propose a frequentist approach to study a semi-parametric nested error regression model using P-splines with a covariate measured with error.</p>\u0000 </div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2022-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89503682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}