Australian & New Zealand Journal of Statistics: latest articles (page 8)

Minimum cost-compression risk in principal component analysis

IF 1.1, Zone 4 (Mathematics)

Australian & New Zealand Journal of Statistics, Pub Date: 2022-12-28, DOI: 10.1111/anzs.12378

Bhargab Chattopadhyay, Swarnali Banerjee

Vol. 64, No. 4, pp. 422–441.

Abstract: Principal Component Analysis (PCA) is a popular multivariate analytic tool that can be used for dimension reduction without losing much information. Sequentially arriving data vectors with a large number of features may be correlated with each other, and an effective algorithm for such situations is online PCA. Existing online PCA research focuses on proposing efficient, scalable updating algorithms that consider compression loss only. It does not take into account the dataset size at which the arrival of further data vectors can be terminated and dimension reduction applied. It is well known that dataset size affects compression loss: the smaller the dataset, the larger the compression loss, and the larger the dataset, the smaller the compression loss. However, reducing compression loss by increasing the dataset size increases the total data collection cost. In this paper, we move beyond the scalability and updating problems of online PCA and focus on optimising a cost-compression loss that accounts for both compression loss and data collection cost. We minimise the corresponding risk using a two-stage PCA algorithm. The resulting two-stage algorithm is a fast and efficient alternative to online PCA and is shown to have attractive convergence properties without assuming any specific data distribution. Experimental studies show similar results, and further illustrations are provided using real data. As an extension, a multi-stage PCA algorithm is also discussed. Given its time complexity, the two-stage PCA algorithm is preferred over the multi-stage PCA algorithm for online data.

Citations: 0
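The trade-off described in the PCA abstract above (a larger dataset lowers compression loss but raises collection cost) can be illustrated with a toy risk. The sketch below assumes, purely for illustration, a compression-loss term of the form a/n and a linear collection cost c·n. Neither this functional form nor the names `toy_risk` and `optimal_size` come from the paper, and the paper's two-stage algorithm is not reproduced here.

```python
import math


def toy_risk(n: int, a: float, c: float) -> float:
    """Hypothetical risk: assumed compression-loss term a/n plus linear collection cost c*n."""
    return a / n + c * n


def optimal_size(a: float, c: float, n_max: int = 10_000) -> int:
    """Brute-force the integer dataset size in [1, n_max] that minimises the toy risk."""
    return min(range(1, n_max + 1), key=lambda n: toy_risk(n, a, c))


if __name__ == "__main__":
    a, c = 50.0, 0.02  # illustrative constants, not estimates from any data
    n_star = optimal_size(a, c)
    # For this assumed form, calculus gives the continuous optimum sqrt(a / c).
    print(n_star, math.sqrt(a / c))
```

Under this assumed form the optimum n* = √(a/c) depends on a. When a involves unknown population quantities, n* cannot be fixed in advance and the dataset size has to be chosen from the data as they arrive, which is the problem the abstract's two-stage algorithm addresses.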

A new minification integer-valued autoregressive process driven by explanatory variables

IF 1.1, Zone 4 (Mathematics)

Australian & New Zealand Journal of Statistics, Pub Date: 2022-12-28, DOI: 10.1111/anzs.12379

Lianyong Qian, Fukang Zhu

Citations: 2

Small area estimation under a semi-parametric covariate measured with error

IF 1.1, Zone 4 (Mathematics)

Australian & New Zealand Journal of Statistics, Pub Date: 2022-12-08, DOI: 10.1111/anzs.12377

Reyhane Sefidkar, Mahmoud Torabi, Amir Kavousi

Vol. 64, No. 4, pp. 495–515.

Abstract: In recent years, small area estimation has played an important role in statistics, as it addresses the problem of obtaining reliable estimates of parameters of interest in areas with small, or even zero, sample sizes relative to the corresponding population sizes. Nested error linear regression models are often used in small area estimation, assuming that the covariates are measured without error and that the relationship between the covariates and the response variable is linear. Small area models have also been extended, using penalised spline (P-spline) regression, to cases in which a linear relationship may not hold, but still assuming that the covariates are measured without error. Recently, a nested error regression model using P-spline regression for the fixed part of the model has been studied in the Bayesian framework, assuming measurement error in a covariate. In this paper, we propose a frequentist approach to studying a semi-parametric nested error regression model using P-splines with a covariate measured with error. In particular, we study the pseudo-empirical best predictors of small area means and the corresponding estimates of their mean squared prediction errors. The performance of the proposed approach is evaluated through a simulation study and a real data application.

Citations: 0

Permutation entropy and its variants for measuring temporal dependence

IF 1.1, Zone 4 (Mathematics)

Australian & New Zealand Journal of Statistics, Pub Date: 2022-12-08, DOI: 10.1111/anzs.12376

Xin Huang, Han Lin Shang, David Pitt

Vol. 64, No. 4, pp. 442–477. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12376

Abstract: Permutation entropy (PE) is an ordinal-based, non-parametric complexity measure for studying the temporal dependence structure of a linear or non-linear time series. Building on PE, we propose a new measure, permutation dependence (PD), to quantify the strength of temporal dependence in a univariate time series and to remedy the major drawbacks of PE. We demonstrate that PE and PD are viable and useful alternatives to conventional temporal dependence measures such as the autocorrelation function (ACF) and mutual information (MI). Unlike the ACF, PE and PD are not restricted to detecting linear or quasi-linear serial correlation in an autoregressive model; instead, they can be viewed as non-parametric, non-linear alternatives, since they require no prior knowledge of, or assumptions about, the underlying structure. Compared with MI estimated by k-nearest neighbours, PE and PD are more sensitive to relatively weak structures. In a number of simulation studies, we compare the finite-sample performance of PE and PD with that of the ACF and of MI estimated by k-nearest neighbours to showcase their respective strengths and weaknesses, and we also investigate their performance under non-stationarity. Using high-frequency EUR/USD exchange rate return data, we apply PE and PD to study the temporal dependence structure of intraday foreign exchange.

Citations: 2
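Permutation entropy, as used in the abstract above, rests on the standard Bandt–Pompe construction: each window of m values (spaced tau apart) is mapped to the ordinal pattern that sorts it, and the Shannon entropy of the pattern frequencies is computed, optionally normalised by log(m!) so that it lies in [0, 1]. The sketch below illustrates that standard PE only. The paper's permutation dependence (PD) measure is not implemented, and the function names and defaults (m=3, tau=1) are assumptions.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def ordinal_pattern(window: Sequence[float]) -> Tuple[int, ...]:
    """Indices that sort the window; equal values keep their order of occurrence."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def permutation_entropy(x: Sequence[float], m: int = 3, tau: int = 1,
                        normalise: bool = True) -> float:
    """Shannon entropy of the ordinal patterns of order m and delay tau in x."""
    if m < 2:
        raise ValueError("order m must be at least 2")
    n_windows = len(x) - (m - 1) * tau
    if n_windows <= 0:
        raise ValueError("series too short for the chosen m and tau")
    counts = Counter(
        ordinal_pattern([x[t + j * tau] for j in range(m)])
        for t in range(n_windows)
    )
    probs = [k / n_windows for k in counts.values()]
    h = sum(-p * math.log(p) for p in probs)
    # Dividing by log(m!) (the entropy of m! equally likely patterns) maps h to [0, 1].
    return h / math.log(math.factorial(m)) if normalise else h


if __name__ == "__main__":
    print(permutation_entropy(list(range(20))))  # monotone series: a single pattern
    print(permutation_entropy([0, 1] * 4))       # two patterns, equally frequent
```

Ties are broken here by order of occurrence, which is one common convention; how ties are handled matters for discretised data such as high-frequency returns.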

The place of probability distributions in statistical learning. A commented book review of ‘Distributions for modeling location, scale, and shape using GAMLSS in R’ by Rigby et al. (2021)

IF 1.1, Zone 4 (Mathematics)

Australian & New Zealand Journal of Statistics, Pub Date: 2022-09-23, DOI: 10.1111/anzs.12374

Fernando Marmolejo-Ramos, Raydonal Ospina, Freddy Hernández-Barajas

Citations: 0

Penalised, post-pretest, and post-shrinkage strategies in nonlinear growth models

IF 1.1, Zone 4 (Mathematics)

Australian & New Zealand Journal of Statistics, Pub Date: 2022-09-04, DOI: 10.1111/anzs.12373

Janjira Piladaeng, S. Ejaz Ahmed, Supranee Lisawadi

Citations: 1

Robust subtractive stability measures for fast and exhaustive feature importance ranking and selection in generalised linear models

IF 1.1, Zone 4 (Mathematics)

Australian & New Zealand Journal of Statistics, Pub Date: 2022-09-02, DOI: 10.1111/anzs.12375

Connor Smith, Boris Guennewig, Samuel Muller

Vol. 64, No. 3, pp. 339–355.

Abstract: We introduce the relatively new concept of subtractive lack-of-fit measures in the context of robust regression, in particular in generalised linear models. We devise a fast and robust feature selection framework for regression that empirically performs better than other selection methods while remaining computationally feasible when fully exhaustive methods are not. Our method builds on the concepts of model stability, subtractive lack-of-fit measures and repeated model identification. We demonstrate how the multiple implementations add value in a robust regression context, in particular through combining robust estimates of the regression coefficients and of scale. Through resampling, we construct a robust stability matrix, which contains multiple measures of feature importance for each variable. By using this stability matrix to rank features by importance, we reduce the candidate model space and then perform an exhaustive search over the remaining models. We also introduce two visualisations to better convey the information held in the stability matrix: a subtractive Mosaic Probability Plot and a subtractive Variable Inclusion Plot. We demonstrate how these graphics give a better understanding of how variable importance changes under small alterations to the underlying data. Our framework is available in R through the RobStabR package.

Citations: 1

Multivariate Kruskal–Wallis tests based on principal component score and latent source of independent component analysis

IF 1.1, Zone 4 (Mathematics)

Australian & New Zealand Journal of Statistics, Pub Date: 2022-08-04, DOI: 10.1111/anzs.12371

Amitava Mukherjee, Hidetoshi Murakami

Vol. 64, No. 3, pp. 356–380.

Abstract: Analysing multivariate and high-dimensional multi-sample data is essential in many scientific fields. Multi-sample comparison problems for such data are among the most important and popular topics in modern nonparametric statistics. The Kruskal–Wallis test is widely used for the multi-sample problem. For multivariate or high-dimensional data, one must specify how the ranks of individual vector-valued observations are determined in terms of some distance metric. Alternatively, one can combine principal component scores or independent component scores with the Kruskal–Wallis test. In this paper, we discuss a simple but powerful Kruskal–Wallis test based on principal component scores for multivariate and high-dimensional data. As a competitor, we construct another Kruskal–Wallis test based on the latent sources of independent component analysis. These tests are suitable for testing differences in the location vector, the scale matrix or both, and can be used with equal or unequal sample sizes. Their power is thoroughly compared with that of traditional distance-based Kruskal–Wallis tests for multivariate data through Monte Carlo simulation under various population distributions. We illustrate the proposed tests using real data. The paper concludes with some remarks and directions for future research.

Citations: 2
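One simple reading of the PC-score-based test in the abstract above is: pool the samples, project every observation onto the first principal component, then apply the ordinary univariate Kruskal–Wallis H statistic to those scores. The sketch below does this under stated assumptions: it uses only the first component (the paper may use more and combine them differently), finds that component by power iteration in pure Python, and returns H with the standard tie correction but no p-value. It is not the authors' procedure, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Univariate Kruskal-Wallis H statistic with the usual correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def first_pc_scores(rows: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """Scores of centred rows on the leading covariance eigenvector (power iteration)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in centred]


def pc_kruskal_wallis(samples: Sequence[Sequence[Sequence[float]]]) -> float:
    """Pool multivariate samples, score them on PC1, and compute H on the scores."""
    pooled = [row for s in samples for row in s]
    scores = first_pc_scores(pooled)
    groups, start = [], 0
    for s in samples:
        groups.append(scores[start:start + len(s)])
        start += len(s)
    return kruskal_wallis_h(groups)


if __name__ == "__main__":
    x = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    y = [[10.0, 10.0], [11.0, 11.0], [12.0, 12.0]]
    print(pc_kruskal_wallis([x, y]))
```

Because ranks are unchanged by the sign of the component, H does not depend on which direction power iteration settles on. For moderately large samples, H is approximately chi-squared with k − 1 degrees of freedom under the null hypothesis of identical distributions; the ICA-based variant and the choice of how many components to use would follow the paper rather than this sketch.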

A Festschrift for Geoff McLachlan

IF 1.1, Zone 4 (Mathematics)

Australian & New Zealand Journal of Statistics, Pub Date: 2022-08-01, DOI: 10.1111/anzs.12372

Hien Nguyen, Sharon Lee, Florence Forbes

Citations: 0

Bayesian hierarchical mixture models for detecting non-normal clusters applied to noisy genomic and environmental datasets

IF 1.1, Zone 4 (Mathematics)

Australian & New Zealand Journal of Statistics, Pub Date: 2022-08-01, DOI: 10.1111/anzs.12370

Huizi Zhang, Ben Swallow, Mayetri Gupta

Vol. 64, No. 2, pp. 313–337. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12370

Abstract: Clustering to find subgroups with common features is often a necessary first step in the statistical modelling and analysis of large and complex datasets. Although follow-up analyses often use complex statistical models appropriate to the specific application, the most popular clustering approaches are either nonparametric or based on Gaussian mixture models and their variants, often for reasons of computational efficiency. Characteristics common in modern scientific datasets, such as outliers or non-ellipsoidal cluster shapes, often cause these methods to fail to detect the cluster components accurately. In this article, we present two efficient and robust Bayesian clustering approaches that seek to overcome these limitations: a model-based ‘tight’ clustering approach that clusters points in the presence of outliers, and a hierarchical Laplace mixture-based approach that clusters heavy-tailed and otherwise non-normal cluster components. We illustrate their power and accuracy in detecting meaningful clusters in datasets from genomics, imaging and the environmental sciences.

Citations: 1