{"title":"PanIC: Consistent information criteria for general model selection problems","authors":"Hien Duy Nguyen","doi":"10.1111/anzs.12426","DOIUrl":"https://doi.org/10.1111/anzs.12426","url":null,"abstract":"<p>Model selection is a ubiquitous problem that arises in the application of many statistical and machine learning methods. In the likelihood and related settings, it is typical to use the method of information criteria (ICs) to choose the most parsimonious among competing models by penalizing the likelihood-based objective function. Theorems guaranteeing the consistency of ICs can often be difficult to verify and are often specific and bespoke. We present a set of results that guarantee consistency for a class of ICs, which we call PanIC (from the Greek root ‘<i>pan</i>’, meaning ‘<i>of everything</i>’), with easily verifiable regularity conditions. PanICs are applicable in any loss-based learning problem and are not exclusive to likelihood problems. We illustrate the verification of regularity conditions for model selection problems regarding finite mixture models, least absolute deviation and support vector regression and principal component analysis, and demonstrate the effectiveness of PanICs for such problems via numerical simulations. 
Furthermore, we present new sufficient conditions for the consistency of BIC-like estimators and provide comparisons of the BIC with PanIC.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 4","pages":"441-466"},"PeriodicalIF":0.8,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12426","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142869230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction de-correlated inference: A safe approach for post-prediction inference","authors":"Feng Gan, Wanfeng Liang, Changliang Zou","doi":"10.1111/anzs.12429","DOIUrl":"https://doi.org/10.1111/anzs.12429","url":null,"abstract":"<div><p>In modern data analysis, it is common to use machine learning methods to predict outcomes on unlabelled datasets and then use these pseudo-outcomes in subsequent statistical inference. Inference in this setting is often called post-prediction inference. We propose a novel assumption-lean framework for statistical inference under post-prediction setting, called prediction de-correlated inference (PDC). Our approach is safe, in the sense that PDC can automatically adapt to any black-box machine-learning model and consistently outperform the supervised counterparts. The PDC framework also offers easy extensibility for accommodating multiple predictive models. Both numerical results and real-world data analysis demonstrate the superiority of PDC over the state-of-the-art methods.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 4","pages":"417-440"},"PeriodicalIF":0.8,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142869056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Telling Stories with Data: With Application in R. By Rohan Alexander. CRC Press. 2023. 622 pages. AU$129.60 (hardback). ISBN: 978-1-0321-3477-2.","authors":"Emi Tanaka","doi":"10.1111/anzs.12428","DOIUrl":"https://doi.org/10.1111/anzs.12428","url":null,"abstract":"","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 4","pages":"467-470"},"PeriodicalIF":0.8,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142869185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Full Bayesian analysis of triple seasonal autoregressive models","authors":"Ayman A. Amin","doi":"10.1111/anzs.12427","DOIUrl":"https://doi.org/10.1111/anzs.12427","url":null,"abstract":"<div><p>Seasonal autoregressive (SAR) time series models have been extended to fit time series exhibiting multiple seasonalities. However, hardly any research in Bayesian literature has been done on modelling multiple seasonalities. In this article, we propose a full Bayesian analysis of triple SAR (TSAR) models for time series with triple seasonality, considering identification, estimation and prediction for these TSAR models. In this Bayesian analysis of TSAR models, we assume the model errors to be normally distributed and the model order to be a random variable with a known maximum value, and we employ the g prior for the model coefficients and variance. Accordingly, we first derive the posterior mass function of the TSAR order in closed form, which then enables us to identify the best order of TSAR model as the order value with the highest posterior probability. In addition, we derive the conditional posteriors to be a multivariate normal for the TSAR coefficients and to be an inverse gamma for the TSAR variance; also, we derive the conditional predictive distribution to be a multivariate normal for future observations. Since these derived conditional distributions are in closed forms, we introduce the Gibbs sampler to present the Bayesian analysis of TSAR models and to easily produce multiple-step-ahead predictions. Using <span>Julia</span> programming language, we conduct an extensive simulation study, aiming to evaluate the accuracy of our proposed full Bayesian analysis for TSAR models. 
In addition, we apply our work on time series to hourly electricity load in some European countries.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 4","pages":"389-416"},"PeriodicalIF":0.8,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142869120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Examining collinearities","authors":"Zillur R. Shabuz, Paul H. Garthwaite","doi":"10.1111/anzs.12425","DOIUrl":"10.1111/anzs.12425","url":null,"abstract":"<div><p>The cos-max method is a little-known method of identifying collinearities. It is based on the cos-max transformation, which makes minimal adjustment to a set of vectors to create orthogonal components with a one-to-one correspondence between the original vectors and the components. The aim of the transformation is that each vector should be close to the orthogonal component with which it is paired. Vectors involved in a collinearity must be adjusted substantially in order to create orthogonal components, while other vectors will typically be adjusted far less. The cos-max method uses the size of adjustments to identify collinearities. It gives a coherent relationship between collinear sets of variables and variance inflation factors (VIFs) and identifies collinear sets using more information than traditional methods. In this paper we describe these features of the method and examine its performance in examples, comparing it with alternative methods. In each example, the collinearities identified by the cos-max method only contained variables with high VIFs and contained all variables with high VIFs. The collinearities identified by other methods did not have such a close link to VIFs. 
Also, the collinearities identified by the cos-max method were as simple as or simpler than those given by other methods, with less overlap between collinearities in the variables that they contained.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 3","pages":"367-388"},"PeriodicalIF":0.8,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact samples sizes for clinical trials subject to size and power constraints","authors":"Chris J. Lloyd","doi":"10.1111/anzs.12424","DOIUrl":"10.1111/anzs.12424","url":null,"abstract":"<p>This paper first describes the difficulties in providing the required sample sizes for clinical trials that guarantee type 1 and type 2 error control. The required sample sizes obviously depend on the test employed, and in this study we use the so-called <i>E</i>-test, which is known to have extremely favourable size properties and higher power than alternatives. To compute exact powers for this test in real time is not currently feasible, so a corpus of pre-computed exact powers (and sizes) was created, covering sample sizes up to 500. When there are no solutions within the corpus, a novel extrapolation technique is used. Exact size can be computed after the sample sizes have been extracted; however, for the <i>E</i>-test the exact size is virtually always very close to the nominal target. All the code has been converted into an <span>R-package</span>, which is available on CRAN and illustrated.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 3","pages":"297-305"},"PeriodicalIF":0.8,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12424","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian analysis of multivariate mixed longitudinal ordinal and continuous data","authors":"Xiao Zhang","doi":"10.1111/anzs.12421","DOIUrl":"10.1111/anzs.12421","url":null,"abstract":"<p>Multivariate longitudinal ordinal and continuous data exist in many scientific fields. However, it is a rigorous task to jointly analyse them due to the complicated correlated structures of those mixed data and the lack of a multivariate distribution. The multivariate probit model, assuming there is a multivariate normal latent variable for each multivariate ordinal data, becomes a natural modeling choice for longitudinal ordinal data especially for jointly analysing with longitudinal continuous data. However, the identifiable multivariate probit model requires the variances of the latent normal variables to be fixed at 1, thus the joint covariance matrix of the latent variables and the continuous multivariate normal variables is restricted at some of the diagonal elements. This constrains to develop both the classical and Bayesian methods to analyse mixed ordinal and continuous data. In this investigation, we proposed three Markov chain Monte Carlo (MCMC) methods: Metropolis–Hastings within Gibbs algorithm based on the identifiable model, and a Gibbs sampling algorithm and parameter-expanded data augmentation based on the constructed non-identifiable model. 
Through simulation studies and a real data application, we illustrated the performance of these three methods and provided an observation of using non-identifiable model to develop MCMC sampling methods.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 3","pages":"325-346"},"PeriodicalIF":0.8,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12421","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributional modelling of positively skewed data via the flexible Weibull extension distribution","authors":"Freddy Hernández-Barajas, Olga Usuga-Manco, Carmen Patino-Rodríguez, Fernando Marmolejo-Ramos","doi":"10.1111/anzs.12423","DOIUrl":"10.1111/anzs.12423","url":null,"abstract":"<p>The time until an event occurs is often known to have a skewed distribution. To model this, a statistical distribution called the two-parameter flexible Weibull extension (FWE) has been proposed. In this paper, the FWE distribution is used to model datasets through the use of generalised additive models for location, scale and shape (GAMLSS) distributional regression. GAMLSS is the only regression technique that can examine the effects of both categorical and numeric predictors on all the parameters of the distribution used to fit the dependent variable. To make it easier to use the FWE distribution through GAMLSS, the <span>RelDists</span> R package is proposed. A simulation study shows that FWE modelling through GAMLSS provides reliable parameter estimates even in the presence of factors that affect the distribution.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 3","pages":"306-324"},"PeriodicalIF":0.8,"publicationDate":"2024-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12423","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spline linear mixed-effects models for causal mediation analysis with longitudinal data","authors":"Jeffrey M. Albert, Hongxu Zhu, Tanujit Dey, Jiayang Sun, Wojbor A. Woyczynski, Gregory Powers, Meeyoung Min","doi":"10.1111/anzs.12422","DOIUrl":"10.1111/anzs.12422","url":null,"abstract":"<div><p>Often, causal mediation analysis is of interest when both the mediator and the final outcome are repeatedly measured, but limited work has been done for this situation (as opposed to where only the mediator is repeatedly measured). Available methods are primarily based on parametric models and tend to be sensitive to model assumptions. This article presents semiparametric, continuous-time models to provide a flexible and robust approach to causal mediation analysis for longitudinal data, which allows these data to be unbalanced or irregular. Specifically, the method uses spline linear mixed-effects models for the mediator and for the final outcome, with a two-step approach to model-fitting in which a predicted mediator is used as a covariate in the final outcome model. The models allow flexible functions for both the mean and individual response functions for each outcome. We derive estimated natural direct and indirect effects as a function of time using an extended mediation formula and sequential ignorability assumption. In simulation studies, we compare properties of estimated direct and indirect effects, and a delta method estimate of the standard error of the latter, under alternative approaches for predicting the mediator. 
The approach is illustrated using harmonised data from two cohort studies to examine attention as a mediator of the effect of prenatal tobacco exposure on externalising behaviour in children.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 3","pages":"347-366"},"PeriodicalIF":0.8,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141779821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new robust covariance matrix estimation for high-dimensional microbiome data","authors":"Jiyang Wang, Wanfeng Liang, Lijie Li, Yue Wu, Xiaoyan Ma","doi":"10.1111/anzs.12415","DOIUrl":"10.1111/anzs.12415","url":null,"abstract":"<div><p>Microbiome data typically lie in a high-dimensional simplex. One of the key questions in metagenomic analysis is to exploit the covariance structure for this kind of data. In this paper, a framework called approximate-estimate-threshold (AET) is developed for the robust basis covariance estimation for high-dimensional microbiome data. To be specific, we first construct a proxy matrix $\\boldsymbol{\\Gamma}$, which is almost indistinguishable from the real basis covariance matrix $\\boldsymbol{\\Sigma}$. Then, any estimator $\\hat{\\boldsymbol{\\Gamma}}$ satisfying some conditions can be used to estimate $\\boldsymbol{\\Gamma}$. 
Finally, we impose a thresholding step on $\\hat{\\boldsymbol{\\Gamma}}$ to obtain the final estimator $\\hat{\\boldsymbol{\\Sigma}}$. In particular, this paper applies a Huber-type estimator $\\hat{\\boldsymbol{\\Gamma}}$, and achieves robustness by only requiring the boundedness of 2+$\\epsilon$","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":"66 2","pages":"281-295"},"PeriodicalIF":1.1,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141190779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}