{"title":"Examining collinearities","authors":"Zillur R. Shabuz, Paul H. Garthwaite","doi":"10.1111/anzs.12425","DOIUrl":"10.1111/anzs.12425","url":null,"abstract":"<div><p>The cos-max method is a little-known method of identifying collinearities. It is based on the cos-max transformation, which makes minimal adjustment to a set of vectors to create orthogonal components with a one-to-one correspondence between the original vectors and the components. The aim of the transformation is that each vector should be close to the orthogonal component with which it is paired. Vectors involved in a collinearity must be adjusted substantially in order to create orthogonal components, while other vectors will typically be adjusted far less. The cos-max method uses the size of adjustments to identify collinearities. It gives a coherent relationship between collinear sets of variables and variance inflation factors (VIFs) and identifies collinear sets using more information than traditional methods. In this paper we describe these features of the method and examine its performance in examples, comparing it with alternative methods. In each example, the collinearities identified by the cos-max method only contained variables with high VIFs and contained all variables with high VIFs. The collinearities identified by other methods did not have such a close link to VIFs. Also, the collinearities identified by the cos-max method were as simple as or simpler than those given by other methods, with less overlap between collinearities in the variables that they contained.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact samples sizes for clinical trials subject to size and power constraints","authors":"Chris J. Lloyd","doi":"10.1111/anzs.12424","DOIUrl":"10.1111/anzs.12424","url":null,"abstract":"<p>This paper first describes the difficulties in providing the required sample sizes for clinical trials that guarantee type 1 and type 2 error control. The required sample sizes obviously depend on the test employed, and in this study we use the so-called <i>E</i>-test, which is known to have extremely favourable size properties and higher power than alternatives. To compute exact powers for this test in real time is not currently feasible, so a corpus of pre-computed exact powers (and sizes) was created, covering sample sizes up to 500. When there are no solutions within the corpus, a novel extrapolation technique is used. Exact size can be computed after the sample sizes have been extracted; however, for the <i>E</i>-test the exact size is virtually always very close to the nominal target. All the code has been converted into an <span>R-package</span>, which is available on CRAN and illustrated.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12424","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian analysis of multivariate mixed longitudinal ordinal and continuous data","authors":"Xiao Zhang","doi":"10.1111/anzs.12421","DOIUrl":"10.1111/anzs.12421","url":null,"abstract":"<p>Multivariate longitudinal ordinal and continuous data exist in many scientific fields. However, it is a rigorous task to jointly analyse them due to the complicated correlated structures of those mixed data and the lack of a multivariate distribution. The multivariate probit model, assuming there is a multivariate normal latent variable for each multivariate ordinal data, becomes a natural modeling choice for longitudinal ordinal data especially for jointly analysing with longitudinal continuous data. However, the identifiable multivariate probit model requires the variances of the latent normal variables to be fixed at 1, thus the joint covariance matrix of the latent variables and the continuous multivariate normal variables is restricted at some of the diagonal elements. This constrains to develop both the classical and Bayesian methods to analyse mixed ordinal and continuous data. In this investigation, we proposed three Markov chain Monte Carlo (MCMC) methods: Metropolis–Hastings within Gibbs algorithm based on the identifiable model, and a Gibbs sampling algorithm and parameter-expanded data augmentation based on the constructed non-identifiable model. Through simulation studies and a real data application, we illustrated the performance of these three methods and provided an observation of using non-identifiable model to develop MCMC sampling methods.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12421","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributional modelling of positively skewed data via the flexible Weibull extension distribution","authors":"Freddy Hernández-Barajas, Olga Usuga-Manco, Carmen Patino-Rodríguez, Fernando Marmolejo-Ramos","doi":"10.1111/anzs.12423","DOIUrl":"10.1111/anzs.12423","url":null,"abstract":"<p>The time until an event occurs is often known to have a skewed distribution. To model this, a statistical distribution called the two-parameter flexible Weibull extension (FWE) has been proposed. In this paper, the FWE distribution is used to model datasets through the use of generalised additive models for location, scale and shape (GAMLSS) distributional regression. GAMLSS is the only regression technique that can examine the effects of both categorical and numeric predictors on all the parameters of the distribution used to fit the dependent variable. To make it easier to use the FWE distribution through GAMLSS, the <span>RelDists</span> R package is proposed. A simulation study shows that FWE modelling through GAMLSS provides reliable parameter estimates even in the presence of factors that affect the distribution.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2024-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12423","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spline linear mixed-effects models for causal mediation analysis with longitudinal data","authors":"Jeffrey M. Albert, Hongxu Zhu, Tanujit Dey, Jiayang Sun, Wojbor A. Woyczynski, Gregory Powers, Meeyoung Min","doi":"10.1111/anzs.12422","DOIUrl":"10.1111/anzs.12422","url":null,"abstract":"<div><p>Often, causal mediation analysis is of interest when both the mediator and the final outcome are repeatedly measured, but limited work has been done for this situation (as opposed to where only the mediator is repeatedly measured). Available methods are primarily based on parametric models and tend to be sensitive to model assumptions. This article presents semiparametric, continuous-time models to provide a flexible and robust approach to causal mediation analysis for longitudinal data, which allows these data to be unbalanced or irregular. Specifically, the method uses spline linear mixed-effects models for the mediator and for the final outcome, with a two-step approach to model-fitting in which a predicted mediator is used as a covariate in the final outcome model. The models allow flexible functions for both the mean and individual response functions for each outcome. We derive estimated natural direct and indirect effects as a function of time using an extended mediation formula and sequential ignorability assumption. In simulation studies, we compare properties of estimated direct and indirect effects, and a delta method estimate of the standard error of the latter, under alternative approaches for predicting the mediator. The approach is illustrated using harmonised data from two cohort studies to examine attention as a mediator of the effect of prenatal tobacco exposure on externalising behaviour in children.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":0.8,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141779821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new robust covariance matrix estimation for high-dimensional microbiome data","authors":"Jiyang Wang, Wanfeng Liang, Lijie Li, Yue Wu, Xiaoyan Ma","doi":"10.1111/anzs.12415","DOIUrl":"10.1111/anzs.12415","url":null,"abstract":"<div><p>Microbiome data typically lie in a high-dimensional simplex. One of the key questions in metagenomic analysis is to exploit the covariance structure for this kind of data. In this paper, a framework called approximate-estimate-threshold (AET) is developed for the robust basis covariance estimation for high-dimensional microbiome data. To be specific, we first construct a proxy matrix $\\boldsymbol{\\Gamma}$, which is almost indistinguishable from the real basis covariance matrix $\\boldsymbol{\\Sigma}$. Then, any estimator $\\hat{\\boldsymbol{\\Gamma}}$ satisfying some conditions can be used to estimate $\\boldsymbol{\\Gamma}$. Finally, we impose a thresholding step on $\\hat{\\boldsymbol{\\Gamma}}$ to obtain the final estimator $\\hat{\\boldsymbol{\\Sigma}}$. In particular, this paper applies a Huber-type estimator $\\hat{\\boldsymbol{\\Gamma}}$, and achieves robustness by only requiring the boundedness of $2+\\epsilon$","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141190779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testing multiple dispersion effects from unreplicated order-of-addition experiments","authors":"Shin-Fu Tsai, Shan-Syue He","doi":"10.1111/anzs.12416","DOIUrl":"10.1111/anzs.12416","url":null,"abstract":"<p>Optimal addition orders of several components can be determined systematically to address order-of-addition problems when active location and dispersion effects are both taken into account. Based on the concept of fiducial generalised pivotal quantities, a new testing procedure is proposed in this paper to identify active dispersion effects from unreplicated order-of-addition experiments. Because the proposed method is free of all nuisance parameters indexed by the requirement set, it is capable of testing multiple dispersion effects. Simulation results show that the proposed method can maintain the empirical sizes close to the nominal level. A paint viscosity study is used to show that the proposed method can be practical. In addition, testable requirement sets are characterised when an order-of-addition orthogonal array is used to design an experiment.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12416","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141104106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A calibrated data-driven approach for small area estimation using big data","authors":"Siu-Ming Tam, Shaila Sharmeen","doi":"10.1111/anzs.12414","DOIUrl":"10.1111/anzs.12414","url":null,"abstract":"<div><p>Where the response variable in a big dataset is consistent with the variable of interest for small area estimation, the big data by itself can provide the estimates for small areas. These estimates are often subject to the coverage and measurement error bias inherited from the big data. However, if a probability survey of the same variable of interest is available, the survey data can be used as a training dataset to develop an algorithm to impute for the data missed by the big data and adjust for measurement errors. In this paper, we outline a methodology for such imputations based on an <i>k</i>-nearest neighbours (kNN) algorithm calibrated to an asymptotically design-unbiased estimate of the national total, and illustrate the use of a training dataset to estimate the imputation bias and the “fixed-<i>k</i> asymptotic” bootstrap to estimate the variance of the small area hybrid estimator. We illustrate the methodology of this paper using a public-use dataset and use it to compare the accuracy and precision of our hybrid estimator with the Fay–Harriot (FH) estimator. Finally, we also examine numerically the accuracy and precision of the FH estimator when the auxiliary variables used in the linking models are subject to undercoverage errors.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141062195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate inferences for Bayesian hierarchical generalised linear regression models","authors":"Brandon Berman, Wesley O. Johnson, Weining Shen","doi":"10.1111/anzs.12412","DOIUrl":"10.1111/anzs.12412","url":null,"abstract":"<div><p>Generalised linear mixed regression models are fundamental in statistics. Modelling random effects that are shared by individuals allows for correlation among those individuals. There are many methods and statistical packages available for analysing data using these models. Most require some form of numerical or analytic approximation because the likelihood function generally involves intractable integrals over the latents. The Bayesian approach avoids this issue by iteratively sampling the full conditional distributions for various blocks of parameters and latent random effects. Depending on the choice of the prior, some full conditionals are recognisable while others are not. In this paper we develop a novel normal approximation for the random effects full conditional, establish its asymptotic correctness and evaluate how well it performs. We make the case for hierarchical binomial and Poisson regression models with canonical link functions, for hierarchical gamma regression models with log link and for other cases. We also develop what we term a sufficient reduction (SR) approach to the Markov Chain Monte Carlo algorithm that allows for making inferences about all model parameters by replacing the full conditional for the latent variables with a considerably reduced dimensional function of the latents. We expect that this approximation could be quite useful in situations where there are a very large number of latent effects, which may be occurring in an increasingly ‘Big Data’ world. In the sequel, we compare our methods with INLA, which is a particularly popular method and which has been shown to be excellent in terms of speed and accuracy across a variety of settings. Our methods appear to be comparable to theirs in terms of accuracy, while INLA was faster, for the settings we considered. In addition, we note that our methods and those of others that involve Gibbs sampling trivially handle parameters that are functions of multiple parameters, while INLA approximations do not. Our primary illustration is for a three-level hierarchical binomial regression model for data on health outcomes for patients who are clustered within physicians who are clustered within particular hospitals or hospital systems.</p></div>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140941801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised Gaussian mixture modelling with a missing-data mechanism in R","authors":"Ziyang Lyu, Daniel Ahfock, Ryan Thompson, Geoffrey J. McLachlan","doi":"10.1111/anzs.12413","DOIUrl":"10.1111/anzs.12413","url":null,"abstract":"<p>Semi-supervised learning is being extensively applied to estimate classifiers from training data in which not all the labels of the feature vectors are available. We present <span>gmmsslm</span>, an <span>R</span> package for estimating the Bayes' classifier from such partially classified data in the case where the feature vector has a multivariate Gaussian (normal) distribution in each of the pre-defined classes. Our package implements a recently proposed Gaussian mixture modelling framework that incorporates a missingness mechanism for the missing labels in which the probability of a missing label is represented via a logistic model with covariates that depend on the entropy of the feature vector. Under this framework, it has been shown that the accuracy of the Bayes' classifier formed from the Gaussian mixture model fitted to the partially classified training data can even have lower error rate than if it were estimated from the sample completely classified. This result was established in the particular case of two Gaussian classes with a common covariance matrix. Here we focus on the effective implementation of an algorithm for multiple Gaussian classes with arbitrary covariance matrices. A strategy for initialising the algorithm is discussed and illustrated. The new package is demonstrated on some real data.</p>","PeriodicalId":55428,"journal":{"name":"Australian & New Zealand Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/anzs.12413","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140882456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}