Conditional mixture modelling for heavy‐tailed and skewed data

IF 0.7 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY
Stat Pub Date : 2023-08-30 DOI:10.1002/sta4.608
Aqi Dong, Volodymyr Melnykov, Yang Wang, Xuwen Zhu

Abstract

Overparameterization is a serious concern for multivariate mixture models, as it can lead to model overfitting and, as a result, underestimation of the mixture order. Parsimonious modelling is one of the most effective remedies in this context. In Gaussian mixture models, the majority of parameters are associated with covariance matrices, and parsimonious models based on factor analysers and spectral decomposition of dispersion parameters are the most popular in the literature. Drawbacks of these models include a lack of flexibility in imposing different covariance structures on individual components and limitations in modelling compact clusters. Recently introduced conditional mixture models provide substantial flexibility in addressing these concerns. The components of such mixtures are formulated as a product of conditional distributions, with univariate Gaussian densities being the primary choice. However, the presence of heavy tails or skewness in any dimension can lead to fitting problems. We propose a flexible model that is free of these limitations, which we call the contaminated transformation conditional mixture model, and demonstrate through a series of simulation studies that it can effectively account for skewness and heavy tails. Applications to real‐life data sets show good results and highlight the promise of the proposed model.
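To make the phrase "a product of conditional distributions with univariate Gaussian densities" concrete, here is a minimal sketch of that Gaussian baseline only. It does not implement the contaminated transformation extension proposed in the paper, whose exact form the abstract does not give. The function names, the parameter layout (intercepts, a strictly lower-triangular coefficient matrix, conditional standard deviations) and the linear form of each conditional mean are assumptions made for illustration.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Univariate Gaussian density."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z * z) / (sd * np.sqrt(2.0 * np.pi))

def conditional_component_density(x, intercepts, coefs, sds):
    """Density of one component written as f(x1) * f(x2 | x1) * ... .

    Assumed (hypothetical) parameterisation: the j-th variable, given the
    preceding ones, is Gaussian with mean intercepts[j] + coefs[j, :j] @ x[:j]
    and standard deviation sds[j].
    """
    x = np.asarray(x, dtype=float)
    density = 1.0
    for j in range(x.shape[0]):
        # For j = 0 the slices are empty, so the mean is just the intercept.
        mean_j = intercepts[j] + coefs[j, :j] @ x[:j]
        density *= normal_pdf(x[j], mean_j, sds[j])
    return density

def mixture_density(x, weights, components):
    """Weighted sum of component densities; each component is a tuple
    (intercepts, coefs, sds)."""
    return sum(w * conditional_component_density(x, *c)
               for w, c in zip(weights, components))

# Illustrative bivariate component: x1 ~ N(0, 1), x2 | x1 ~ N(1 + 0.5 * x1, 2^2).
intercepts = np.array([0.0, 1.0])
coefs = np.array([[0.0, 0.0],
                  [0.5, 0.0]])
sds = np.array([1.0, 2.0])
value = conditional_component_density([0.0, 1.0], intercepts, coefs, sds)
```

In this sketch, setting entries of `coefs` to zero removes the corresponding dependence between variables within that component, which is one way a different covariance structure could be imposed on each component.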
Source journal
Stat (Decision Sciences: Statistics, Probability and Uncertainty)
CiteScore: 1.10
Self-citation rate: 0.00%
Articles published: 85
Journal description: Stat is an innovative electronic journal for the rapid publication of novel and topical research results, publishing compact articles of the highest quality in all areas of statistical endeavour. Its purpose is to provide a means of rapid sharing of important new theoretical, methodological and applied research. Stat is a joint venture between the International Statistical Institute and Wiley-Blackwell. Stat is characterised by:
• Speed - a high-quality review process that aims to reach a decision within 20 days of submission.
• Concision - a maximum article length of 10 pages of text, not including references.
• Supporting materials - inclusion of electronic supporting materials including graphs, video, software, data and images.
• Scope - addresses all areas of statistics and interdisciplinary areas.
Stat is a scientific journal for the international community of statisticians and researchers and practitioners in allied quantitative disciplines.