分布尾部的有限混合模型：蒙特卡罗实验和经验应用

IF 2.1 4区数学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Statistical Analysis and Data Mining Pub Date : 2024-03-28 DOI:10.1002/sam.11671

Marilena Furno, Francesco Caracciolo

{"title":"分布尾部的有限混合模型：蒙特卡罗实验和经验应用","authors":"Marilena Furno, Francesco Caracciolo","doi":"10.1002/sam.11671","DOIUrl":null,"url":null,"abstract":"The finite mixture model estimates regression coefficients distinct in each of the different groups of the dataset endogenously determined by this estimator. In what follows the analysis is extended beyond the mean, estimating the model in the tails of the conditional distribution of the dependent variable within each group. While the clustering reduces the overall heterogeneity, since the model is estimated for groups of similar observations, the analysis in the tails uncovers within groups heterogeneity and/or skewness. By integrating the endogenously determined clustering with the quantile regression analysis within each group, enhances the finite mixture models and focuses on the tail behavior of the conditional distribution of the dependent variable. A Monte Carlo experiment and two empirical applications conclude the analysis. In the well‐known birthweight dataset, the finite mixture model identifies and computes the regression coefficients of different groups, each one with its own characteristics, both at the mean and in the tails. In the family expenditure data, the analysis of within and between groups heterogeneity provides interesting economic insights on price elasticities. The analysis in classes proves to be more efficient than the model estimated without clustering. By extending the finite mixture approach to the tails provides a more accurate investigation of the data, introducing a robust tool to unveil sources of within groups heterogeneity and asymmetry otherwise left undetected. It improves efficiency and explanatory power with respect to the standard OLS‐based FMM.","PeriodicalId":48684,"journal":{"name":"Statistical Analysis and Data Mining","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The finite mixture model for the tails of distribution: Monte Carlo experiment and empirical applications\",\"authors\":\"Marilena Furno, Francesco Caracciolo\",\"doi\":\"10.1002/sam.11671\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The finite mixture model estimates regression coefficients distinct in each of the different groups of the dataset endogenously determined by this estimator. In what follows the analysis is extended beyond the mean, estimating the model in the tails of the conditional distribution of the dependent variable within each group. While the clustering reduces the overall heterogeneity, since the model is estimated for groups of similar observations, the analysis in the tails uncovers within groups heterogeneity and/or skewness. By integrating the endogenously determined clustering with the quantile regression analysis within each group, enhances the finite mixture models and focuses on the tail behavior of the conditional distribution of the dependent variable. A Monte Carlo experiment and two empirical applications conclude the analysis. In the well‐known birthweight dataset, the finite mixture model identifies and computes the regression coefficients of different groups, each one with its own characteristics, both at the mean and in the tails. In the family expenditure data, the analysis of within and between groups heterogeneity provides interesting economic insights on price elasticities. The analysis in classes proves to be more efficient than the model estimated without clustering. By extending the finite mixture approach to the tails provides a more accurate investigation of the data, introducing a robust tool to unveil sources of within groups heterogeneity and asymmetry otherwise left undetected. It improves efficiency and explanatory power with respect to the standard OLS‐based FMM.\",\"PeriodicalId\":48684,\"journal\":{\"name\":\"Statistical Analysis and Data Mining\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2024-03-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Statistical Analysis and Data Mining\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1002/sam.11671\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Analysis and Data Mining","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1002/sam.11671","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

有限混合物模型估计了由该估计器内生决定的数据集不同组别中各不相同的回归系数。接下来的分析将超越平均值，对每个组内因变量条件分布的尾部进行估计。虽然聚类减少了整体异质性，因为模型是针对相似观测值的组进行估计的，但尾部分析揭示了组内异质性和/或偏斜性。通过将内生决定的聚类与各组内的量子回归分析相结合，增强了有限混合物模型，并将重点放在因变量条件分布的尾部行为上。最后，通过蒙特卡罗实验和两个实证应用进行了分析。在著名的出生体重数据集中，有限混合模型识别并计算了不同组别的回归系数，每个组别在均值和尾部都有自己的特点。在家庭支出数据中，对组内和组间异质性的分析为价格弹性提供了有趣的经济学启示。事实证明，分组分析比不分组的估计模型更有效。通过将有限混合物方法扩展到尾部，对数据进行了更准确的调查，引入了一种强有力的工具来揭示组内异质性和非对称性的来源，否则就无法发现。与基于 OLS 的标准 FMM 相比，它提高了效率和解释力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

The finite mixture model for the tails of distribution: Monte Carlo experiment and empirical applications

The finite mixture model estimates regression coefficients distinct in each of the different groups of the dataset endogenously determined by this estimator. In what follows the analysis is extended beyond the mean, estimating the model in the tails of the conditional distribution of the dependent variable within each group. While the clustering reduces the overall heterogeneity, since the model is estimated for groups of similar observations, the analysis in the tails uncovers within groups heterogeneity and/or skewness. By integrating the endogenously determined clustering with the quantile regression analysis within each group, enhances the finite mixture models and focuses on the tail behavior of the conditional distribution of the dependent variable. A Monte Carlo experiment and two empirical applications conclude the analysis. In the well‐known birthweight dataset, the finite mixture model identifies and computes the regression coefficients of different groups, each one with its own characteristics, both at the mean and in the tails. In the family expenditure data, the analysis of within and between groups heterogeneity provides interesting economic insights on price elasticities. The analysis in classes proves to be more efficient than the model estimated without clustering. By extending the finite mixture approach to the tails provides a more accurate investigation of the data, introducing a robust tool to unveil sources of within groups heterogeneity and asymmetry otherwise left undetected. It improves efficiency and explanatory power with respect to the standard OLS‐based FMM.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Statistical Analysis and Data Mining COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCEC-COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

CiteScore

3.20

自引率

7.70%

发文量

期刊介绍： Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques, and discuss their application to real problems, in such a way that they are accessible and beneficial to domain experts across science, engineering, and commerce. The focus of the journal is on papers which satisfy one or more of the following criteria: Solve data analysis problems associated with massive, complex datasets Develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, operation research. Formulate and solve high-impact real-world problems which challenge existing paradigms via new statistical and/or computational models Provide survey to prominent research topics.