Application of Mixture Models for Doubly Inflated Count Data

Big data analytics. Pub Date: 2023-03-11. DOI: 10.3390/analytics2010014

Monika Arora, N. Chaganty


Citations: 0

Abstract

In health sciences, social sciences, and other fields where count data analysis is important, zero-inflated models are used when zero counts occur with high (inflated) frequency. For various reasons, there are scenarios in which an additional count value k > 0 also occurs with high frequency. The zero- and k-inflated Poisson (ZkIP) model is more appropriate in such situations. The ZkIP model is a mixture distribution with three components: degenerate distributions at the counts 0 and k, and a Poisson distribution. In this article, we propose an alternative, computationally fast expectation–maximization (EM) algorithm to obtain parameter estimates for grouped zero- and k-inflated count data. The asymptotic standard errors are derived using the complete data approach. We compare the zero- and k-inflated Poisson model with its zero-inflated and non-inflated counterparts. The best model is selected based on commonly used criteria. The theoretical results are supplemented with the analysis of two real-life datasets from the health sciences.
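To make the three-component mixture concrete, the following is a minimal sketch of a ZkIP probability mass function and a textbook EM fit on a frequency table (grouped data). It is not the algorithm proposed in the article, whose details are not given in the abstract; it assumes no covariates and does not compute standard errors. The names `poisson_pmf`, `zkip_pmf`, `fit_zkip_em`, and the `freq` dictionary format are hypothetical choices made for illustration.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability P(Y = y) for rate lam > 0, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def zkip_pmf(y, k, pi0, pik, lam):
    """ZkIP pmf: point masses at 0 and k mixed with a Poisson(lam) component.

    pi0 and pik are the inflation weights at 0 and k; the Poisson
    component carries the remaining weight 1 - pi0 - pik.
    """
    p = (1.0 - pi0 - pik) * poisson_pmf(y, lam)
    if y == 0:
        p += pi0
    if y == k:
        p += pik
    return p


def fit_zkip_em(freq, k, n_iter=500, tol=1e-10):
    """Generic EM for the ZkIP model on grouped data (illustrative sketch).

    freq maps each observed count value to its frequency.
    Returns the estimates (pi0, pik, lam).
    """
    n = sum(freq.values())
    n0 = freq.get(0, 0)
    nk = freq.get(k, 0)
    total = sum(y * f for y, f in freq.items())

    # Crude starting values: half of the observed mass at 0 and k is
    # attributed to inflation, and lam starts at the sample mean.
    pi0, pik = 0.5 * n0 / n, 0.5 * nk / n
    lam = max(total / n, 1e-8)

    for _ in range(n_iter):
        # E-step: posterior probability that an observed 0 (or k)
        # came from the degenerate component rather than the Poisson.
        z0 = pi0 / zkip_pmf(0, k, pi0, pik, lam)
        zk = pik / zkip_pmf(k, k, pi0, pik, lam)

        # M-step: closed-form updates from the expected complete-data
        # log-likelihood.
        new_pi0 = n0 * z0 / n
        new_pik = nk * zk / n
        pois_weight = n - n0 * z0 - nk * zk   # expected Poisson sample size
        pois_total = total - k * nk * zk      # expected Poisson sum of counts
        new_lam = max(pois_total / pois_weight, 1e-12)

        change = max(abs(new_pi0 - pi0), abs(new_pik - pik), abs(new_lam - lam))
        pi0, pik, lam = new_pi0, new_pik, new_lam
        if change < tol:
            break
    return pi0, pik, lam
```

A call such as `fit_zkip_em({0: 40, 1: 12, 2: 20, 3: 25, 4: 15, 5: 30, 6: 5}, k=5)` returns estimates of the two inflation weights and the Poisson rate. Setting the weight at k to zero recovers the zero-inflated Poisson model, and setting both weights to zero recovers the ordinary Poisson model, which are the counterparts the abstract compares against.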


Source journal: Big data analytics

Self-citation rate: 0.00%

Review time: 5 weeks