Identifying Appropriate Probabilistic Models for Sparse Discrete Omics Data

2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI). Pub Date: 2019-05-01. DOI: 10.1109/BHI.2019.8834661

Hani Aldirawi, Jie Yang, Ahmed A. Metwally


Citations: 9

Abstract



Modeling sparse and discrete omics data, such as microbiome and transcriptomics data, is challenging because of the excessive number of zeros. Many probabilistic models have been used, including the Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial models. In this paper, we propose a statistical procedure for identifying the most appropriate discrete probabilistic models for zero-inflated or hurdle models, based on the p-value of the discrete Kolmogorov-Smirnov (KS) test. We develop a general procedure for estimating the parameters of a large class of zero-inflated models and hurdle models. We also develop a general likelihood ratio test, based on the Neyman-Pearson lemma, for choosing the best model when more than one model is appropriate.
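To make the ingredients named in the abstract concrete, the following is a minimal sketch of fitting one zero-inflated model, the zero-inflated Poisson (ZIP), and measuring its distance to the data with a discrete KS statistic. It is not the authors' procedure: the EM updates used for estimation are a standard technique swapped in for illustration, the function names (`zip_pmf`, `fit_zip`, `ks_statistic`) and the toy counts are hypothetical, and the p-value of the discrete KS test is not computed here.

```python
import math
from collections import Counter


def zip_pmf(k, pi, lam):
    """P(X = k) for a zero-inflated Poisson with extra-zero weight pi."""
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return pi * (k == 0) + (1.0 - pi) * poisson


def fit_zip(counts, n_iter=500):
    """Estimate (pi, lam) by EM. Assumes at least one positive count."""
    n = len(counts)
    n_zero = sum(1 for x in counts if x == 0)
    total = sum(counts)
    pi, lam = 0.5, total / n  # simple starting values (an assumption)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: closed-form updates of the mixing weight and Poisson mean.
        pi = n_zero * z / n
        lam = total / (n - n_zero * z)
    return pi, lam


def ks_statistic(counts, pi, lam):
    """Largest gap between the empirical CDF and the fitted ZIP CDF.

    Both CDFs are step functions that jump only at integers, so it is
    enough to compare them at k = 0, 1, ..., max(counts).
    """
    n = len(counts)
    freq = Counter(counts)
    emp_cdf = model_cdf = d = 0.0
    for k in range(max(counts) + 1):
        emp_cdf += freq.get(k, 0) / n
        model_cdf += zip_pmf(k, pi, lam)
        d = max(d, abs(emp_cdf - model_cdf))
    return d


# Toy, made-up counts with many zeros (not data from the paper).
counts = [0] * 60 + [1] * 10 + [2] * 14 + [3] * 10 + [4] * 4 + [5] * 2
pi_hat, lam_hat = fit_zip(counts)
print("pi_hat:", pi_hat, "lam_hat:", lam_hat)
print("KS statistic:", ks_statistic(counts, pi_hat, lam_hat))
```

Because the parameters are estimated from the same data, turning this statistic into a p-value would need extra care, for example a parametric bootstrap; that choice is an assumption of this sketch rather than something stated in the abstract.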
