Identifying Appropriate Probabilistic Models for Sparse Discrete Omics Data
Hani Aldirawi, Jie Yang, Ahmed A. Metwally
2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), May 2019
https://doi.org/10.1109/BHI.2019.8834661
Modeling sparse and discrete omics data, such as microbiome and transcriptomics data, is challenging because of the excessive number of zeros. Many probabilistic models have been used, including the Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial models. In this paper, we propose a statistical procedure for identifying the most appropriate discrete probabilistic models for zero-inflated or hurdle models, based on the p-value of the discrete Kolmogorov-Smirnov (KS) test. We develop a general procedure for estimating the parameters of a large class of zero-inflated and hurdle models. We also develop a general likelihood ratio test, based on the Neyman-Pearson lemma, for choosing the best model when more than one model is appropriate.
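To make the idea of a KS-based goodness-of-fit check concrete, the following is a minimal sketch for a single candidate, the zero-inflated Poisson. It is not the authors' procedure: the EM fit, the parametric bootstrap used to approximate the p-value, and all function names (`zip_pmf`, `fit_zip`, `ks_stat`, `sample_zip`, `ks_pvalue`) are assumptions made here for illustration, and the sampler is only suitable for moderate Poisson means.

```python
import math
import random


def zip_pmf(k, pi, lam):
    """P(Y = k) for a zero-inflated Poisson with extra-zero probability pi."""
    pois = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return pi * (k == 0) + (1.0 - pi) * pois


def fit_zip(data, iters=200):
    """Estimate (pi, lam) by EM; an illustrative stand-in for an MLE routine."""
    n = len(data)
    n_zero = sum(1 for y in data if y == 0)
    total = sum(data)
    pi, lam = 0.5 * n_zero / n, max(total / n, 1e-12)
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero.
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        pi = z * n_zero / n
        lam = max(total / (n - z * n_zero), 1e-12)  # floor keeps log(lam) finite
    return pi, lam


def ks_stat(data, pi, lam):
    """Largest gap between the empirical CDF and the fitted ZIP CDF."""
    n = len(data)
    counts = {}
    for y in data:
        counts[y] = counts.get(y, 0) + 1
    emp = mod = d = 0.0
    for k in range(max(data) + 1):
        emp += counts.get(k, 0) / n
        mod += zip_pmf(k, pi, lam)
        d = max(d, abs(emp - mod))
    return d


def sample_zip(n, pi, lam, rng):
    """Draw n ZIP counts; the Poisson part uses Knuth's multiplication method."""
    limit = math.exp(-lam)
    out = []
    for _ in range(n):
        if rng.random() < pi:
            out.append(0)  # structural zero
            continue
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        out.append(k)
    return out


def ks_pvalue(data, n_boot=200, seed=0):
    """Parametric-bootstrap p-value for the KS statistic (assumed approach)."""
    rng = random.Random(seed)
    pi, lam = fit_zip(data)
    d_obs = ks_stat(data, pi, lam)
    exceed = 0
    for _ in range(n_boot):
        sim = sample_zip(len(data), pi, lam, rng)
        # Refit on each simulated sample so estimation error is accounted for.
        exceed += ks_stat(sim, *fit_zip(sim)) >= d_obs
    return (exceed + 1) / (n_boot + 1)
```

Calling `ks_pvalue(counts)` on a list of non-negative integer counts returns a value in (0, 1]; a small value suggests the zero-inflated Poisson fits poorly. Repeating the same check with other baseline distributions would give one p-value per candidate model, which is the kind of comparison the abstract describes.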