{"title":"基于主题属性的微生物组数据贝叶斯非参数主题模型","authors":"T. Okui","doi":"10.2197/ipsjtbio.13.1","DOIUrl":null,"url":null,"abstract":": Microbiome data have been obtained relatively easily in recent years, and currently, various methods for analyzing microbiome data are being proposed. Latent Dirichlet allocation (LDA) models, which are frequently used to extract latent topics from words in documents, have also been proposed to extract information on microbial com- munities for microbiome data. To extract microbiome topics associated with a subject’s attributes, LDA models that utilize supervisory information, including LDA with Dirichlet multinomial regression (DMR topic model) or super- vised topic model (SLDA,) can be applied. Further, a Bayesian nonparametric model is often used to automatically decide the number of latent classes for a latent variable model. An LDA can also be extended to a Bayesian nonpara- metric model using the hierarchical Dirichlet process. Although a Bayesian nonparametric DMR topic model has been previously proposed, it uses normalized gamma process for generating topic distribution, and it is unknown whether the number of topics can be automatically decided from data. It is expected that the total number of topics (with relatively large proportions) can be restricted to a smaller value using the stick-breaking process for generating topic distribution. Therefore, we propose a Bayesian nonparametric DMR topic model using a stick-breaking process and have compared it to existing models using two sets of real microbiome data. The results showed that the proposed model could extract topics that were more associated with attributes of a subject than existing methods, and it could automatically decide the number of topics from the data.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"142 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"A Bayesian Nonparametric Topic Model for Microbiome Data Using Subject Attributes\",\"authors\":\"T. Okui\",\"doi\":\"10.2197/ipsjtbio.13.1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\": Microbiome data have been obtained relatively easily in recent years, and currently, various methods for analyzing microbiome data are being proposed. Latent Dirichlet allocation (LDA) models, which are frequently used to extract latent topics from words in documents, have also been proposed to extract information on microbial com- munities for microbiome data. To extract microbiome topics associated with a subject’s attributes, LDA models that utilize supervisory information, including LDA with Dirichlet multinomial regression (DMR topic model) or super- vised topic model (SLDA,) can be applied. Further, a Bayesian nonparametric model is often used to automatically decide the number of latent classes for a latent variable model. An LDA can also be extended to a Bayesian nonpara- metric model using the hierarchical Dirichlet process. Although a Bayesian nonparametric DMR topic model has been previously proposed, it uses normalized gamma process for generating topic distribution, and it is unknown whether the number of topics can be automatically decided from data. It is expected that the total number of topics (with relatively large proportions) can be restricted to a smaller value using the stick-breaking process for generating topic distribution. Therefore, we propose a Bayesian nonparametric DMR topic model using a stick-breaking process and have compared it to existing models using two sets of real microbiome data. The results showed that the proposed model could extract topics that were more associated with attributes of a subject than existing methods, and it could automatically decide the number of topics from the data.\",\"PeriodicalId\":38959,\"journal\":{\"name\":\"IPSJ Transactions on Bioinformatics\",\"volume\":\"142 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IPSJ Transactions on Bioinformatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2197/ipsjtbio.13.1\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Biochemistry, Genetics and Molecular Biology\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IPSJ Transactions on Bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2197/ipsjtbio.13.1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Biochemistry, Genetics and Molecular Biology","Score":null,"Total":0}
A Bayesian Nonparametric Topic Model for Microbiome Data Using Subject Attributes
: Microbiome data have been obtained relatively easily in recent years, and currently, various methods for analyzing microbiome data are being proposed. Latent Dirichlet allocation (LDA) models, which are frequently used to extract latent topics from words in documents, have also been proposed to extract information on microbial com- munities for microbiome data. To extract microbiome topics associated with a subject’s attributes, LDA models that utilize supervisory information, including LDA with Dirichlet multinomial regression (DMR topic model) or super- vised topic model (SLDA,) can be applied. Further, a Bayesian nonparametric model is often used to automatically decide the number of latent classes for a latent variable model. An LDA can also be extended to a Bayesian nonpara- metric model using the hierarchical Dirichlet process. Although a Bayesian nonparametric DMR topic model has been previously proposed, it uses normalized gamma process for generating topic distribution, and it is unknown whether the number of topics can be automatically decided from data. It is expected that the total number of topics (with relatively large proportions) can be restricted to a smaller value using the stick-breaking process for generating topic distribution. Therefore, we propose a Bayesian nonparametric DMR topic model using a stick-breaking process and have compared it to existing models using two sets of real microbiome data. The results showed that the proposed model could extract topics that were more associated with attributes of a subject than existing methods, and it could automatically decide the number of topics from the data.