A Bayesian Nonparametric Topic Model for Microbiome Data Using Subject Attributes

Q3 Biochemistry, Genetics and Molecular Biology
T. Okui
Journal: IPSJ Transactions on Bioinformatics
Publication date: 2020-01-01
Publication type: Journal Article
DOI: 10.2197/ipsjtbio.13.1 (https://doi.org/10.2197/ipsjtbio.13.1)
Citations: 4

Abstract

Microbiome data have become relatively easy to obtain in recent years, and various methods for analyzing them are being proposed. Latent Dirichlet allocation (LDA) models, which are frequently used to extract latent topics from words in documents, have also been proposed for extracting information on microbial communities from microbiome data. To extract microbiome topics associated with a subject's attributes, LDA models that utilize supervisory information, including LDA with Dirichlet multinomial regression (the DMR topic model) or the supervised topic model (SLDA), can be applied. Further, a Bayesian nonparametric model is often used to automatically decide the number of latent classes for a latent variable model. An LDA can also be extended to a Bayesian nonparametric model using the hierarchical Dirichlet process. Although a Bayesian nonparametric DMR topic model has been previously proposed, it uses a normalized gamma process to generate the topic distribution, and it is unknown whether the number of topics can be automatically decided from data. It is expected that the total number of topics (with relatively large proportions) can be restricted to a smaller value by using the stick-breaking process to generate the topic distribution. Therefore, we propose a Bayesian nonparametric DMR topic model using a stick-breaking process and compare it to existing models using two sets of real microbiome data. The results showed that the proposed model could extract topics that were more associated with the attributes of a subject than existing methods, and that it could automatically decide the number of topics from the data.
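The abstract does not give the model's equations, so the following is only a generic illustration of a truncated stick-breaking construction, not the author's DMR topic model. It sketches why such a prior tends to concentrate mass on a few topics: each topic takes a Beta-distributed fraction of whatever "stick" remains. The function name, the concentration parameter `alpha`, the truncation level, and the 0.05 threshold are all assumptions made for this example.

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, rng):
    """Draw topic proportions from a truncated stick-breaking prior (illustrative)."""
    # v_k ~ Beta(1, alpha): fraction of the remaining stick taken by topic k
    v = rng.beta(1.0, alpha, size=truncation)
    # Close the stick at the truncation level so the weights sum to 1
    v[-1] = 1.0
    # Stick length remaining before each break: prod_{j<k} (1 - v_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=1.0, truncation=20, rng=rng)

# Count topics with a "relatively large" proportion (threshold is arbitrary)
n_large = int(np.sum(weights > 0.05))
print(n_large, weights.round(3))
```

Smaller values of `alpha` put more of the stick on the first few breaks, so fewer topics receive a non-negligible proportion; in a full model this prior would be combined with the attribute regression and fitted to data.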
Source journal: IPSJ Transactions on Bioinformatics
Subject area: Biochemistry, Genetics and Molecular Biology (miscellaneous)
CiteScore: 1.90
Self-citation rate: 0.00%
Articles published: 3