{"title":"Predicting protein-protein relationships from literature using collapsed variational latent dirichlet allocation","authors":"Tatsuya Asou, K. Eguchi","doi":"10.1145/1458449.1458467","DOIUrl":null,"url":null,"abstract":"This paper investigates applying statistical topic models to extract and predict relationships between biological entities, especially protein mentions. A statistical topic model, Latent Dirichlet Allocation (LDA) is promising; however, it has not been investigated for such a task. In this paper, we apply the state-of-the-art Collapsed Variational Bayesian Inference and Gibbs Sampling inference to estimating the LDA model, and compared them from the viewpoints of log-likelihoods, classification accuracy and retrieval effectiveness. We demonstrate through experiments that the Collapsed Variational LDA gives better results than the other, especially in terms of classification accuracy and retrieval effectiveness in the task of the protein-protein relationship prediction.","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2008-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data and Text Mining in Bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1458449.1458467","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
This paper investigates applying statistical topic models to extract and predict relationships between biological entities, especially protein mentions. A statistical topic model, Latent Dirichlet Allocation (LDA) is promising; however, it has not been investigated for such a task. In this paper, we apply the state-of-the-art Collapsed Variational Bayesian Inference and Gibbs Sampling inference to estimating the LDA model, and compared them from the viewpoints of log-likelihoods, classification accuracy and retrieval effectiveness. We demonstrate through experiments that the Collapsed Variational LDA gives better results than the other, especially in terms of classification accuracy and retrieval effectiveness in the task of the protein-protein relationship prediction.