Block-LDA: Jointly Modeling Entity-Annotated Text and Entity-Entity Links

Handbook of Mixed Membership Models and Their Applications Pub Date : 1900-01-01 DOI:10.1201/b17520-17

Ramnath Balasubramanyan, William W. Cohen

Citations: 143

Abstract

Identifying latent groups of entities from observed interactions between pairs of entities is a common problem in areas such as the analysis of protein interactions and social networks. We present a model that combines aspects of mixed membership stochastic block models and topic models to improve entity-entity link modeling by jointly modeling links and text about the linked entities. We apply the model to two datasets: a protein-protein interaction (PPI) dataset, supplemented with a corpus of abstracts of scientific publications annotated with the proteins in the PPI dataset, and an Enron email corpus. The model is evaluated qualitatively, by inspecting the induced topics to understand the nature of the data, and quantitatively, using measures such as functional category prediction of proteins and perplexity, which improve when joint modeling is used instead of baselines that use only link or text information.
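Perplexity, one of the quantitative measures named in the abstract, is the exponential of the negative mean log-probability a model assigns to held-out tokens, so lower values indicate a better fit. The following is a minimal sketch of that standard definition only. The function name `perplexity` and the probabilities are hypothetical illustrations, not values or code from the chapter.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-(1/N) * sum(log p)) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one held-out token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Made-up probabilities that two hypothetical models assign to the same
# four held-out tokens; these are NOT results reported in the chapter.
text_only = [math.log(p) for p in (0.10, 0.05, 0.20, 0.10)]
joint = [math.log(p) for p in (0.20, 0.10, 0.25, 0.10)]

print("text-only perplexity:", perplexity(text_only))
print("joint perplexity:", perplexity(joint))
```

A model that spreads probability uniformly over four outcomes has perplexity 4, which gives a quick sanity check on the function.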



