Block-LDA: Jointly Modeling Entity-Annotated Text and Entity-Entity Links

Ramnath Balasubramanyan, William W. Cohen
Published in: Handbook of Mixed Membership Models and Their Applications
DOI: 10.1201/b17520-17 (https://doi.org/10.1201/b17520-17)
Citations: 143

Abstract

Identifying latent groups of entities from observed interactions between pairs of entities is a common problem in areas such as protein interaction analysis and social network analysis. We present a model that combines aspects of mixed membership stochastic block models and topic models to improve entity-entity link modeling by jointly modeling the links and the text about the linked entities. We apply the model to two datasets: a protein-protein interaction (PPI) dataset supplemented with a corpus of abstracts of scientific publications annotated with the proteins in the PPI dataset, and an Enron email corpus. The model is evaluated qualitatively, by inspecting the induced topics to understand the nature of the data, and quantitatively, by functional category prediction of proteins and by perplexity. On the quantitative measures, joint modeling improves over baselines that use only link information or only text information.
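The abstract names perplexity as one of the quantitative measures but does not state a formula. As a minimal sketch, assuming the standard definition (the exponential of the negative mean log-probability a model assigns to held-out tokens), it could be computed as below. The function name `perplexity` and the toy probabilities are illustrative assumptions, not code or values from the chapter.

```python
import math


def perplexity(token_probs):
    """Return exp(-mean log p) over the held-out token probabilities.

    token_probs: the probability the model assigns to each held-out token.
    Lower perplexity means the model predicts the held-out data better.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    log_likelihood = sum(math.log(p) for p in token_probs)
    return math.exp(-log_likelihood / len(token_probs))


# Toy, made-up probabilities (not results from the chapter):
# a model that is uniform over 4 outcomes has perplexity of about 4.
uniform_probs = [0.25, 0.25, 0.25, 0.25]
print(perplexity(uniform_probs))
```

Under this definition, comparing two models on the same held-out tokens amounts to comparing their mean log-probabilities; the model that assigns higher probability to the observed tokens has the lower perplexity.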