Multi-source named entity typing for social media

NEWS@ACM, Pub Date: 2016-08-01, DOI: 10.18653/v1/W16-2702

R. Vexler, Einat Minkov

Citations: 2

Abstract

Typed lexicons, which encode knowledge about the semantic types of an entity name (e.g., that ‘Paris’ may denote a geolocation, a product, or a person), have proven useful for many text processing tasks. While such lexicons may be derived from large-scale knowledge bases (KBs), KBs are inherently imperfect; in particular, they lack coverage of long-tail entity names. We infer the types of a given entity name using multi-source learning, considering information obtained by alignment to the Freebase knowledge base, Web-scale distributional patterns, and global semi-structured contexts retrieved by means of Web search. Evaluation in the challenging domain of social media shows that multi-source learning improves performance compared with rule-based KB lookups, boosting typing results for some semantic categories.



