Multi-source Representation Enhancement for Wikipedia-style Entity Annotation

Kunyuan Pang, Shasha Li, Jintao Tang, Ting Wang
2022 International Joint Conference on Neural Networks (IJCNN), published 2022-07-18. DOI: 10.1109/IJCNN55064.2022.9892289 (https://doi.org/10.1109/IJCNN55064.2022.9892289)

Abstract

Entity annotation in Wikipedia (officially named wikilinks) greatly benefits human end-users. Human editors are required to select the mentions that are most helpful to end-users and link each one to a Wikipedia page. We aim to design an automatic system that generates Wikipedia-style entity annotation for any plain text. However, existing approaches either rely heavily on a mention-entity map or are restricted to named entities. They also neglect to select the appropriate mentions as Wikipedia requires. As a result, they omit some necessary annotations and introduce excessive, distracting ones. Existing benchmarks likewise skirt the coverage and selection issues. We propose a new task, Mention Detection and Selection for entity annotation, along with a new benchmark, WikiC, to better reflect annotation quality. The task centers on the mentions specific to each position in high-quality human-annotated examples. We also propose a new framework, DrWiki, to fulfill the task. We adopt a deep pre-trained span selection model that infers directly from plain text via the tokens' contextual embeddings. It can cover all possible spans and is not limited to a mention-entity map. In addition, information about both inarguable mention-entity pairs and mention repeats is introduced as token-wise representation enhancement, through FLAT attention and repeat embedding respectively. Empirical results on WikiC show that our method improves overall performance over commonly adopted and state-of-the-art Entity Linking and Entity Recognition methods. Additional experiments show that DrWiki gains improvement even with a low-coverage mention-entity map.
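To make the ideas of "covering all possible spans" and "mention repeat" concrete, the following is a minimal illustrative sketch, not the DrWiki implementation. The function names `enumerate_spans` and `repeat_flags`, the `max_width` cap, and the choice to treat a repeat as a case-insensitive match against an earlier span's surface text are all assumptions made for this example. The abstract does not specify how spans are enumerated or how the repeat embedding is computed.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], max_width: int = 3) -> List[Tuple[int, int]]:
    """Return every (start, end) span, end exclusive, up to max_width tokens.

    Assumption: candidate spans are capped at max_width tokens; the source
    only says the model can cover all possible spans.
    """
    return [
        (start, end)
        for start in range(len(tokens))
        for end in range(start + 1, min(start + max_width, len(tokens)) + 1)
    ]


def repeat_flags(tokens: List[str], spans: List[Tuple[int, int]]) -> List[bool]:
    """Flag each span whose surface text already occurred in an earlier span.

    Assumption: a "repeat" is a case-insensitive exact match of the span text.
    Spans are expected in the order produced by enumerate_spans.
    """
    seen = set()
    flags = []
    for start, end in spans:
        surface = " ".join(tokens[start:end]).lower()
        flags.append(surface in seen)
        seen.add(surface)
    return flags


tokens = ["Paris", "is", "Paris"]
spans = enumerate_spans(tokens, max_width=2)
flags = repeat_flags(tokens, spans)
```

In a learned model, a boolean flag like this would typically index a small embedding table whose vectors are combined with the token representations; that step is omitted here because the abstract gives no detail on it.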