Named Entity Disambiguation for Resource-Poor Languages

Mohamed H. Gad-Elrab, M. Yosef, G. Weikum
Proceedings of the Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval, 2015-10-22
DOI: 10.1145/2810133.2810138 (https://doi.org/10.1145/2810133.2810138)
Citations: 10

Abstract

Named entity disambiguation (NED) is the task of linking ambiguous names in natural language text to canonical entities, such as people, organizations, or places, registered in a knowledge base. The problem is well studied for English text, but few systems have considered resource-poor languages that lack comprehensive name-entity dictionaries, entity descriptions, and large annotated training corpora. In this paper we address the NED problem for languages, such as Arabic, that have only limited annotated corpora and structured resources. We present a method that leverages structured English resources to enrich the components of a language-agnostic NED system and enable effective NED for other languages. We achieve this by fusing data from several multilingual resources and the output of automatic translation/transliteration systems. We show the viability and quality of our approach by synthesizing NED systems for Arabic, Spanish, and Italian.
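
The abstract does not spell out how the fusion works, so the following is only a minimal sketch of one way a target-language name-entity dictionary could be enriched from an English one. It is not the authors' implementation. The function `build_target_dictionary`, the `interlanguage_links` mapping, the `transliterate` callable, and all toy data are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def build_target_dictionary(english_dict, interlanguage_links, transliterate):
    """Fuse an English name-entity dictionary into a target-language one.

    english_dict: maps an English surface name to a set of entity ids.
    interlanguage_links: maps an entity id to known target-language names
        (assumed to come from some multilingual resource).
    transliterate: fallback callable, English name -> target-language string.
    """
    target_dict = defaultdict(set)
    for english_name, entities in english_dict.items():
        for entity in entities:
            # Prefer names attested in a multilingual resource.
            names = interlanguage_links.get(entity)
            if not names:
                # Otherwise fall back to automatic transliteration.
                names = [transliterate(english_name)]
            for name in names:
                target_dict[name].add(entity)
    return dict(target_dict)


# Toy data, invented for illustration only.
english_dict = {"Paris": {"Paris_(city)", "Paris_Hilton"}}
links = {"Paris_(city)": ["París"]}


def toy_transliterate(name):
    # Stand-in for a real transliteration system (assumption).
    return name.upper()


candidates = build_target_dictionary(english_dict, links, toy_transliterate)
```

In this sketch an entity with a linked target-language name keeps that name, while an entity without one still becomes reachable through the transliterated English name, which is the kind of coverage gain the abstract attributes to combining multilingual resources with translation/transliteration output.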