A comparison of named entity recognition tools applied to biographical texts

Samet Atdag, Vincent Labatut
2nd International Conference on Systems and Computer Science, published 2013-08-01. DOI: 10.1109/IcConSCS.2013.6632052 (https://doi.org/10.1109/IcConSCS.2013.6632052)
Citation count: 59

Abstract

Named entity recognition (NER) is a popular area of natural language processing, and many tools exist to perform this task. Among other things, they differ in the processing method they rely on, the entity types they can detect, the nature of the text they can handle, and their input/output formats. This makes it difficult for a user to select an appropriate NER tool for a specific situation. In this article, we address this problem in the context of biographical texts. To this end, we first build a new corpus by annotating 247 Wikipedia articles. We then select four publicly available, well-known NER tools that are free for research use: Stanford NER, Illinois NET, OpenCalais NER WS and Alias-i LingPipe. We apply them to our corpus, then assess and compare their performance. In terms of overall performance, a clear hierarchy emerges: Stanford has the best results, followed by LingPipe, Illinois and OpenCalais. However, a more detailed evaluation by entity type and article category shows that these factors affect the tools' performance in different ways. This complementarity opens an interesting perspective: combining the individual tools in order to improve performance.
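
As an illustration of what a per-entity-type comparison can look like, here is a minimal sketch. The abstract does not say which measures the authors used; exact-match precision, recall and F1 are assumed here, and the function name `score_by_type`, the `(start, end, type)` span representation and the toy annotations are all hypothetical rather than taken from the paper.

```python
def score_by_type(gold, predicted):
    """Exact-match precision, recall and F1 for each entity type.

    gold, predicted: sets of (start, end, type) tuples for one text.
    An entity counts as correct only if span and type both match
    (an assumption; the paper may use a different matching rule).
    """
    types = {t for _, _, t in gold | predicted}
    scores = {}
    for t in sorted(types):
        g = {e for e in gold if e[2] == t}
        p = {e for e in predicted if e[2] == t}
        tp = len(g & p)  # true positives: identical span and type
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[t] = (precision, recall, f1)
    return scores


# Toy annotations (hypothetical, not from the paper's corpus).
gold = {(0, 12, "PERSON"), (30, 36, "LOCATION"), (50, 60, "ORGANIZATION")}
tool_a = {(0, 12, "PERSON"), (30, 36, "LOCATION")}
tool_b = {(0, 12, "PERSON"), (50, 60, "ORGANIZATION"), (70, 75, "LOCATION")}

for name, output in [("tool_a", tool_a), ("tool_b", tool_b)]:
    print(name, score_by_type(gold, output))
```

In this toy case one tool finds the location but misses the organization, while the other does the opposite. That is the kind of complementarity a per-type breakdown can reveal and an overall score can hide.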