Comparison of Named Entity Recognition Tools Applied to News Articles

S. Vychegzhanin, E. Kotelnikov
Published in: 2019 Ivannikov Ispras Open Conference (ISPRAS)
Publication date: 2019-12-01
DOI: 10.1109/ISPRAS47671.2019.00017 (https://doi.org/10.1109/ISPRAS47671.2019.00017)
Citations: 12

Abstract

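Named Entity Recognition (NER) in texts is an important natural language processing task, and many systems have been built to solve it. These systems differ in their target domains, processing methodologies, supported languages and recognized entity types. Because so many aspects are involved, users find it difficult to choose the appropriate tool for a specific problem. The aim of this work is a comparative study of seven well-known, publicly available libraries that can extract named entities: Stanford NER, spaCy, NLTK, Polyglot, Flair, GATE and DeepPavlov. The article consists of seven sections. The introduction lists the application areas of the Named Entity Recognition task and the approaches used to solve it. The second section reviews works that present comparative studies of existing tools. The third section gives the characteristics of the four text corpora used in the experiments. The fourth section briefly describes the tools selected for the study. The fifth section describes the metrics used to evaluate tool performance. The sixth section presents and discusses the experimental results. The conclusion summarizes the results of the work. The results show that for English the Flair and DeepPavlov libraries achieve close F1-scores on the Named Entity Recognition task. For Russian, DeepPavlov takes first place, significantly surpassing the other tools in quality.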
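The abstract reports tool quality as F1-scores but does not state how predicted entities are matched against the gold annotation. The sketch below shows one common convention, assumed here rather than taken from the paper: an entity counts as correct only if both its span boundaries and its type match exactly, and counts are micro-averaged over all documents. The function name `entity_f1` and the `(start, end, type)` triple representation are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# Hypothetical representation: an entity is a (start, end, type) triple.
# Offsets may be token or character indices, as long as gold and predictions agree.
Entity = Tuple[int, int, str]


def entity_f1(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 under exact span-and-type matching.

    `gold` and `pred` are parallel lists holding one set of entities per document.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        matched = g & p          # exact match on boundaries and entity type
        tp += len(matched)
        fp += len(p - matched)   # predicted but absent from the gold annotation
        fn += len(g - matched)   # annotated in gold but missed by the tool
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [{(0, 2, "PER"), (5, 6, "LOC")}, {(1, 3, "ORG")}]
    pred = [{(0, 2, "PER"), (5, 6, "ORG")}, {(1, 3, "ORG"), (4, 5, "LOC")}]
    print(entity_f1(gold, pred))
```

In the usage example the tool finds 2 of the 3 gold entities and makes 4 predictions, so precision is 2/4, recall is 2/3 and F1 is 4/7. The span at (5, 6) is labelled with the wrong type, so under this strict convention it counts as both a false positive and a false negative; lenient schemes that give partial credit for overlapping spans or correct boundaries would score it differently.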