Design and implementation of an open source Greek POS Tagger and Entity Recognizer using spaCy

Eleni Partalidou, Eleftherios Spyromitros Xioufis, S. Doropoulos, S. Vologiannidis, K. Diamantaras
Published in: 2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2019-10-01
DOI: https://doi.org/10.1145/3350546.3352543
Citations: 28

Abstract

This paper proposes a machine learning approach to part-of-speech tagging and named entity recognition for Greek, focusing on the extraction of morphological features and the classification of tokens into a small set of named-entity classes. The model architecture that was used is introduced. Greek language support was added to the spaCy source code, where it did not exist before our contribution, and was used to build the models. Additionally, a part-of-speech tagger was trained that can detect the morphology of tokens and outperforms state-of-the-art results when classifying only the part of speech. For named entity recognition with spaCy, a model that extends the standard ENAMEX types (organization, location, person) was built. Experiments indicate the need for flexibility in handling out-of-vocabulary words, and an effort is made to resolve this issue. Finally, the evaluation results are discussed.

CCS CONCEPTS • Computing methodologies → Natural language processing.