Integrating Bilingual Named Entities Lexicon with Conditional Random Fields Model for Arabic Named Entities Recognition

Emna Hkiri, Souheyl Mallat, M. Zrigui
Published in: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017-11-01
DOI: 10.1109/ICDAR.2017.105 (https://doi.org/10.1109/ICDAR.2017.105)
Citation count: 5

Abstract

Named Entity Recognition (NER) locates and classifies atomic elements of text into predefined categories such as person names, locations, organizations, and temporal expressions. Several rule-based and machine-learning-based approaches have been applied successfully to English and some other Latin-script languages. Arabic has a rich and complex morphology, which makes named entity recognition challenging. In this paper we propose a hybrid NER system that applies conditional random fields (CRF), a bilingual NE lexicon, and grammar rules to the task of Named Entity Recognition in Arabic. The aim of our system is to enhance the overall performance of NER. The empirical results indicate that the hybrid system outperforms the state of the art in Arabic NER in terms of precision when applied to the ANERcorp dataset, with F-measures of 83.36 for Person, 89.58 for Location, and 72.26 for Organization.
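To illustrate how a lexicon can feed a CRF tagger, the following is a minimal sketch, not the authors' implementation. It assumes the bilingual NE lexicon is available as a mapping from an Arabic surface form to an English form and an entity type, and it exposes a lookup result as one feature among several per-token features. The lexicon entries, feature names, and function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical bilingual NE lexicon: Arabic form -> (English form, entity type).
# These entries are illustrative only and are not taken from the paper.
LEXICON: Dict[str, Tuple[str, str]] = {
    "تونس": ("Tunisia", "LOC"),
    "القاهرة": ("Cairo", "LOC"),
    "محمد": ("Mohamed", "PER"),
}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for token i, including a lexicon-lookup feature."""
    tok = tokens[i]
    entry = LEXICON.get(tok)
    return {
        "token": tok,
        # Short affixes are a crude stand-in for morphological cues.
        "prefix2": tok[:2],
        "suffix2": tok[-2:],
        # Lexicon evidence: whether the token is listed, and with which type.
        "in_lexicon": entry is not None,
        "lexicon_type": entry[1] if entry else "O",
        # Neighbouring tokens give the sequence model local context.
        "prev_token": tokens[i - 1] if i > 0 else "<BOS>",
        "next_token": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for a whole sentence, one per token."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A sequence of such feature dicts, paired with gold labels, is the kind of input a CRF toolkit typically trains on. In a hybrid setup of the sort the abstract describes, grammar rules could then be applied alongside the CRF predictions, though the abstract does not specify how the three components are combined.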