Boosting Afaan Oromo Named Entity Recognition with Multiple Methods

Abdo Ababor Abafogi
DOI: 10.5815/ijieeb.2021.05.05 (https://doi.org/10.5815/ijieeb.2021.05.05)
Journal: International Journal of Information Engineering and Electronic Business
Published: 2021-10-08 (Journal Article)
Citations: 3

Abstract

Named Entity Recognition (NER) is a widely used Information Extraction (IE) technique in Natural Language Processing (NLP) and Information Retrieval (IR). It identifies words in a given text and classifies them into predefined named-entity classes such as person, date/time, organization, and location. This paper boosts NER for Afaan Oromo by combining multiple methods: machine learning, stored rules, and pattern matching together make the system more efficient and more accurate at recognizing candidate named entities (NEs). The system takes the strongest points of each method and improves overall performance through voting; a candidate NE that is detected in more than one entity category, or that appears out of context because of word ambiguity, is penalized by word sense disambiguation. Consecutive NEs with identical tags are merged into a single tag before the final output. The evaluation shows that the system achieves improved performance. As a future direction, a hybrid of rule-based and unsupervised zero-resource cross-lingual approaches is proposed for further improvement. The proposed approach integrates a machine-learning (ML) component based on the Conditional Random Field (CRF) algorithm with rule-based and pattern-matching components to improve Afaan Oromo NER during both learning and prediction. It recognizes seven named-entity types: location name, organization name, person name, currency, date/time, percentage, and cardinal number. Voting and disambiguation compare the candidates classified by the different methods, select the most likely NE type, and penalize the others. Chunking combines two or more consecutive NEs of the same entity category into a single phrase that is assigned one tag. Extensive experiments with different feature sets were conducted on a corpus of about 40,000 tokens to achieve state-of-the-art performance [6].
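The voting and merging steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the three prediction lists stand in for hypothetical per-token outputs of the CRF, rule-based, and pattern-matching components; `vote` uses plain majority voting with a hypothetical fixed `priority` order to break ties (a stand-in for the paper's word-sense-disambiguation penalty, whose details the abstract does not give); and `merge_chunks` joins consecutive tokens that carry the same non-`O` tag.

```python
from collections import Counter

# Assumption: each recognizer emits one flat label per token ("PER", "LOC", ...),
# with "O" for tokens outside any named entity. These names are illustrative.
Labels = list[str]


def vote(predictions: list[Labels], priority: list[str]) -> Labels:
    """Pick the most frequent label per token across recognizers.

    Ties are broken by the hypothetical `priority` order; labels missing from
    it rank last. This stands in for the paper's disambiguation step.
    """
    rank = {label: i for i, label in enumerate(priority)}
    voted: Labels = []
    for token_labels in zip(*predictions):
        counts = Counter(token_labels)
        best = max(counts.values())
        tied = [label for label in counts if counts[label] == best]
        # Among equally frequent labels, keep the one ranked highest in `priority`.
        voted.append(min(tied, key=lambda label: rank.get(label, len(priority))))
    return voted


def merge_chunks(tokens: list[str], labels: Labels) -> list[tuple[str, str]]:
    """Merge runs of consecutive tokens that share the same non-"O" label."""
    chunks: list[tuple[str, str]] = []
    for token, label in zip(tokens, labels):
        if label != "O" and chunks and chunks[-1][1] == label:
            # Extend the previous phrase instead of starting a new chunk.
            chunks[-1] = (chunks[-1][0] + " " + token, label)
        else:
            chunks.append((token, label))
    return chunks


if __name__ == "__main__":
    # Illustrative placeholder tokens and labels only; not from the paper's corpus.
    tokens = ["Abdo", "Ababor", "Finfinnee", "dhaqe"]
    crf = ["PER", "PER", "LOC", "O"]
    rules = ["PER", "O", "LOC", "O"]
    patterns = ["O", "PER", "ORG", "O"]
    voted = vote([crf, rules, patterns], priority=["PER", "ORG", "LOC"])
    print(merge_chunks(tokens, voted))
    # [('Abdo Ababor', 'PER'), ('Finfinnee', 'LOC'), ('dhaqe', 'O')]
```

Majority voting is the simplest way to let agreeing components outvote a single outlier; a weighted or context-aware disambiguation step, as the abstract suggests, would replace the `priority` tie-break.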
In terms of F1-measure, AOroNER achieves 83.9%, 83.9%, and 85.8% on the Person, Organization, and Location categories, respectively. For numeric and temporal expressions, it achieves 88.5% on Date/time, 85.5% on Currency, 88.5% on Percent, and 86% on Cardinal number. Voting selects the better result among the methods. These results indicate that the rule-based and pattern-matching methods perform well on numeric entities.
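The per-category F1 scores reported above are the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below assumes exact-match, entity-level scoring over hypothetical `(start, end, category)` spans, which is a common convention; the abstract does not state which matching criterion was used to evaluate AOroNER.

```python
# Assumption: an entity is a (start_token, end_token_exclusive, category) span,
# and a prediction counts as correct only if all three fields match exactly.
Span = tuple[int, int, str]


def per_category_f1(gold: set[Span], predicted: set[Span]) -> dict[str, float]:
    """Return exact-match F1 for every category seen in gold or predictions."""
    scores: dict[str, float] = {}
    for category in sorted({span[2] for span in gold | predicted}):
        g = {span for span in gold if span[2] == category}
        p = {span for span in predicted if span[2] == category}
        true_positives = len(g & p)
        precision = true_positives / len(p) if p else 0.0
        recall = true_positives / len(g) if g else 0.0
        total = precision + recall
        # Harmonic mean of precision and recall; 0.0 when both are zero.
        scores[category] = 2 * precision * recall / total if total else 0.0
    return scores


if __name__ == "__main__":
    # Toy spans for illustration only; these are not the paper's data.
    gold = {(0, 2, "PER"), (4, 5, "PER"), (6, 7, "PERCENT")}
    predicted = {(0, 2, "PER"), (8, 9, "PER"), (6, 7, "PERCENT")}
    print(per_category_f1(gold, predicted))
    # {'PER': 0.5, 'PERCENT': 1.0}
```

Scoring each category separately, as here, is what allows results like the ones above to be reported per entity type rather than as a single overall figure.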