Boosting Afaan Oromo Named Entity Recognition with Multiple Methods
Abdo Ababor Abafogi
International Journal of Information Engineering and Electronic Business, published 2021-10-08
DOI: 10.5815/ijieeb.2021.05.05 (https://doi.org/10.5815/ijieeb.2021.05.05)
Citations: 3
Abstract
Named Entity Recognition (NER) is a widely used information extraction (IE) method in natural language processing (NLP) and information retrieval (IR). It aims to predict and categorize the words of a given text into predefined classes of named entities (NEs) such as person, date/time, organization, and location. This paper boosts NER for Afaan Oromo by using multiple methods. Combining machine learning, stored rules, and pattern matching makes the system more efficient and accurate at recognizing candidate NEs. The system takes the strongest points of each method and boosts performance by voting on any candidate NE that is detected in more than one entity category; a candidate that is out of context because of word ambiguity is penalized by word sense disambiguation. Consecutive NEs with identical tags are merged under a single tag before the final output. The evaluation shows that the combined system performs well. As a future direction, a hybrid of the rule-based approach with unsupervised zero-resource cross-lingual methods is proposed for further improvement.

The proposed approach integrates machine learning based on a CRF algorithm with rule-based methods and pattern matching to improve Afaan Oromo NER during both learning and prediction. It recognizes seven named entity types: location name, organization name, person name, currency, date/time, percentage, and cardinal number. Voting and disambiguation compare the classified candidates to select the most likely NE type among the approaches and to penalize mismatched candidates. Chunking combines two or more consecutive NEs of the same entity category into a phrase that is assigned a single tag. Extensive experiments were conducted on about 40,000 tokens with different features to achieve state-of-the-art performance [6]. In F1-measure, AOroNER scores 83.9%, 83.9%, and 85.8% on the Person, Organization, and Location categories, respectively. Likewise, the numeric and temporal expressions Date/time, Currency, Percent, and Cardinal number score 88.5%, 85.5%, 88.5%, and 86%, respectively. Voting contributes to the better result. I conclude that the rule-based and pattern-matching methods perform well on numeric entities.
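
The voting and chunking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the simple majority vote, the tie-breaking order, the tag names (`PER`, `LOC`, `ORG`, `O`), the function names, and the English sample tokens are all assumptions made for illustration, and the word sense disambiguation penalty is not modelled.

```python
from collections import Counter

def vote(candidates):
    """Pick one tag for a token from the tags proposed by several methods.

    Assumption: a plain majority vote over non-'O' proposals; the paper does
    not spell out its voting rule.
    """
    tags = [t for t in candidates if t != "O"]
    if not tags:
        return "O"
    # most_common keeps first-seen order on ties, so earlier methods win ties.
    return Counter(tags).most_common(1)[0][0]

def merge_adjacent(tokens, tags):
    """Merge consecutive tokens that share the same entity tag into one chunk."""
    chunks = []
    for token, tag in zip(tokens, tags):
        if tag != "O" and chunks and chunks[-1][1] == tag:
            # Extend the previous chunk instead of starting a new one.
            chunks[-1] = (chunks[-1][0] + " " + token, tag)
        else:
            chunks.append((token, tag))
    return chunks

# Hypothetical per-token proposals from three methods (CRF, rules, patterns).
tokens = ["Abebe", "Bikila", "visited", "Addis", "Ababa"]
proposals = [
    ("PER", "PER", "O"),
    ("PER", "O", "O"),
    ("O", "O", "O"),
    ("LOC", "LOC", "O"),
    ("LOC", "ORG", "LOC"),
]

final_tags = [vote(p) for p in proposals]
chunks = merge_adjacent(tokens, final_tags)
print(chunks)
```

Here the last token receives conflicting proposals (`LOC` and `ORG`); the vote keeps `LOC`, and the merge step then joins it with the preceding `LOC` token into a single phrase.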