Sequence-driven Neural Network models for NER Tagging in Roman Urdu

Maaz Ali Nadeem, Khadija Irfan, Khaula Atiq, M. O. Beg, Muhammad Umair Arshad
Published in: 2022 International Conference on Frontiers of Information Technology (FIT), 2022-12-01
DOI: 10.1109/FIT57066.2022.00040 (https://doi.org/10.1109/FIT57066.2022.00040)

Abstract

Natural Language Processing research has increasingly turned to contextual sequence labeling for low-resource languages. Named-Entity Recognition (NER) is one such labeling task, in which text is considered in context and its tokens are labeled with named-entity types. NER for Roman Urdu supports tasks such as Information Extraction, Machine Translation, and even big data operations on live digital content. Research on such NLP applications in Roman Urdu has been limited; however, work on Urdu and other languages of the same family encourages active research. This paper compares several deep learning-based models that learn to classify words by mapping each one to a specific context based on its placement. The models are trained on a hand-annotated corpus covering several domains. After a detailed comparison and evaluation, Bi-LSTM yields an F1-score of 82.7%. Our work demonstrates the possibility of long-range contextual understanding for processing morphologically rich low-resource languages.
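
To make the reported metric concrete, the sketch below shows one common way an F1-score is computed for NER: entity spans are extracted from per-token tags, and precision and recall are counted over exact span matches. The abstract does not state the tag scheme or whether the 82.7% figure is entity-level or token-level, so the BIO scheme, the `PER`/`LOC` labels, the function names, and the Roman Urdu example sentence here are all assumptions made for illustration, not the authors' evaluation code or data.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); hypothetical representation
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of BIO tags (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label is None:
            # Stray I- tag with no open span: treat it as the start of a new span.
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact-match entity spans."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = extract_spans(gold_tags)
        pred_spans = extract_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence (not from the paper's corpus): "Ali Lahore mein rehta hai"
gold = [["B-PER", "B-LOC", "O", "O", "O"]]
pred = [["B-PER", "O", "O", "O", "O"]]  # the tagger misses the location
score = entity_f1(gold, pred)
```

In this toy case the prediction finds one of the two gold entities with no false positives, so precision is 1.0 and recall is 0.5. Exact-match span scoring is stricter than per-token accuracy, which is one reason the choice of scoring level matters when comparing reported F1 figures.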