Named Entity Recognition in Assamese Language using two separate models: BiLSTM and BERT

Procedia Computer Science, Volume 258, Pages 242-251. Pub Date: 2025-01-01. DOI: 10.1016/j.procs.2025.04.262

Plabita Baruah, Bandana Dutta, Shikhar Kumar Sarma, Kuwali Talukdar


Citations: 0

Abstract

Named Entity Recognition (NER) is a technique based on principles of Artificial Intelligence (AI) and Natural Language Processing (NLP) for automatically tagging named entities in unstructured text. It matters in NLP applications because it identifies proper nouns and categorizes them into classes such as person, location, organization, and miscellaneous. While considerable progress has been made for widely spoken languages such as English and other European languages, resulting in higher accuracy rates, NER in Indian languages proves challenging due to limited resources. This study explores the implementation of NER in Assamese using two separate approaches: BiLSTM and BERT. The BiLSTM model achieves an accuracy of 31%. With BERT, a pretrained model fine-tuned for Assamese, we achieved a precision of 81.5% and an F1 score of 0.383. Our comparative analysis shows that both models are effective for NER in a resource-scarce language like Assamese, but BERT performs better overall in recognizing entities. This suggests that BERT could play a key role in improving NER techniques for underrepresented languages.
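
The abstract reports precision and F1 for BERT but no recall. As a minimal sketch of how the two figures relate, the snippet below uses the standard definition of F1 as the harmonic mean of precision and recall and solves it for recall. It assumes the 81.5% precision and the 0.383 F1 score were computed under the same averaging scheme and on the same 0 to 1 scale, which the abstract does not state. The function names `f1_score` and `implied_recall` are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1.

    Assumption: P and F1 come from the same evaluation and averaging scheme.
    """
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall exists for this precision/F1 pair")
    return f1 * precision / denom


# Figures quoted in the abstract for the fine-tuned BERT model.
recall = implied_recall(precision=0.815, f1=0.383)
print(f"implied recall: {recall:.3f}")
```

Under those assumptions the implied recall comes out at roughly 0.25, which would mean the low F1 score is driven by missed entities rather than by incorrect predictions. If the reported scores use different averaging (for example, micro precision and macro F1), this back-calculation does not apply.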
