Named Entity Recognition in Assamese Language using two separate models: BiLSTM and BERT
Plabita Baruah, Bandana Dutta, Shikhar Kumar Sarma, Kuwali Talukdar
Procedia Computer Science, volume 258, pages 242-251 (2025). DOI: 10.1016/j.procs.2025.04.262
https://www.sciencedirect.com/science/article/pii/S187705092501364X
Named Entity Recognition (NER) applies Artificial Intelligence (AI) and Natural Language Processing (NLP) techniques to automatically tag named entities in unstructured text. It is a significant NLP task because it identifies proper nouns and categorizes them into classes such as person, location, organization, and miscellaneous. While considerable progress has been made for widely spoken languages such as English and other European languages, resulting in higher accuracy rates, NER in Indian languages proves challenging due to limited resources. This study explores the implementation of NER in Assamese using two separate approaches: BiLSTM and BERT. The BiLSTM model achieves an accuracy of 31%. With BERT, a pretrained model fine-tuned for Assamese, we achieved a precision of 81.5% and an F1 score of 0.383. Our comparative analysis shows that both models are effective for NER in a resource-scarce language like Assamese, but BERT performs better overall in recognizing entities. This suggests that BERT could play a key role in improving NER techniques for underrepresented languages.
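
The abstract reports precision and F1 for the BERT model but not recall. As a minimal sketch, assuming both figures were computed on the same evaluation set with the same averaging scheme (the abstract does not say), the standard definition F1 = 2PR / (P + R) can be solved for the recall R. The function names below are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall: F1 must be below 2 * precision")
    return f1 * precision / denominator


# Figures as reported in the abstract for the fine-tuned BERT model.
reported_precision = 0.815
reported_f1 = 0.383

# Assumption: precision and F1 share the same test set and averaging.
recall = implied_recall(reported_precision, reported_f1)
print(f"implied recall: {recall:.3f}")

# Round-trip check: plugging the recall back in reproduces the reported F1.
print(f"recomputed F1:  {f1_score(reported_precision, recall):.3f}")
```

Under that assumption the implied recall is roughly 0.25, which would suggest the model is precise on the entities it tags while missing many others. If the paper used macro-averaged scores across entity classes, this relation need not hold exactly.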