Enhancing Low Resource NER using Assisting Language and Transfer Learning
Authors: Maithili Sabane, Aparna Ranade, Onkar Litake, Parth Patil, Raviraj Joshi, Dipali Kadam
Venue: 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), published 2023-05-04
DOI: 10.1109/ICAAIC56838.2023.10141204 (https://doi.org/10.1109/ICAAIC56838.2023.10141204)
Named Entity Recognition (NER) is a fundamental NLP task that locates key information in text and is primarily applied in conversational and search systems. In commercial applications, NER or comparable slot-filling methods have been widely deployed for popular languages. NER is used in applications such as human resources, customer service, content classification, and academia. This study focuses on identifying named entities for closely related low-resource Indian languages, namely Hindi and Marathi. It uses various adaptations of BERT, such as baseBERT, AlBERT, and RoBERTa, to train a supervised NER model. It also compares multilingual models with monolingual models and establishes a baseline. The results show that Hindi and Marathi can each act as an assisting language for the other on the NER task, and that models trained on multiple languages perform better than models trained on a single language. However, the study also observes that blindly mixing all datasets does not necessarily improve results, and that data selection methods may be required.
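The contrast between blind mixing and selective mixing can be illustrated with a minimal sketch. The abstract does not describe the authors' data selection method, so everything here is an assumption for illustration only: the function name `mix_datasets`, the `assist_fraction` parameter, random sampling as the selection strategy, and the toy sentences are all hypothetical.

```python
import random
from typing import List, Tuple

# A sentence is a list of (token, tag) pairs in IOB format.
Sentence = List[Tuple[str, str]]


def mix_datasets(
    target: List[Sentence],
    assisting: List[Sentence],
    assist_fraction: float = 1.0,
    seed: int = 0,
) -> List[Sentence]:
    """Combine target-language data with a sampled share of assisting-language data.

    assist_fraction is a hypothetical knob: 0.0 gives a monolingual training
    set, 1.0 gives the "blind mixing" case where all assisting data is added.
    """
    if not 0.0 <= assist_fraction <= 1.0:
        raise ValueError("assist_fraction must be in [0, 1]")
    rng = random.Random(seed)
    # Number of assisting-language sentences to keep.
    n_assist = round(len(assisting) * assist_fraction)
    sampled = rng.sample(assisting, n_assist)
    # Concatenate and shuffle so both languages are interleaved in training.
    mixed = list(target) + sampled
    rng.shuffle(mixed)
    return mixed


# Toy placeholder data, not taken from the paper's datasets.
marathi = [[("पुणे", "B-LOC"), ("शहर", "O")], [("राम", "B-PER"), ("आला", "O")]]
hindi = [[("दिल्ली", "B-LOC"), ("शहर", "O")], [("सीता", "B-PER"), ("आई", "O")]]

monolingual = mix_datasets(marathi, hindi, assist_fraction=0.0)
half_mixed = mix_datasets(marathi, hindi, assist_fraction=0.5)
fully_mixed = mix_datasets(marathi, hindi, assist_fraction=1.0)
print(len(monolingual), len(half_mixed), len(fully_mixed))
```

Random sampling is only a stand-in for a real selection criterion. In practice, the resulting list would be tokenized and passed to a token-classification model, and the fraction would be tuned against a held-out set in the target language.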