Ngan Nguyen Luu Thuy, Đặng Văn Thìn, Hoàng Xuân Vũ, Nguyễn Văn Tài, Khoa Thi-Kim Phan
VNU Journal of Science: Computer Science and Communication Engineering. Published 2022-12-16. DOI: 10.25073/2588-1086/vnucsce.329 (https://doi.org/10.25073/2588-1086/vnucsce.329)
vnNLI - VLSP 2021: Vietnamese and English-Vietnamese Textual Entailment Based on Pre-trained Multilingual Language Models
Natural Language Inference (NLI) is a high-level semantic task in Natural Language Processing (NLP), and it poses further challenges in the cross-lingual setting. In recent years, pre-trained multilingual language models (e.g., mBERT, XLM-R, InfoXLM) have contributed greatly to addressing these challenges. Motivated by these achievements, this paper describes our approach, based on fine-tuning pre-trained multilingual language models (XLM-R, InfoXLM), to the shared task "Vietnamese and English-Vietnamese Textual Entailment" at the 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021, https://vlsp.org.vn/vlsp2021). We also investigate further techniques to improve performance: cross-validation, pseudo-labeling (PL), learning rate adjustment, and POS tagging. Our experiments show that the InfoXLM-based approach achieved competitive results, ranking 2nd in the VLSP 2021 task evaluation with an F1-score of 0.89 on the private test set.
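The abstract names pseudo-labeling as one of the techniques but does not say how pseudo-labels were chosen. As a rough illustration only, the sketch below shows one common variant, confidence thresholding: predictions on unlabeled premise-hypothesis pairs are kept as extra training labels only when the model's top class probability clears a cutoff. The label names, the `select_pseudo_labels` function, and the 0.9 threshold are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical label set; the abstract does not list the task's labels.
LABELS = ("entailment", "contradiction", "neutral")


def select_pseudo_labels(
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[int, str, float]]:
    """Return (example_index, label, confidence) for confident predictions.

    probabilities[i] holds the model's class probabilities for unlabeled
    example i, in the same order as LABELS. The default threshold of 0.9
    is an illustrative assumption, not a value from the paper.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        if len(probs) != len(LABELS):
            raise ValueError(
                f"example {index}: expected {len(LABELS)} probabilities"
            )
        # Index of the most probable class for this example.
        best = max(range(len(LABELS)), key=lambda k: probs[k])
        # Keep the prediction only if the model is confident enough.
        if probs[best] >= threshold:
            selected.append((index, LABELS[best], probs[best]))
    return selected


if __name__ == "__main__":
    unlabeled_probs = [
        [0.95, 0.03, 0.02],  # confident, kept
        [0.40, 0.35, 0.25],  # uncertain, dropped
        [0.05, 0.93, 0.02],  # confident, kept
    ]
    for index, label, confidence in select_pseudo_labels(unlabeled_probs):
        print(index, label, confidence)
```

In a full pipeline, the selected examples would typically be merged with the labeled training set before another round of fine-tuning; a stricter threshold trades fewer added examples for less label noise.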