ViMRC VLSP 2021: XLM-R versus PhoBERT on Vietnamese Machine Reading Comprehension
Nhat Nguyen Duy, Phong Nguyen-Thuan Do
VNU Journal of Science: Computer Science and Communication Engineering, 2022-12-16
https://doi.org/10.25073/2588-1086/vnucsce.334
Industry 4.0 is creating new challenges for Artificial Intelligence (AI) in general and Natural Language Processing (NLP) in particular. Machine Reading Comprehension (MRC) is an NLP task with real-world applications that requires machines to determine the correct answers to questions based on a given document. MRC systems must not only answer questions when possible but also recognize when the document supports no answer and abstain from answering. In this paper, we describe our system for the VLSP 2021 shared task: Vietnamese Machine Reading Comprehension with UIT-ViQuAD 2.0. We propose a model for this task, called MRC4MRC, which combines two MRC components. Built on the XLM-RoBERTa (XLM-R) pre-trained language model, MRC4MRC achieves an F1-score (F1) of 79.13% and an Exact Match (EM) of 69.72% on the public-test set. Our experiments also show that XLM-R outperforms the strong PhoBERT language model on UIT-ViQuAD 2.0.
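For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how EM and token-level F1 are commonly computed for extractive MRC with unanswerable questions. It assumes a SQuAD 2.0-style convention (lowercasing, whitespace tokenization, and an empty string standing for "no answer"); the abstract does not specify the shared task's scoring script, so the function names and normalization here are illustrative assumptions, not the official evaluation.

```python
from collections import Counter


def tokens(text: str) -> list[str]:
    # Assumed normalization: lowercase and split on whitespace.
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    # 1.0 when the normalized token sequences are identical, else 0.0.
    return float(tokens(prediction) == tokens(gold))


def f1_score(prediction: str, gold: str) -> float:
    pred, ref = tokens(prediction), tokens(gold)
    # Unanswerable case: credit only if both sides are empty (abstention matches).
    if not pred or not ref:
        return float(pred == ref)
    # Multiset intersection counts tokens shared by prediction and reference.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under this convention, a system that correctly abstains on an unanswerable question scores 1.0 on both metrics, while any non-empty prediction for that question scores 0.0. Corpus-level figures such as those quoted above would be averages of the per-question scores.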