{"title":"ViMRC VLSP 2021: XLM-R versus PhoBERT on Vietnamese Machine Reading Comprehension","authors":"Nhat Nguyen Duy, Phong Nguyen-Thuan Do","doi":"10.25073/2588-1086/vnucsce.334","DOIUrl":null,"url":null,"abstract":"The development of industry 4.0 in the world is creating challenges in Artificial Intelligence (AI) in general and Natural Language Processing (NLP) in particular. Machine Reading Comprehension (MRC) is an NLP task with real-world applications that require machines to determine the correct answers to questions based on a given document. MRC systems must not only answer questions when possible but also determine when no answer is supported by the document and abstain from answering. In this paper, we present the description of our system to solve this task at the VLSP shared task 2021: Vietnamese Machine Reading Comprehension with UIT-ViQuAD 2.0. We propose a model to solve that task, called MRC4MRC. The model is a combination of two MRC components. Our MRC4MRC based on the XLM-RoBERTa pre-trained language model is 79.13% of F1-score (F1) and 69.72% of EM (Exact Match) on the public-test set. Our experiments also show that the XLM-R language model is better than the powerful PhoBERT language model on UIT-ViQuAD 2.0.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"VNU Journal of Science: Computer Science and Communication Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.25073/2588-1086/vnucsce.334","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The development of Industry 4.0 around the world is creating challenges for Artificial Intelligence (AI) in general and Natural Language Processing (NLP) in particular. Machine Reading Comprehension (MRC) is an NLP task with real-world applications that requires machines to determine the correct answers to questions based on a given document. MRC systems must not only answer questions when possible but also determine when the document supports no answer and abstain from answering. In this paper, we describe our system for this task at the VLSP 2021 shared task: Vietnamese Machine Reading Comprehension with UIT-ViQuAD 2.0. We propose a model for the task, called MRC4MRC, which combines two MRC components. Based on the XLM-RoBERTa pre-trained language model, MRC4MRC achieves an F1-score (F1) of 79.13% and an Exact Match (EM) of 69.72% on the public test set. Our experiments also show that the XLM-R language model outperforms the powerful PhoBERT language model on UIT-ViQuAD 2.0.
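The task described above follows the SQuAD 2.0-style setting: a system either extracts an answer span from the document or abstains when no answer is supported. The sketch below is not the authors' MRC4MRC implementation; it is a minimal illustration, using the Hugging Face transformers library, of how an XLM-R-based extractive reader with a no-answer threshold might be wired up. The checkpoint name, the Vietnamese example, and the threshold value are illustrative assumptions, and a checkpoint actually fine-tuned on UIT-ViQuAD 2.0 would be needed for meaningful predictions.

```python
# Minimal sketch of an XLM-R extractive reader with a no-answer decision.
# Assumptions: "xlm-roberta-large" stands in for a QA-fine-tuned checkpoint,
# and the abstention threshold would be tuned on a development set.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "xlm-roberta-large"  # assumption: replace with a ViQuAD-fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "Ai là tác giả của bài báo?"
context = "Bài báo được viết bởi Nhat Nguyen Duy và Phong Nguyen-Thuan Do."

inputs = tokenizer(question, context, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

start_logits = outputs.start_logits[0]
end_logits = outputs.end_logits[0]

# Score of the "no answer" prediction: both pointers on the first (<s>) token.
null_score = start_logits[0] + end_logits[0]

# Best non-null span (naive search; real systems restrict to valid context spans).
start_idx = int(torch.argmax(start_logits))
end_idx = int(torch.argmax(end_logits[start_idx:])) + start_idx
span_score = start_logits[start_idx] + end_logits[end_idx]

threshold = 0.0  # assumption: tuned on a dev set in practice
if null_score - span_score > threshold:
    print("No answer supported by the document.")
else:
    answer_ids = inputs["input_ids"][0][start_idx : end_idx + 1]
    print(tokenizer.decode(answer_ids, skip_special_tokens=True))
```

Comparing the null score against the best span score, as in SQuAD 2.0 systems, is one common way to implement the abstention behaviour the abstract refers to; the paper's own MRC4MRC design combines two MRC components rather than a single reader.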