Question Generation in the Thai Language Using MT5

Nutthanit Wiwatbutsiri, A. Suchato, P. Punyabukkana, Nuengwong Tuaycharoen
Published in: 2022 19th International Joint Conference on Computer Science and Software Engineering (JCSSE)
Publication date: 2022-06-22
DOI: 10.1109/jcsse54890.2022.9836271
URL: https://doi.org/10.1109/jcsse54890.2022.9836271

Abstract

There are numerous publications on Question Generation (QG) in English but few in Thai. More than a million question-answer pairs are available in English, compared with only around 12,000 in Thai. This paper presents a method to improve automatic, answer-agnostic Thai QG when the available dataset is too small. Our evaluation showed that a QG model trained from the pre-trained MT5 model on a Thai dataset achieved a BLEU-1 score of 56.19. We propose a method for generating synthetic data, together with an additional mechanism, using a single pre-trained model. Our best model outperformed the previous model, achieving a BLEU-1 score of 59.03. In human evaluation, it scored 4.40 for fluency, 4.65 for relevance, and 4.7 for answerability, each out of 5.0.
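For readers unfamiliar with the metric, the following is a minimal sketch of how a corpus-level BLEU-1 score can be computed: clipped unigram precision multiplied by a brevity penalty, scaled to 0-100. It is not the authors' evaluation code. The function name `bleu1`, the single-reference setup, and the English placeholder tokens are assumptions made for illustration; the abstract does not say which tokenizer or BLEU implementation was used, and Thai text would first need word segmentation because it is written without spaces.

```python
import math
from collections import Counter


def bleu1(candidates, references):
    """Corpus-level BLEU-1 with one reference per candidate.

    Both arguments are lists of token lists. Tokenization is assumed
    to have been done beforehand (hypothetical setup, not from the paper).
    """
    matched = 0   # clipped unigram matches over the whole corpus
    cand_len = 0  # total number of candidate tokens
    ref_len = 0   # total number of reference tokens
    for cand, ref in zip(candidates, references):
        cand_counts = Counter(cand)
        ref_counts = Counter(ref)
        # Clip each candidate token count by its count in the reference.
        matched += sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
        cand_len += len(cand)
        ref_len += len(ref)
    if cand_len == 0 or matched == 0:
        return 0.0
    precision = matched / cand_len
    # Brevity penalty: penalize candidates shorter than the references.
    bp = 1.0 if cand_len > ref_len else math.exp(1.0 - ref_len / cand_len)
    return 100.0 * precision * bp


# Placeholder example: a generated question versus a reference question.
generated = [["what", "is", "the", "capital", "of", "thailand"]]
reference = [["what", "is", "the", "capital", "city", "of", "thailand"]]
print(round(bleu1(generated, reference), 2))
```

Because BLEU-1 only counts unigram overlap, it rewards using the right words but says little about word order or fluency, which is one reason the abstract also reports human ratings.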