More than Extracting "Important" Sentences: the Application of PEGASUS

Ting-Hao Yang, Ching-Ching Lu, Wen-Lian Hsu
{"title":"More than Extracting \"Important\" Sentences: the Application of PEGASUS","authors":"Ting-Hao Yang, Ching-Ching Lu, Wen-Lian Hsu","doi":"10.1109/taai54685.2021.00032","DOIUrl":null,"url":null,"abstract":"Pre-trained language models may reduce the amount of training data required. Among the models, PEGASUS, a recently proposed self-supervised approach, is trained to generate the pseudo-summary given the partially masked document. PEGASUS uses gap sentence generation for summarization. The most important sentences are masked, and then PEGASUS predicts the masked sentences as the output summary. In this study, however, we apply PEGASUS in a novel downstream task. We reformulate the task to generate the masked question part in a primary math word problem. In the past research, PEGASUS has shown good potentials on the few-shot datasets, so we try a smaller set of primary math text problems as well. The fine-tuning dataset sizes used in this study are 1000, 500, 50, 10 samples. Their performance are measured by a non-weighted average of the ROUGE-1, ROUGE-2, and ROUGE-L scores. The results show the outstanding performance of PEGASUS applied in our novel downstream task.","PeriodicalId":343821,"journal":{"name":"2021 International Conference on Technologies and Applications of Artificial Intelligence (TAAI)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Technologies and Applications of Artificial Intelligence (TAAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/taai54685.2021.00032","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Pre-trained language models may reduce the amount of training data required. Among these models, PEGASUS, a recently proposed self-supervised approach, is trained to generate a pseudo-summary from a partially masked document. PEGASUS uses gap sentence generation for summarization: the most important sentences are masked, and PEGASUS then predicts the masked sentences as the output summary. In this study, however, we apply PEGASUS to a novel downstream task: we reformulate the task as generating the masked question part of a primary math word problem. In past research, PEGASUS has shown good potential on few-shot datasets, so we also try smaller sets of primary math word problems. The fine-tuning dataset sizes used in this study are 1000, 500, 50, and 10 samples. Performance is measured by a non-weighted average of the ROUGE-1, ROUGE-2, and ROUGE-L scores. The results show the outstanding performance of PEGASUS on our novel downstream task.
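
As a rough illustration of the reformulated task, the sketch below replaces the question sentence of a hypothetical primary math word problem with PEGASUS's `<mask_1>` gap-sentence token and runs one fine-tuning step plus generation with a Hugging Face checkpoint. The checkpoint name, example problem, and training step are assumptions for illustration, not the authors' released setup.

```python
# Minimal sketch, assuming PEGASUS's <mask_1> sentence-mask token and an
# illustrative checkpoint; not the paper's released code or data.
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-xsum"  # assumed starting checkpoint; the paper does not specify one here
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# Hypothetical word problem: the narrative body is kept, the question sentence is masked.
body = "Tom has 3 apples. Mary gives him 5 more apples."
masked_problem = body + " <mask_1>"          # <mask_1>: PEGASUS's gap-sentence mask token
target_question = "How many apples does Tom have now?"

inputs = tokenizer(masked_problem, return_tensors="pt", truncation=True)
labels = tokenizer(text_target=target_question, return_tensors="pt").input_ids

# One fine-tuning step: the model is trained to generate the masked question part.
loss = model(**inputs, labels=labels).loss
loss.backward()

# At inference time, generation fills in the question from the masked problem.
generated = model.generate(**inputs, num_beams=4, max_length=32)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```

In this framing, the downstream task keeps PEGASUS's pre-training objective intact; only the masked span changes from "most important sentences" to "the question part of the problem."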
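The evaluation metric described in the abstract, a non-weighted (plain) average of the ROUGE-1, ROUGE-2, and ROUGE-L F1 scores, can be sketched with the rouge_score package; the reference and prediction strings below are placeholders, not examples from the paper's dataset.

```python
# Minimal sketch of the metric: plain mean of ROUGE-1, ROUGE-2, ROUGE-L F1.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "How many apples does Tom have now?"   # gold question part (placeholder)
prediction = "How many apples does Tom have?"      # model-generated question (placeholder)

scores = scorer.score(reference, prediction)
# Non-weighted average of the three F1 scores.
avg_rouge = sum(s.fmeasure for s in scores.values()) / len(scores)
print(f"ROUGE-1/2/L average F1: {avg_rouge:.4f}")
```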