Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang
{"title":"z - code++:一种为抽象摘要优化的预训练语言模型","authors":"Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang","doi":"arxiv-2208.09770","DOIUrl":null,"url":null,"abstract":"This paper presents Z-Code++, a new pre-trained language model optimized for\nabstractive text summarization. The model extends the state of the art\nencoder-decoder model using three techniques. First, we use a two-phase\npre-training process to improve model's performance on low-resource\nsummarization tasks. The model is first pre-trained using text corpora for\nlanguage understanding, and then is continually pre-trained on summarization\ncorpora for grounded text generation. Second, we replace self-attention layers\nin the encoder with disentangled attention layers, where each word is\nrepresented using two vectors that encode its content and position,\nrespectively. Third, we use fusion-in-encoder, a simple yet effective method of\nencoding long sequences in a hierarchical manner. Z-Code++ creates new state of\nthe art on 9 out of 13 text summarization tasks across 5 languages. Our model\nis parameter-efficient in that it outperforms the 600x larger PaLM-540B on\nXSum, and the finetuned 200x larger GPT3-175B on SAMSum. In zero-shot and\nfew-shot settings, our model substantially outperforms the competing models.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"4 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization\",\"authors\":\"Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang\",\"doi\":\"arxiv-2208.09770\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents Z-Code++, a new pre-trained language model optimized for\\nabstractive text summarization. The model extends the state of the art\\nencoder-decoder model using three techniques. First, we use a two-phase\\npre-training process to improve model's performance on low-resource\\nsummarization tasks. The model is first pre-trained using text corpora for\\nlanguage understanding, and then is continually pre-trained on summarization\\ncorpora for grounded text generation. Second, we replace self-attention layers\\nin the encoder with disentangled attention layers, where each word is\\nrepresented using two vectors that encode its content and position,\\nrespectively. Third, we use fusion-in-encoder, a simple yet effective method of\\nencoding long sequences in a hierarchical manner. Z-Code++ creates new state of\\nthe art on 9 out of 13 text summarization tasks across 5 languages. Our model\\nis parameter-efficient in that it outperforms the 600x larger PaLM-540B on\\nXSum, and the finetuned 200x larger GPT3-175B on SAMSum. 
In zero-shot and\\nfew-shot settings, our model substantially outperforms the competing models.\",\"PeriodicalId\":501533,\"journal\":{\"name\":\"arXiv - CS - General Literature\",\"volume\":\"4 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - General Literature\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2208.09770\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - General Literature","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2208.09770","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder model with three techniques. First, we use a two-phase pre-training process to improve the model's performance on low-resource summarization tasks: the model is first pre-trained on text corpora for language understanding, and then continually pre-trained on summarization corpora for grounded text generation. Second, we replace the self-attention layers in the encoder with disentangled attention layers, where each word is represented by two vectors that encode its content and position, respectively. Third, we use fusion-in-encoder, a simple yet effective method for encoding long sequences in a hierarchical manner. Z-Code++ sets a new state of the art on 9 of 13 text summarization tasks across 5 languages. Our model is parameter-efficient: it outperforms the 600x larger PaLM-540B on XSum and the fine-tuned, 200x larger GPT3-175B on SAMSum. In zero-shot and few-shot settings, our model substantially outperforms competing models.
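To make the disentangled-attention idea concrete, below is a minimal, single-head PyTorch sketch in which each token contributes a content vector and a relative-position vector, and the attention score sums content-to-content, content-to-position, and position-to-content terms (DeBERTa-style). This is an illustration of the idea only, not the Z-Code++ implementation; the class name `DisentangledSelfAttention`, the dimensions, and the relative-distance clipping are hypothetical choices.

```python
# Sketch of disentangled attention: content and relative-position vectors
# contribute separate terms to the attention score. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentangledSelfAttention(nn.Module):
    def __init__(self, hidden: int = 256, max_rel_dist: int = 128):
        super().__init__()
        self.hidden = hidden
        self.max_rel_dist = max_rel_dist
        # Content projections.
        self.q_c = nn.Linear(hidden, hidden)
        self.k_c = nn.Linear(hidden, hidden)
        self.v_c = nn.Linear(hidden, hidden)
        # Relative-position embeddings, projected for the cross terms.
        self.rel_emb = nn.Embedding(2 * max_rel_dist, hidden)
        self.q_r = nn.Linear(hidden, hidden)
        self.k_r = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden)
        b, n, _ = x.shape
        qc, kc, vc = self.q_c(x), self.k_c(x), self.v_c(x)

        # Relative distances, clipped and shifted into the embedding range.
        pos = torch.arange(n, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(
            -self.max_rel_dist, self.max_rel_dist - 1
        ) + self.max_rel_dist                      # (n, n)
        r = self.rel_emb(rel)                      # (n, n, hidden)

        # Content-to-content term: (batch, n, n).
        c2c = qc @ kc.transpose(-1, -2)
        # Content-to-position: query content against key relative positions.
        c2p = torch.einsum("bih,ijh->bij", qc, self.k_r(r))
        # Position-to-content: key content against query relative positions.
        p2c = torch.einsum("bjh,ijh->bij", kc, self.q_r(r))

        scores = (c2c + c2p + p2c) / (3 * self.hidden) ** 0.5
        return F.softmax(scores, dim=-1) @ vc


# Usage: a drop-in replacement for a standard encoder self-attention layer.
attn = DisentangledSelfAttention(hidden=256)
out = attn(torch.randn(2, 10, 256))   # -> shape (2, 10, 256)
```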
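Likewise, a hedged sketch of the fusion-in-encoder idea: the long input is split into fixed-size chunks, lower "local" layers attend within each chunk, and upper "global" layers attend over the fused (concatenated) chunk outputs. The layer counts, chunk size, and use of `nn.TransformerEncoderLayer` are illustrative assumptions, not the paper's configuration.

```python
# Sketch of hierarchical (fusion-in-encoder style) encoding of long inputs:
# per-chunk local attention followed by full-sequence global attention.
import torch
import torch.nn as nn


class FusionInEncoder(nn.Module):
    def __init__(self, hidden: int = 256, n_local: int = 4, n_global: int = 2,
                 chunk_size: int = 128):
        super().__init__()
        self.chunk_size = chunk_size
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model=hidden, nhead=4, batch_first=True)
        self.local_layers = nn.ModuleList(make_layer() for _ in range(n_local))
        self.global_layers = nn.ModuleList(make_layer() for _ in range(n_global))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden); pad seq_len to a multiple of chunk_size.
        # (Padding is left unmasked here for brevity.)
        b, n, h = x.shape
        pad = (-n) % self.chunk_size
        x = nn.functional.pad(x, (0, 0, 0, pad))
        n_chunks = x.shape[1] // self.chunk_size

        # Local phase: fold chunks into the batch dim so attention is per-chunk.
        x = x.reshape(b * n_chunks, self.chunk_size, h)
        for layer in self.local_layers:
            x = layer(x)

        # Fusion + global phase: unfold and attend across the whole sequence.
        x = x.reshape(b, n_chunks * self.chunk_size, h)
        for layer in self.global_layers:
            x = layer(x)
        return x[:, :n]            # drop the padded positions


enc = FusionInEncoder()
out = enc(torch.randn(2, 300, 256))   # -> shape (2, 300, 256)
```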