Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang
{"title":"z - code++:一种为抽象摘要优化的预训练语言模型","authors":"Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang","doi":"arxiv-2208.09770","DOIUrl":null,"url":null,"abstract":"This paper presents Z-Code++, a new pre-trained language model optimized for\nabstractive text summarization. The model extends the state of the art\nencoder-decoder model using three techniques. First, we use a two-phase\npre-training process to improve model's performance on low-resource\nsummarization tasks. The model is first pre-trained using text corpora for\nlanguage understanding, and then is continually pre-trained on summarization\ncorpora for grounded text generation. Second, we replace self-attention layers\nin the encoder with disentangled attention layers, where each word is\nrepresented using two vectors that encode its content and position,\nrespectively. Third, we use fusion-in-encoder, a simple yet effective method of\nencoding long sequences in a hierarchical manner. Z-Code++ creates new state of\nthe art on 9 out of 13 text summarization tasks across 5 languages. Our model\nis parameter-efficient in that it outperforms the 600x larger PaLM-540B on\nXSum, and the finetuned 200x larger GPT3-175B on SAMSum. In zero-shot and\nfew-shot settings, our model substantially outperforms the competing models.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"4 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization\",\"authors\":\"Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang\",\"doi\":\"arxiv-2208.09770\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents Z-Code++, a new pre-trained language model optimized for\\nabstractive text summarization. The model extends the state of the art\\nencoder-decoder model using three techniques. First, we use a two-phase\\npre-training process to improve model's performance on low-resource\\nsummarization tasks. The model is first pre-trained using text corpora for\\nlanguage understanding, and then is continually pre-trained on summarization\\ncorpora for grounded text generation. Second, we replace self-attention layers\\nin the encoder with disentangled attention layers, where each word is\\nrepresented using two vectors that encode its content and position,\\nrespectively. Third, we use fusion-in-encoder, a simple yet effective method of\\nencoding long sequences in a hierarchical manner. Z-Code++ creates new state of\\nthe art on 9 out of 13 text summarization tasks across 5 languages. Our model\\nis parameter-efficient in that it outperforms the 600x larger PaLM-540B on\\nXSum, and the finetuned 200x larger GPT3-175B on SAMSum. 
In zero-shot and\\nfew-shot settings, our model substantially outperforms the competing models.\",\"PeriodicalId\":501533,\"journal\":{\"name\":\"arXiv - CS - General Literature\",\"volume\":\"4 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - General Literature\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2208.09770\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - General Literature","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2208.09770","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization
This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder architecture using three techniques. First, we use a two-phase pre-training process to improve the model's performance on low-resource summarization tasks: the model is first pre-trained on text corpora for language understanding, and is then continually pre-trained on summarization corpora for grounded text generation. Second, we replace the self-attention layers in the encoder with disentangled attention layers, where each word is represented by two vectors that encode its content and position, respectively. Third, we use fusion-in-encoder, a simple yet effective method for encoding long sequences in a hierarchical manner. Z-Code++ achieves a new state of the art on 9 of 13 text summarization tasks across 5 languages. The model is parameter-efficient: it outperforms the 600x larger PaLM-540B on XSum and the fine-tuned, 200x larger GPT3-175B on SAMSum. In zero-shot and few-shot settings, it substantially outperforms the competing models.
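
To make the disentangled-attention idea above concrete, here is a minimal single-head PyTorch sketch, assuming a DeBERTa-style decomposition in which the attention score combines content-to-content, content-to-position, and position-to-content terms over clipped relative distances. The module name, shapes, and clipping scheme are illustrative assumptions, not the authors' implementation.

```python
# Simplified, single-head sketch of disentangled attention: each token contributes
# a content vector plus a relative-position vector, and the attention score sums
# content-to-content, content-to-position, and position-to-content terms.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentangledSelfAttention(nn.Module):
    def __init__(self, dim: int, max_rel_pos: int = 128):
        super().__init__()
        self.dim = dim
        self.max_rel_pos = max_rel_pos
        # Content projections
        self.q_c = nn.Linear(dim, dim)
        self.k_c = nn.Linear(dim, dim)
        self.v_c = nn.Linear(dim, dim)
        # Position projections (queries/keys over relative-position embeddings)
        self.q_p = nn.Linear(dim, dim)
        self.k_p = nn.Linear(dim, dim)
        # Embedding table for relative distances clipped to [-max_rel_pos, max_rel_pos]
        self.rel_emb = nn.Embedding(2 * max_rel_pos + 1, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) content representations
        b, n, d = x.shape
        qc, kc, vc = self.q_c(x), self.k_c(x), self.v_c(x)

        # Relative distance i - j for every query/key pair, clipped and embedded
        pos = torch.arange(n, device=x.device)
        rel = (pos[:, None] - pos[None, :]).clamp(-self.max_rel_pos, self.max_rel_pos)
        r = self.rel_emb(rel + self.max_rel_pos)          # (n, n, dim)

        # content-to-content term
        c2c = qc @ kc.transpose(-1, -2)                   # (b, n, n)
        # content-to-position: query content attends to relative positions
        c2p = torch.einsum("bid,ijd->bij", qc, self.k_p(r))
        # position-to-content: relative positions attend to key content
        p2c = torch.einsum("ijd,bjd->bij", self.q_p(r), kc)

        scores = (c2c + c2p + p2c) / (3 * d) ** 0.5
        attn = F.softmax(scores, dim=-1)
        return attn @ vc                                  # (b, n, dim)
```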
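Likewise, a minimal sketch of a hierarchical fusion-in-encoder layout: the long input is split into fixed-size chunks, each chunk is encoded with local self-attention, and the chunk outputs are concatenated and fused by full self-attention. Chunk size, layer counts, and module names are assumptions for illustration, not the paper's configuration.

```python
# Hypothetical hierarchical encoder: local per-chunk attention followed by a
# global "fusion" stage with full attention over the concatenated chunk outputs.
import torch
import torch.nn as nn


class FusionInEncoder(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 4,
                 local_layers: int = 2, global_layers: int = 2,
                 chunk_size: int = 128):
        super().__init__()
        self.chunk_size = chunk_size
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, batch_first=True)
        self.local_encoder = nn.TransformerEncoder(make_layer(), num_layers=local_layers)
        self.global_encoder = nn.TransformerEncoder(make_layer(), num_layers=global_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); pad seq_len to a multiple of chunk_size
        b, n, d = x.shape
        pad = (-n) % self.chunk_size
        if pad:
            x = torch.cat([x, x.new_zeros(b, pad, d)], dim=1)
        n_chunks = x.shape[1] // self.chunk_size

        # Local phase: attention is restricted to tokens within the same chunk
        chunks = x.reshape(b * n_chunks, self.chunk_size, d)
        local_out = self.local_encoder(chunks)

        # Fusion phase: concatenate chunk outputs and apply full self-attention
        fused = local_out.reshape(b, n_chunks * self.chunk_size, d)
        return self.global_encoder(fused)[:, :n, :]
```

The design intent here is that the quadratic cost of full attention is paid only in the fusion stage, while the lower layers scale linearly in the number of chunks, which is what makes hierarchical encoding of long documents tractable.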