Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang
{"title":"z - code++:一种为抽象摘要优化的预训练语言模型","authors":"Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang","doi":"arxiv-2208.09770","DOIUrl":null,"url":null,"abstract":"This paper presents Z-Code++, a new pre-trained language model optimized for\nabstractive text summarization. The model extends the state of the art\nencoder-decoder model using three techniques. First, we use a two-phase\npre-training process to improve model's performance on low-resource\nsummarization tasks. The model is first pre-trained using text corpora for\nlanguage understanding, and then is continually pre-trained on summarization\ncorpora for grounded text generation. Second, we replace self-attention layers\nin the encoder with disentangled attention layers, where each word is\nrepresented using two vectors that encode its content and position,\nrespectively. Third, we use fusion-in-encoder, a simple yet effective method of\nencoding long sequences in a hierarchical manner. Z-Code++ creates new state of\nthe art on 9 out of 13 text summarization tasks across 5 languages. Our model\nis parameter-efficient in that it outperforms the 600x larger PaLM-540B on\nXSum, and the finetuned 200x larger GPT3-175B on SAMSum. In zero-shot and\nfew-shot settings, our model substantially outperforms the competing models.","PeriodicalId":501533,"journal":{"name":"arXiv - CS - General Literature","volume":"4 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization\",\"authors\":\"Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang\",\"doi\":\"arxiv-2208.09770\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents Z-Code++, a new pre-trained language model optimized for\\nabstractive text summarization. The model extends the state of the art\\nencoder-decoder model using three techniques. First, we use a two-phase\\npre-training process to improve model's performance on low-resource\\nsummarization tasks. The model is first pre-trained using text corpora for\\nlanguage understanding, and then is continually pre-trained on summarization\\ncorpora for grounded text generation. Second, we replace self-attention layers\\nin the encoder with disentangled attention layers, where each word is\\nrepresented using two vectors that encode its content and position,\\nrespectively. Third, we use fusion-in-encoder, a simple yet effective method of\\nencoding long sequences in a hierarchical manner. Z-Code++ creates new state of\\nthe art on 9 out of 13 text summarization tasks across 5 languages. Our model\\nis parameter-efficient in that it outperforms the 600x larger PaLM-540B on\\nXSum, and the finetuned 200x larger GPT3-175B on SAMSum. 
In zero-shot and\\nfew-shot settings, our model substantially outperforms the competing models.\",\"PeriodicalId\":501533,\"journal\":{\"name\":\"arXiv - CS - General Literature\",\"volume\":\"4 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - General Literature\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2208.09770\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - General Literature","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2208.09770","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization
This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder architecture using three techniques. First, we use a two-phase pre-training process to improve the model's performance on low-resource summarization tasks: the model is first pre-trained on text corpora for language understanding, and is then continually pre-trained on summarization corpora for grounded text generation. Second, we replace the self-attention layers in the encoder with disentangled attention layers, where each word is represented by two vectors that encode its content and position, respectively. Third, we use fusion-in-encoder, a simple yet effective method for encoding long sequences in a hierarchical manner. Z-Code++ achieves a new state of the art on 9 of 13 text summarization tasks across 5 languages. The model is parameter-efficient: it outperforms the 600x larger PaLM-540B on XSum and the fine-tuned, 200x larger GPT3-175B on SAMSum. In zero-shot and few-shot settings, it substantially outperforms the competing models.
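
To make the disentangled-attention idea above concrete, here is a minimal single-head PyTorch sketch, assuming a DeBERTa-style decomposition in which the attention score combines content-to-content, content-to-position, and position-to-content terms over clipped relative distances. The module name, shapes, and clipping scheme are illustrative assumptions, not the authors' implementation.

```python
# Simplified, single-head sketch of disentangled attention: each token contributes
# a content vector plus a relative-position vector, and the attention score sums
# content-to-content, content-to-position, and position-to-content terms.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentangledSelfAttention(nn.Module):
    def __init__(self, dim: int, max_rel_pos: int = 128):
        super().__init__()
        self.dim = dim
        self.max_rel_pos = max_rel_pos
        # Content projections
        self.q_c = nn.Linear(dim, dim)
        self.k_c = nn.Linear(dim, dim)
        self.v_c = nn.Linear(dim, dim)
        # Position projections (queries/keys over relative-position embeddings)
        self.q_p = nn.Linear(dim, dim)
        self.k_p = nn.Linear(dim, dim)
        # Embedding table for relative distances clipped to [-max_rel_pos, max_rel_pos]
        self.rel_emb = nn.Embedding(2 * max_rel_pos + 1, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) content representations
        b, n, d = x.shape
        qc, kc, vc = self.q_c(x), self.k_c(x), self.v_c(x)

        # Relative distance i - j for every query/key pair, clipped and embedded
        pos = torch.arange(n, device=x.device)
        rel = (pos[:, None] - pos[None, :]).clamp(-self.max_rel_pos, self.max_rel_pos)
        r = self.rel_emb(rel + self.max_rel_pos)          # (n, n, dim)

        # content-to-content term
        c2c = qc @ kc.transpose(-1, -2)                   # (b, n, n)
        # content-to-position: query content attends to relative positions
        c2p = torch.einsum("bid,ijd->bij", qc, self.k_p(r))
        # position-to-content: relative positions attend to key content
        p2c = torch.einsum("ijd,bjd->bij", self.q_p(r), kc)

        scores = (c2c + c2p + p2c) / (3 * d) ** 0.5
        attn = F.softmax(scores, dim=-1)
        return attn @ vc                                  # (b, n, dim)
```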
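Likewise, a minimal sketch of a hierarchical fusion-in-encoder layout: the long input is split into fixed-size chunks, each chunk is encoded with local self-attention, and the chunk outputs are concatenated and fused by full self-attention. Chunk size, layer counts, and module names are assumptions for illustration, not the paper's configuration.

```python
# Hypothetical hierarchical encoder: local per-chunk attention followed by a
# global "fusion" stage with full attention over the concatenated chunk outputs.
import torch
import torch.nn as nn


class FusionInEncoder(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 4,
                 local_layers: int = 2, global_layers: int = 2,
                 chunk_size: int = 128):
        super().__init__()
        self.chunk_size = chunk_size
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, batch_first=True)
        self.local_encoder = nn.TransformerEncoder(make_layer(), num_layers=local_layers)
        self.global_encoder = nn.TransformerEncoder(make_layer(), num_layers=global_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); pad seq_len to a multiple of chunk_size
        b, n, d = x.shape
        pad = (-n) % self.chunk_size
        if pad:
            x = torch.cat([x, x.new_zeros(b, pad, d)], dim=1)
        n_chunks = x.shape[1] // self.chunk_size

        # Local phase: attention is restricted to tokens within the same chunk
        chunks = x.reshape(b * n_chunks, self.chunk_size, d)
        local_out = self.local_encoder(chunks)

        # Fusion phase: concatenate chunk outputs and apply full self-attention
        fused = local_out.reshape(b, n_chunks * self.chunk_size, d)
        return self.global_encoder(fused)[:, :n, :]
```

The design intent here is that the quadratic cost of full attention is paid only in the fusion stage, while the lower layers scale linearly in the number of chunks, which is what makes hierarchical encoding of long documents tractable.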