{"title":"Task-specific pre-training improves models for paraphrase generation","authors":"O. Skurzhanskyi, O. Marchenko","doi":"10.1145/3582768.3582791","DOIUrl":null,"url":null,"abstract":"Paraphrase generation is a fundamental and longstanding problem in the Natural Language Processing field. With the huge success of transfer learning, the pre-train → fine-tune approach has become a standard choice. At the same time, popular task-agnostic pre-trainings usually require gigabyte datasets and hundreds of GPUs, while available pre-trained models are limited by fixed architecture and size (i.e. base, large). We propose a simple and efficient pre-training approach specifically for paraphrase generation, which noticeably boosts model quality and matches the performance of general-purpose pre-trained models. We also investigate how this procedure influences the scores across different architectures and show that it works for all of them.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3582768.3582791","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Paraphrase generation is a fundamental and longstanding problem in Natural Language Processing. With the huge success of transfer learning, the pre-train → fine-tune approach has become a standard choice. At the same time, popular task-agnostic pre-training procedures usually require gigabyte-scale datasets and hundreds of GPUs, while publicly available pre-trained models are limited to fixed architectures and sizes (e.g., base, large). We propose a simple and efficient pre-training approach designed specifically for paraphrase generation, which noticeably boosts model quality and matches the performance of general-purpose pre-trained models. We also investigate how this procedure influences scores across different architectures and show that it works for all of them.
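To make the pre-train → fine-tune workflow the abstract refers to concrete, below is a minimal, generic sketch of fine-tuning a sequence-to-sequence model on paraphrase pairs with Hugging Face Transformers. The backbone name, toy data, and hyperparameters are illustrative assumptions; the paper's own task-specific pre-training objective is not reproduced here.

```python
# Illustrative sketch only: generic fine-tuning of a seq2seq model on
# paraphrase pairs. Backbone, data, and hyperparameters are assumptions,
# not the authors' actual task-specific pre-training procedure.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-base"  # assumed backbone for demonstration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy paraphrase pairs; a real run would use a paraphrase corpus.
pairs = [
    ("How old are you?", "What is your age?"),
    ("The movie was great.", "I really enjoyed the film."),
]

def collate(batch):
    src, tgt = zip(*batch)
    enc = tokenizer(list(src), padding=True, truncation=True, return_tensors="pt")
    labels = tokenizer(list(tgt), padding=True, truncation=True, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(pairs, batch_size=2, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

model.train()
for epoch in range(1):  # fine-tuning stage on paraphrase pairs
    for batch in loader:
        loss = model(**batch).loss  # standard cross-entropy seq2seq loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Generate a paraphrase with the fine-tuned model.
model.eval()
inputs = tokenizer("How old are you?", return_tensors="pt")
out = model.generate(**inputs, num_beams=4, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

In the setting the abstract contrasts, the checkpoint loaded above would come either from an expensive general-purpose pre-training or, as the paper proposes, from a lightweight pre-training step tailored to paraphrase generation; the fine-tuning loop itself stays the same.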