{"title":"Masked transformer through knowledge distillation for unsupervised text style transfer","authors":"Arthur Scalercio, A. Paes","doi":"10.1017/s1351324923000323","DOIUrl":null,"url":null,"abstract":"\n Text style transfer (TST) aims at automatically changing a text’s stylistic features, such as formality, sentiment, authorial style, humor, and complexity, while still trying to preserve its content. Although the scientific community has investigated TST since the 1980s, it has recently regained attention by adopting deep unsupervised strategies to address the challenge of training without parallel data. In this manuscript, we investigate how relying on sequence-to-sequence pretraining models affects the performance of TST when the pretraining step leverages pairs of paraphrase data. Furthermore, we propose a new technique to enhance the sequence-to-sequence model by distilling knowledge from masked language models. We evaluate our proposals on three unsupervised style transfer tasks with widely used benchmarks: author imitation, formality transfer, and polarity swap. The evaluation relies on quantitative and qualitative analyses and comparisons with the results of state-of-the-art models. For the author imitation and the formality transfer task, we show that using the proposed techniques improves all measured metrics and leads to state-of-the-art (SOTA) results in content preservation and an overall score in the author imitation domain. In the formality transfer domain, we paired with the SOTA method in the style control metric. Regarding the polarity swap domain, we show that the knowledge distillation component improves all measured metrics. The paraphrase pretraining increases content preservation at the expense of harming style control. Based on the results reached in these domains, we also discuss in the manuscript if the tasks we address have the same nature and should be equally treated as TST tasks.","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2023-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Natural Language Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1017/s1351324923000323","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Text style transfer (TST) aims to automatically change a text’s stylistic features, such as formality, sentiment, authorial style, humor, and complexity, while preserving its content. Although the scientific community has investigated TST since the 1980s, the task has recently regained attention through deep unsupervised strategies that address the challenge of training without parallel data. In this manuscript, we investigate how relying on sequence-to-sequence pretraining affects TST performance when the pretraining step leverages pairs of paraphrase data. Furthermore, we propose a new technique to enhance the sequence-to-sequence model by distilling knowledge from masked language models. We evaluate our proposals on three unsupervised style transfer tasks with widely used benchmarks: author imitation, formality transfer, and polarity swap. The evaluation relies on quantitative and qualitative analyses and on comparisons with state-of-the-art models. For the author imitation and formality transfer tasks, we show that the proposed techniques improve all measured metrics and achieve state-of-the-art (SOTA) results in content preservation and overall score on the author imitation benchmark. On the formality transfer benchmark, we match the SOTA method on the style control metric. Regarding the polarity swap domain, we show that the knowledge distillation component improves all measured metrics, whereas the paraphrase pretraining increases content preservation at the expense of style control. Based on these results, we also discuss whether the tasks we address share the same nature and should all be treated equally as TST tasks.
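To give a concrete picture of the knowledge distillation component described above, the sketch below shows one common way to combine a sequence-to-sequence student's token-level cross-entropy with a soft target derived from a masked-language-model teacher. This is a minimal illustration only, not the authors' implementation: the loss weighting (alpha), the temperature, the padding index, and the assumption that teacher and student share a vocabulary are choices made for this example and are not specified in the abstract.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      target_ids: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5,
                      pad_id: int = 1) -> torch.Tensor:
    """Hard-target cross-entropy plus soft-target KL against an MLM teacher.

    student_logits, teacher_logits: (batch, seq_len, vocab) over a shared vocabulary.
    target_ids: (batch, seq_len) gold output tokens for the seq2seq student.
    All hyperparameters here are illustrative assumptions.
    """
    # Hard loss: ordinary token-level cross-entropy on the gold outputs,
    # ignoring padding positions.
    ce = F.cross_entropy(student_logits.transpose(1, 2), target_ids,
                         ignore_index=pad_id)

    # Soft loss: KL divergence between temperature-softened distributions.
    # The teacher is a masked language model scoring the same target positions;
    # the T**2 factor follows the standard distillation formulation.
    student_log_prob = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_prob = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(student_log_prob, teacher_prob,
                  reduction="batchmean") * temperature ** 2

    # Weighted sum of the two terms.
    return alpha * ce + (1.0 - alpha) * kl

In such a setup, the gradient from the soft term nudges the student's output distribution toward the teacher's, while the hard term keeps it anchored to the gold paraphrase or style-transferred reference.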
Journal description:
Natural Language Engineering meets the needs of professionals and researchers working in all areas of computerised language processing, whether from the perspective of theoretical or descriptive linguistics, lexicology, computer science or engineering. Its aim is to bridge the gap between traditional computational linguistics research and the implementation of practical applications with potential real-world use. As well as publishing research articles on a broad range of topics - from text analysis, machine translation, information retrieval, and speech analysis and generation to integrated systems and multimodal interfaces - it also publishes special issues on specific areas and technologies within these topics, an industry watch column, and book reviews.