面向无监督文本风格转移的鲁棒和语义组织潜在表示

Sharan Narasimhan, Suvodip Dey, M. Desarkar
{"title":"面向无监督文本风格转移的鲁棒和语义组织潜在表示","authors":"Sharan Narasimhan, Suvodip Dey, M. Desarkar","doi":"10.48550/arXiv.2205.02309","DOIUrl":null,"url":null,"abstract":"Recent studies show that auto-encoder based approaches successfully perform language generation, smooth sentence interpolation, and style transfer over unseen attributes using unlabelled datasets in a zero-shot manner. The latent space geometry of such models is organised well enough to perform on datasets where the style is “coarse-grained” i.e. a small fraction of words alone in a sentence are enough to determine the overall style label. A recent study uses a discrete token-based perturbation approach to map “similar” sentences (“similar” defined by low Levenshtein distance/ high word overlap) close by in latent space. This definition of “similarity” does not look into the underlying nuances of the constituent words while mapping latent space neighbourhoods and therefore fails to recognise sentences with different style-based semantics while mapping latent neighbourhoods. We introduce EPAAEs (Embedding Perturbed Adversarial AutoEncoders) which completes this perturbation model, by adding a finely adjustable noise component on the continuous embeddings space. We empirically show that this (a) produces a better organised latent space that clusters stylistically similar sentences together, (b) performs best on a diverse set of text style transfer tasks than its counterparts, and (c) is capable of fine-grained control of Style Transfer strength. We also extend the text style transfer tasks to NLI datasets and show that these more complex definitions of style are learned best by EPAAE. To the best of our knowledge, extending style transfer to NLI tasks has not been explored before.","PeriodicalId":382084,"journal":{"name":"North American Chapter of the Association for Computational Linguistics","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer\",\"authors\":\"Sharan Narasimhan, Suvodip Dey, M. Desarkar\",\"doi\":\"10.48550/arXiv.2205.02309\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent studies show that auto-encoder based approaches successfully perform language generation, smooth sentence interpolation, and style transfer over unseen attributes using unlabelled datasets in a zero-shot manner. The latent space geometry of such models is organised well enough to perform on datasets where the style is “coarse-grained” i.e. a small fraction of words alone in a sentence are enough to determine the overall style label. A recent study uses a discrete token-based perturbation approach to map “similar” sentences (“similar” defined by low Levenshtein distance/ high word overlap) close by in latent space. This definition of “similarity” does not look into the underlying nuances of the constituent words while mapping latent space neighbourhoods and therefore fails to recognise sentences with different style-based semantics while mapping latent neighbourhoods. We introduce EPAAEs (Embedding Perturbed Adversarial AutoEncoders) which completes this perturbation model, by adding a finely adjustable noise component on the continuous embeddings space. We empirically show that this (a) produces a better organised latent space that clusters stylistically similar sentences together, (b) performs best on a diverse set of text style transfer tasks than its counterparts, and (c) is capable of fine-grained control of Style Transfer strength. We also extend the text style transfer tasks to NLI datasets and show that these more complex definitions of style are learned best by EPAAE. To the best of our knowledge, extending style transfer to NLI tasks has not been explored before.\",\"PeriodicalId\":382084,\"journal\":{\"name\":\"North American Chapter of the Association for Computational Linguistics\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"North American Chapter of the Association for Computational Linguistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2205.02309\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"North American Chapter of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2205.02309","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

最近的研究表明,基于自动编码器的方法成功地使用未标记的数据集以零采样的方式在未见过的属性上执行语言生成、平滑句子插值和风格迁移。这些模型的潜在空间几何结构组织得足够好,可以在风格是“粗粒度”的数据集上执行,例如,一个句子中仅一小部分单词就足以确定整体风格标签。最近的一项研究使用基于离散标记的扰动方法来映射潜在空间中邻近的“相似”句子(“相似”由低Levenshtein距离/高单词重叠定义)。这种“相似性”的定义在映射潜在空间邻域时没有研究组成词的潜在细微差别,因此在映射潜在邻域时无法识别具有不同风格语义的句子。我们引入了嵌入摄动对抗自编码器(EPAAEs),通过在连续嵌入空间上添加精细可调的噪声分量来完成这个摄动模型。我们的经验表明,这(a)产生了一个更好的组织潜在空间,将风格相似的句子聚集在一起,(b)在不同的文本风格迁移任务中表现最好,并且(c)能够对风格迁移强度进行细粒度控制。我们还将文本样式迁移任务扩展到NLI数据集,并表明EPAAE可以最好地学习这些更复杂的样式定义。据我们所知,将风格迁移扩展到NLI任务之前还没有被探索过。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer
Recent studies show that auto-encoder based approaches successfully perform language generation, smooth sentence interpolation, and style transfer over unseen attributes using unlabelled datasets in a zero-shot manner. The latent space geometry of such models is organised well enough to perform on datasets where the style is “coarse-grained” i.e. a small fraction of words alone in a sentence are enough to determine the overall style label. A recent study uses a discrete token-based perturbation approach to map “similar” sentences (“similar” defined by low Levenshtein distance/ high word overlap) close by in latent space. This definition of “similarity” does not look into the underlying nuances of the constituent words while mapping latent space neighbourhoods and therefore fails to recognise sentences with different style-based semantics while mapping latent neighbourhoods. We introduce EPAAEs (Embedding Perturbed Adversarial AutoEncoders) which completes this perturbation model, by adding a finely adjustable noise component on the continuous embeddings space. We empirically show that this (a) produces a better organised latent space that clusters stylistically similar sentences together, (b) performs best on a diverse set of text style transfer tasks than its counterparts, and (c) is capable of fine-grained control of Style Transfer strength. We also extend the text style transfer tasks to NLI datasets and show that these more complex definitions of style are learned best by EPAAE. To the best of our knowledge, extending style transfer to NLI tasks has not been explored before.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信