Generative Data Augmentation via Wasserstein Autoencoder for Text Classification

Kyohoon Jin, Junho Lee, Juhwan Choi, Soojin Jang, Youngbin Kim
{"title":"Generative Data Augmentation via Wasserstein Autoencoder for Text Classification","authors":"Kyohoon Jin, Junho Lee, Juhwan Choi, Soojin Jang, Youngbin Kim","doi":"10.1109/ICTC55196.2022.9952762","DOIUrl":null,"url":null,"abstract":"Generative latent variable models are commonly used in text generation and augmentation. However generative latent variable models such as the variational autoencoder(VAE) experience a posterior collapse problem ignoring learning for a subset of latent variables during training. In particular, this phenomenon frequently occurs when the VAE is applied to natural language processing, which may degrade the reconstruction performance. In this paper, we propose a data augmentation method based on the pre-trained language model (PLM) using the Wasserstein autoencoder (WAE) structure. The WAE was used to prevent a posterior collapse in the generative model, and the PLM was placed in the encoder and decoder to improve the augmentation performance. We evaluated the proposed method on seven benchmark datasets and proved the augmentation effect.","PeriodicalId":441404,"journal":{"name":"2022 13th International Conference on Information and Communication Technology Convergence (ICTC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 13th International Conference on Information and Communication Technology Convergence (ICTC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTC55196.2022.9952762","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Generative latent variable models are commonly used in text generation and augmentation. However, generative latent variable models such as the variational autoencoder (VAE) suffer from posterior collapse, in which learning is ignored for a subset of latent variables during training. This phenomenon occurs frequently when the VAE is applied to natural language processing and may degrade reconstruction performance. In this paper, we propose a data augmentation method based on a pre-trained language model (PLM) using the Wasserstein autoencoder (WAE) structure. The WAE is used to prevent posterior collapse in the generative model, and the PLM is placed in the encoder and decoder to improve augmentation performance. We evaluated the proposed method on seven benchmark datasets and demonstrated its augmentation effect.
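
The abstract describes the method only at a high level. As a rough illustration of the WAE objective it refers to, the sketch below implements the MMD variant of the WAE loss, in which a divergence between the aggregated posterior and the prior replaces the VAE's per-sample KL term; because the decoder is not penalized per example for using the latent code, this formulation is less prone to posterior collapse. The MMD variant, the RBF kernel, the Gaussian prior, and the toy MLP encoder/decoder are assumptions made for illustration, not details from the paper, which places a PLM in the encoder and decoder instead.

```python
# Minimal sketch of a WAE-MMD objective (illustrative assumptions only;
# the paper uses a PLM encoder/decoder, not the toy MLPs below).
import torch
import torch.nn as nn

def rbf_kernel(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # Pairwise RBF kernel k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * sigma^2)).
    sq_dist = torch.cdist(a, b, p=2).pow(2)
    return torch.exp(-sq_dist / (2 * sigma ** 2))

def mmd_penalty(z_q: torch.Tensor, z_p: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # MMD^2 between encoded latents z_q and prior samples z_p. This replaces
    # the VAE's per-sample KL term, which is what helps avoid posterior collapse.
    k_qq = rbf_kernel(z_q, z_q, sigma).mean()
    k_pp = rbf_kernel(z_p, z_p, sigma).mean()
    k_qp = rbf_kernel(z_q, z_p, sigma).mean()
    return k_qq + k_pp - 2 * k_qp

class ToyWAE(nn.Module):
    # Stand-in encoder/decoder; in the paper these slots are filled by a PLM.
    def __init__(self, input_dim: int = 768, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, input_dim))

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)
        return self.decoder(z), z

if __name__ == "__main__":
    model = ToyWAE()
    x = torch.randn(32, 768)          # e.g., sentence representations
    x_hat, z = model(x)
    z_prior = torch.randn_like(z)     # samples from the N(0, I) prior
    recon = nn.functional.mse_loss(x_hat, x)
    loss = recon + 10.0 * mmd_penalty(z, z_prior)
    print(float(loss))
```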