基于DVAE的文本建模预防后塌陷。

IF 2.1 3区物理与天体物理 Q2 PHYSICS, MULTIDISCIPLINARY

Entropy Pub Date : 2025-04-14 DOI:10.3390/e27040423

Tianbao Song, Zongyi Huang, Xin Liu, Jingbo Sun

{"title":"基于DVAE的文本建模预防后塌陷。","authors":"Tianbao Song, Zongyi Huang, Xin Liu, Jingbo Sun","doi":"10.3390/e27040423","DOIUrl":null,"url":null,"abstract":"This paper introduces a novel variational autoencoder model termed DVAE to prevent posterior collapse in text modeling. DVAE employs a dual-path architecture within its decoder: path A and path B. Path A makes the direct input of text instances into the decoder, whereas path B replaces a subset of word tokens in the text instances with a generic unknown token before their input into the decoder. A stopping strategy is implemented, wherein both paths are concurrently active during the early phases of training. As the model progresses towards convergence, path B is removed. To further refine the performance, a KL weight dropout method is employed, which randomly sets certain dimensions of the KL weight to zero during the annealing process. DVAE compels the latent variables to encode more information about the input texts through path B and fully utilize the expressiveness of the decoder, as well as avoiding the local optimum when path B is active through path A and the stopping strategy. Furthermore, the KL weight dropout method augments the number of active units within the latent variables. Experimental results show the excellent performance of DVAE in density estimation, representation learning, and text generation.","PeriodicalId":11694,"journal":{"name":"Entropy","volume":"27 4","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12026048/pdf/","citationCount":"0","resultStr":"{\"title\":\"Preventing Posterior Collapse with DVAE for Text Modeling.\",\"authors\":\"Tianbao Song, Zongyi Huang, Xin Liu, Jingbo Sun\",\"doi\":\"10.3390/e27040423\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper introduces a novel variational autoencoder model termed DVAE to prevent posterior collapse in text modeling. DVAE employs a dual-path architecture within its decoder: path A and path B. Path A makes the direct input of text instances into the decoder, whereas path B replaces a subset of word tokens in the text instances with a generic unknown token before their input into the decoder. A stopping strategy is implemented, wherein both paths are concurrently active during the early phases of training. As the model progresses towards convergence, path B is removed. To further refine the performance, a KL weight dropout method is employed, which randomly sets certain dimensions of the KL weight to zero during the annealing process. DVAE compels the latent variables to encode more information about the input texts through path B and fully utilize the expressiveness of the decoder, as well as avoiding the local optimum when path B is active through path A and the stopping strategy. Furthermore, the KL weight dropout method augments the number of active units within the latent variables. Experimental results show the excellent performance of DVAE in density estimation, representation learning, and text generation.\",\"PeriodicalId\":11694,\"journal\":{\"name\":\"Entropy\",\"volume\":\"27 4\",\"pages\":\"\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2025-04-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12026048/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Entropy\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.3390/e27040423\",\"RegionNum\":3,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PHYSICS, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e27040423","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 0

摘要

本文介绍了一种新的变分自编码器模型，称为DVAE，以防止文本建模中的后验崩溃。DVAE在其解码器中采用双路径架构：路径a和路径B。路径a将文本实例直接输入到解码器中，而路径B在文本实例输入到解码器之前，将文本实例中的单词令牌子集替换为通用未知令牌。实现了一种停止策略，其中两条路径在训练的早期阶段同时活跃。随着模型趋于收敛，路径B被移除。为了进一步改善性能，采用KL权值dropout方法，在退火过程中随机设置KL权值的某些维度为零。DVAE迫使潜变量通过路径B对输入文本的更多信息进行编码，充分利用解码器的表现力，并通过路径A和停止策略避免路径B处于活动状态时的局部最优。此外，KL权重丢弃法增加了潜在变量内活动单元的数量。实验结果表明，该方法在密度估计、表示学习和文本生成等方面具有优异的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Preventing Posterior Collapse with DVAE for Text Modeling.

This paper introduces a novel variational autoencoder model termed DVAE to prevent posterior collapse in text modeling. DVAE employs a dual-path architecture within its decoder: path A and path B. Path A makes the direct input of text instances into the decoder, whereas path B replaces a subset of word tokens in the text instances with a generic unknown token before their input into the decoder. A stopping strategy is implemented, wherein both paths are concurrently active during the early phases of training. As the model progresses towards convergence, path B is removed. To further refine the performance, a KL weight dropout method is employed, which randomly sets certain dimensions of the KL weight to zero during the annealing process. DVAE compels the latent variables to encode more information about the input texts through path B and fully utilize the expressiveness of the decoder, as well as avoiding the local optimum when path B is active through path A and the stopping strategy. Furthermore, the KL weight dropout method augments the number of active units within the latent variables. Experimental results show the excellent performance of DVAE in density estimation, representation learning, and text generation.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Entropy PHYSICS, MULTIDISCIPLINARY-

CiteScore

4.90

自引率

11.10%

发文量

1580

审稿时长

21.05 days

期刊介绍： Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Our aim is to encourage scientists to publish as much as possible their theoretical and experimental details. There is no restriction on the length of the papers. If there are computation and the experiment, the details must be provided so that the results can be reproduced.