LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling

arXiv - CS - Machine Learning, Pub Date: 2024-09-16, DOI: arxiv-2409.11184

Xin Li, Anand Sarwate


Citations: 0

Abstract


LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling

Learning compact and meaningful latent space representations has been shown to be very useful in generative modeling tasks for visual data. One particular example is applying Vector Quantization (VQ) in variational autoencoders (VQ-VAEs, VQ-GANs, etc.), which has demonstrated state-of-the-art performance in many modern generative modeling applications. Quantizing the latent space has been justified by the assumption that the data themselves are inherently discrete in the latent space (like pixel values). In this paper, we propose an alternative representation of the latent space that relaxes the structural assumption of the VQ formulation. Specifically, we assume that the latent space can be approximated by a union-of-subspaces model corresponding to a dictionary-based representation under a sparsity constraint. The dictionary is learned/updated during the training process. We apply this approach to two models: Dictionary Learning Variational Autoencoders (DL-VAEs) and DL-VAEs with Generative Adversarial Networks (DL-GANs). We show empirically that our latent space is more expressive and leads to better representations than the VQ approach in terms of reconstruction quality, at the expense of a small computational overhead for the latent space computation. Our results thus suggest that the true benefit of the VQ approach might come not from discretization of the latent space, but rather from lossy compression of the latent space. We confirm this hypothesis by showing that our sparse representations also address the codebook collapse issue commonly found in VQ-family models.
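The contrast the abstract draws between VQ and a dictionary-based sparse code can be illustrated with a small sketch. The abstract does not state which sparse coding algorithm the authors use, so the example below assumes a fixed dictionary with unit-norm atoms and substitutes plain greedy matching pursuit as a stand-in. The function names (`vq_encode`, `sparse_encode`, `reconstruct`) and the toy dictionary are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def vq_encode(z: Vector, codebook: Sequence[Vector]) -> int:
    """VQ: represent z by the index of its single nearest codeword."""
    def sq_dist(c: Vector) -> float:
        return sum((zi - ci) ** 2 for zi, ci in zip(z, c))
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j]))


def sparse_encode(z: Vector, dictionary: Sequence[Vector], k: int) -> Dict[int, float]:
    """Greedy matching pursuit (an assumed stand-in, not the paper's method).

    Runs at most k greedy steps over unit-norm atoms, so the code has at
    most k nonzero coefficients, returned as {atom index: coefficient}.
    """
    residual: List[float] = list(z)
    codes: Dict[int, float] = {}
    for _ in range(k):
        # Pick the atom most correlated with what is still unexplained.
        j = max(range(len(dictionary)), key=lambda i: abs(dot(residual, dictionary[i])))
        coef = dot(residual, dictionary[j])
        if coef == 0.0:
            break  # residual is orthogonal to every atom; nothing left to explain
        codes[j] = codes.get(j, 0.0) + coef
        residual = [r - coef * a for r, a in zip(residual, dictionary[j])]
    return codes


def reconstruct(codes: Dict[int, float], dictionary: Sequence[Vector], dim: int) -> List[float]:
    """Rebuild the latent vector as a weighted sum of the selected atoms."""
    out = [0.0] * dim
    for j, c in codes.items():
        for i, a in enumerate(dictionary[j]):
            out[i] += c * a
    return out


# Toy example: the same three unit-norm vectors serve as codebook and dictionary.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
z = [3.0, 0.0, -2.0]
vq_index = vq_encode(z, atoms)                  # a single codeword index
sparse_code = sparse_encode(z, atoms, k=2)      # at most two weighted atoms
z_hat = reconstruct(sparse_code, atoms, dim=3)  # weighted sum of chosen atoms
```

In this toy setting VQ must snap `z` to one codeword, while the sparse code combines two atoms with real-valued weights and so can land anywhere in the subspace they span. That is the sense in which a union-of-subspaces model is a relaxation of the VQ structure. Learning the dictionary during training, as the abstract describes, is not shown here.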
