Beta-Sigma VAE: Separating beta and decoder variance in Gaussian variational autoencoder
Seunghwan Kim, Seungkyu Lee
arXiv - STAT - Machine Learning, 2024-09-14
DOI: arxiv-2409.09361
Abstract
The variational autoencoder (VAE) is an established generative model but is
notorious for producing blurry outputs. In this work, we investigate the blurry
output problem of the VAE and resolve it by exploiting the variance of the
Gaussian decoder and the $\beta$ of beta-VAE. Specifically, we show that the
indistinguishability of the decoder variance and $\beta$ hinders proper analysis
of the model, because the likelihood value becomes arbitrary, and limits
performance by forgoing the gain from $\beta$. To address the problem, we
propose Beta-Sigma VAE (BS-VAE), which explicitly separates $\beta$ and the
decoder variance $\sigma^2_x$ in the model. Compared to the conventional VAE,
our method demonstrates not only superior performance in natural image
synthesis but also controllable parameters and predictable analysis. In our
experimental evaluation, we analyze rate-distortion curves and proxy metrics on
computer vision datasets. The code is available at
https://github.com/overnap/BS-VAE
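
To make the indistinguishability concrete, here is a minimal sketch. It assumes the textbook beta-VAE objective with a Gaussian decoder of fixed scalar variance, $\|x - \hat{x}\|^2 / (2\sigma^2_x) + (D/2)\log(2\pi\sigma^2_x) + \beta \cdot \mathrm{KL}$. It is not the BS-VAE implementation from the repository. The function names `neg_elbo` and `effective_objective`, the dimension `dim = 784`, and the random stand-in values for the squared error and KL term are all hypothetical. Multiplying the loss by $2\sigma^2_x$ and dropping the constant leaves $\|x - \hat{x}\|^2 + 2\sigma^2_x\beta \cdot \mathrm{KL}$, so the optimizer only sees the product $\sigma^2_x\beta$, while the reported loss value still depends on how that product is split.

```python
import numpy as np


def neg_elbo(sq_err, kl, beta, sigma2, dim):
    """Negative beta-ELBO for a Gaussian decoder with fixed scalar variance sigma2.

    sq_err is the squared reconstruction error ||x - x_hat||^2 per sample,
    kl is the KL term per sample, dim is the data dimensionality D.
    """
    nll = sq_err / (2.0 * sigma2) + 0.5 * dim * np.log(2.0 * np.pi * sigma2)
    return nll + beta * kl


def effective_objective(sq_err, kl, beta, sigma2):
    """Parameter-dependent part of neg_elbo after multiplying by 2 * sigma2.

    The log-variance term is constant when sigma2 is fixed, so it is dropped.
    """
    return sq_err + 2.0 * sigma2 * beta * kl


# Hypothetical per-sample values standing in for a real model's outputs.
rng = np.random.default_rng(0)
dim = 784
sq_err = rng.uniform(10.0, 50.0, size=5)
kl = rng.uniform(1.0, 20.0, size=5)

# Two settings with the same product beta * sigma2 = 0.5.
setting_a = dict(beta=1.0, sigma2=0.5)
setting_b = dict(beta=0.5, sigma2=1.0)

# The optimizer sees the same trade-off in both settings...
same_objective = np.allclose(
    effective_objective(sq_err, kl, **setting_a),
    effective_objective(sq_err, kl, **setting_b),
)

# ...but the reported loss (and hence the likelihood value) differs.
loss_a = neg_elbo(sq_err, kl, dim=dim, **setting_a)
loss_b = neg_elbo(sq_err, kl, dim=dim, **setting_b)
same_reported_value = np.allclose(loss_a, loss_b)

print("same optimization target:", same_objective)
print("same reported loss value:", same_reported_value)
```

In this sketch the two settings induce the same rate-distortion trade-off but report different loss values, which is one way to read the abstract's point that the likelihood value is arbitrary unless $\beta$ and $\sigma^2_x$ are treated as separate quantities.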