HVQ-VAE: Variational auto-encoder with hyperbolic vector quantization

IF 4.3 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Shangyu Chen , Pengfei Fang , Mehrtash Harandi , Trung Le , Jianfei Cai , Dinh Phung
{"title":"HVQ-VAE: Variational auto-encoder with hyperbolic vector quantization","authors":"Shangyu Chen ,&nbsp;Pengfei Fang ,&nbsp;Mehrtash Harandi ,&nbsp;Trung Le ,&nbsp;Jianfei Cai ,&nbsp;Dinh Phung","doi":"10.1016/j.cviu.2025.104392","DOIUrl":null,"url":null,"abstract":"<div><div>Vector quantized-variational autoencoder (VQ-VAE) and its variants have made significant progress in creating discrete latent space via learning a codebook. Previous works on VQ-VAE have focused on discrete latent spaces in Euclidean or in spherical spaces. This paper studies the geometric prior of hyperbolic spaces as a way to improve the learning capacity of VQ-VAE. That being said, working with the VQ-VAE in the hyperbolic space is not without difficulties, and the benefits of using hyperbolic space as the geometric prior for the latent space have never been studied in VQ-VAE. We bridge this gap by developing the VQ-VAE with hyperbolic vector quantization. To this end, we propose the hyperbolic VQ-VAE (HVQ-VAE), which learns the latent embedding of data and the codebook in the hyperbolic space. Specifically, we endow the discrete latent space in the Poincaré ball, such that the clustering algorithm can be formulated and optimized in the Poincaré ball. Thorough experiments against various baselines are conducted to evaluate the superiority of the proposed HVQ-VAE empirically. We show that HVQ-VAE enjoys better image reconstruction, effective codebook usage, and fast convergence than baselines. We also present evidence that HVQ-VAE outperforms VQ-VAE in low-dimensional latent space.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"258 ","pages":"Article 104392"},"PeriodicalIF":4.3000,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314225001158","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Vector quantized-variational autoencoder (VQ-VAE) and its variants have made significant progress in creating discrete latent space via learning a codebook. Previous works on VQ-VAE have focused on discrete latent spaces in Euclidean or in spherical spaces. This paper studies the geometric prior of hyperbolic spaces as a way to improve the learning capacity of VQ-VAE. That being said, working with the VQ-VAE in the hyperbolic space is not without difficulties, and the benefits of using hyperbolic space as the geometric prior for the latent space have never been studied in VQ-VAE. We bridge this gap by developing the VQ-VAE with hyperbolic vector quantization. To this end, we propose the hyperbolic VQ-VAE (HVQ-VAE), which learns the latent embedding of data and the codebook in the hyperbolic space. Specifically, we endow the discrete latent space in the Poincaré ball, such that the clustering algorithm can be formulated and optimized in the Poincaré ball. Thorough experiments against various baselines are conducted to evaluate the superiority of the proposed HVQ-VAE empirically. We show that HVQ-VAE enjoys better image reconstruction, effective codebook usage, and fast convergence than baselines. We also present evidence that HVQ-VAE outperforms VQ-VAE in low-dimensional latent space.
双曲矢量量化的变分自编码器
矢量量化变分自编码器(VQ-VAE)及其变体在通过学习码本创建离散潜在空间方面取得了重大进展。以前关于VQ-VAE的工作主要集中在欧几里得空间或球面空间中的离散潜在空间。本文研究了双曲空间的几何先验,以提高VQ-VAE的学习能力。话虽如此,在双曲空间中使用VQ-VAE并非没有困难,并且使用双曲空间作为潜在空间的几何先验的好处从未在VQ-VAE中研究过。我们通过发展双曲矢量量化的VQ-VAE来弥补这一差距。为此,我们提出了双曲VQ-VAE (HVQ-VAE),它学习数据和码本在双曲空间中的潜在嵌入。具体地说,我们在poincar球中赋予离散潜空间,使得聚类算法可以在poincar球中进行表述和优化。针对各种基线进行了深入的实验,以经验评估所提出的HVQ-VAE的优越性。我们发现HVQ-VAE比基线具有更好的图像重建,有效的码本使用和快速收敛。我们也提供证据证明HVQ-VAE在低维潜在空间中优于VQ-VAE。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Computer Vision and Image Understanding
Computer Vision and Image Understanding 工程技术-工程:电子与电气
CiteScore
7.80
自引率
4.40%
发文量
112
审稿时长
79 days
期刊介绍: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research Areas Include: • Theory • Early vision • Data structures and representations • Shape • Range • Motion • Matching and recognition • Architecture and languages • Vision systems
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信