Probabilistic and semantic descriptions of image manifolds and their applications

Impact factor: 2.7 | JCR Q3 | Computer Science, Interdisciplinary Applications

Frontiers in Computer Science | Pub Date: 2023-11-02 | DOI: 10.3389/fcomp.2023.1253682

Peter Tu, Zhaoyuan Yang, Richard Hartley, Zhiwei Xu, Jing Zhang, Yiwei Fu, Dylan Campbell, Jaskirat Singh, Tianyu Wang


Citations: 0

Abstract

This paper begins with a description of methods for estimating probability density functions for images. These methods reflect the observation that such data are usually constrained to lie in restricted regions of the high-dimensional image space: not every pattern of pixels is an image. It is common to say that images lie on a lower-dimensional manifold in the high-dimensional space. However, although images may lie on such lower-dimensional manifolds, it is not the case that all points on the manifold have an equal probability of being images. Images are unevenly distributed on the manifold, and our task is to devise ways to model this distribution as a probability distribution. In pursuing this goal, we consider generative models that are popular in the AI and computer vision communities. For our purposes, generative/probabilistic models should have two properties: (1) sample generation: it should be possible to sample from the distribution according to the modeled density function, and (2) probability computation: given a previously unseen sample from the dataset of interest, one should be able to compute the probability of the sample, at least up to a normalizing constant. To this end, we investigate the use of methods such as normalizing flows and diffusion models. We then show how semantic interpretations are used to describe points on the manifold. To achieve this, we consider an emergent language framework that makes use of variational encoders to produce a disentangled representation of points that reside on a given manifold. Trajectories between points on a manifold can then be described in terms of evolving semantic descriptions. In addition to describing the manifold in terms of density and semantic disentanglement, we also show that such (bounded) probabilistic descriptions can be used to improve semantic consistency by constructing defenses against adversarial attacks. We evaluate our methods on CelebA and point samples for likelihood estimation, with improved semantic robustness and out-of-distribution detection capability; on MNIST and CelebA for semantic disentanglement, with explainable and editable semantic interpolation; and on CelebA and Fashion-MNIST for defense against patch attacks, with significantly improved classification accuracy. We also discuss the limitations of applying our likelihood estimation to 2D images in diffusion models.
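
The two properties named in the abstract, sample generation and probability computation, can be illustrated with a deliberately tiny example. The sketch below is not the authors' model: it assumes a one-dimensional affine map applied to a standard normal base distribution, and the class name `AffineFlow` and its parameters are hypothetical. It only shows how an invertible map gives both sampling and an exact log-density through the change-of-variables formula.

```python
import math
import random


class AffineFlow:
    """Toy 1-D flow x = scale * z + shift with z ~ N(0, 1).

    Hypothetical illustration only; practical normalizing flows stack
    many learned invertible layers over high-dimensional image data.
    """

    def __init__(self, scale: float, shift: float) -> None:
        if scale == 0:
            raise ValueError("scale must be non-zero so the map is invertible")
        self.scale = scale
        self.shift = shift

    def sample(self, rng: random.Random) -> float:
        # Property (1): draw from the base density, then push forward.
        z = rng.gauss(0.0, 1.0)
        return self.scale * z + self.shift

    def log_prob(self, x: float) -> float:
        # Property (2): invert the map, evaluate the base log-density,
        # and add log|dz/dx| = -log|scale| (change of variables).
        z = (x - self.shift) / self.scale
        log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
        return log_base - math.log(abs(self.scale))


rng = random.Random(0)
flow = AffineFlow(scale=2.0, shift=1.0)
x = flow.sample(rng)
print(x, flow.log_prob(x))
```

In this toy case the density is fully normalized. The abstract's weaker requirement, probability "up to a normalizing constant", would correspond to knowing `log_prob` only up to an additive constant.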
