DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation

BMVC: Proceedings of the British Machine Vision Conference. Pub Date: 2022-11-10. DOI: 10.48550/arXiv.2211.05499

Azade Farshad, Yousef Yeganeh, Helisa Dhamo, F. Tombari, N. Navab


Citations: 1

Abstract

Graph representation of objects and their relations in a scene, known as a scene graph, provides a precise and discernible interface to manipulate a scene by modifying the nodes or the edges in the graph. Although existing works have shown promising results in modifying the placement and pose of objects, scene manipulation often leads to losing some visual characteristics like the appearance or identity of objects. In this work, we propose DisPositioNet, a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs in a self-supervised manner. Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph. In addition to producing more realistic images due to the decomposition of features like pose and identity, our method takes advantage of the probabilistic sampling in the intermediate features to generate more diverse images in object replacement or addition tasks. The results of our experiments show that disentangling the feature representations in the latent manifold of the model outperforms the previous works qualitatively and quantitatively on two public benchmarks. Project Page: https://scenegenie.github.io/DispositioNet/
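The abstract describes per-object latents that separate pose from identity and that are sampled probabilistically to diversify object replacement. The following is a minimal, standard-library sketch of that general idea only; it is not the authors' implementation. The names `GaussianCode`, `ObjectLatent`, and `replace_object`, the diagonal-Gaussian parameterization, and the latent sizes are all assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class GaussianCode:
    """Diagonal Gaussian over one latent factor (hypothetical container)."""
    mean: List[float]
    log_var: List[float]

    def sample(self, rng: random.Random) -> List[float]:
        # Reparameterization: z = mean + sigma * eps, with eps ~ N(0, 1).
        return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                for m, lv in zip(self.mean, self.log_var)]


@dataclass
class ObjectLatent:
    """Per-object latent split into two separate factors (assumed layout)."""
    pose: GaussianCode
    identity: GaussianCode


def replace_object(old: ObjectLatent, new_identity: GaussianCode,
                   rng: random.Random) -> List[float]:
    """Keep the old object's pose mean and draw a fresh identity sample."""
    pose_z = list(old.pose.mean)            # pose is carried over unchanged
    identity_z = new_identity.sample(rng)   # identity is sampled for diversity
    return pose_z + identity_z              # concatenated code for a decoder
```

Calling `replace_object` repeatedly with the same inputs would, under these assumptions, yield codes that share the pose part and differ in the identity part, which is the kind of diversity the abstract attributes to probabilistic sampling.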

