GraPLUS: Graph-based Placement Using Semantics for image composition

IF 4.3 · CAS Tier 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)
Mir Mohammad Khaleghi, Mehran Safayani, Abdolreza Mirzaei
{"title":"GraPLUS: Graph-based Placement Using Semantics for image composition","authors":"Mir Mohammad Khaleghi,&nbsp;Mehran Safayani,&nbsp;Abdolreza Mirzaei","doi":"10.1016/j.cviu.2025.104427","DOIUrl":null,"url":null,"abstract":"<div><div>We present GraPLUS (Graph-based Placement Using Semantics), a novel framework for plausible object placement in images that leverages scene graphs and large language models. Our approach uniquely combines graph-structured scene representation with semantic understanding to determine contextually appropriate object positions. The framework employs GPT-2 to transform categorical node and edge labels into rich semantic embeddings that capture both definitional characteristics and typical spatial contexts, enabling a nuanced understanding of object relationships and placement patterns. GraPLUS achieves a placement accuracy of 92.1% and an FID score of 28.83 on the OPA dataset, outperforming state-of-the-art methods by 8.3% while maintaining competitive visual quality. In human evaluation studies involving 964 samples assessed by 38 participants, our method was preferred in 51.8% of cases, significantly outperforming previous approaches (25.8% and 22.4% for the next best methods). The framework’s key innovations include: (i) leveraging pre-trained scene graph models that transfer knowledge from other domains, eliminating the need to train feature extraction parameters from scratch, (ii) edge-aware graph neural networks that process scene semantics through structured relationships, (iii) a cross-modal attention mechanism that aligns categorical embeddings with enhanced scene features, and (iv) a multiobjective training strategy incorporating semantic consistency constraints. Extensive experiments demonstrate GraPLUS’s superior performance in both placement plausibility and spatial precision, with particular strengths in maintaining object proportions and contextual relationships across diverse scene types.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104427"},"PeriodicalIF":4.3000,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S107731422500150X","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

We present GraPLUS (Graph-based Placement Using Semantics), a novel framework for plausible object placement in images that leverages scene graphs and large language models. Our approach uniquely combines graph-structured scene representation with semantic understanding to determine contextually appropriate object positions. The framework employs GPT-2 to transform categorical node and edge labels into rich semantic embeddings that capture both definitional characteristics and typical spatial contexts, enabling a nuanced understanding of object relationships and placement patterns. GraPLUS achieves a placement accuracy of 92.1% and an FID score of 28.83 on the OPA dataset, outperforming state-of-the-art methods by 8.3% while maintaining competitive visual quality. In human evaluation studies involving 964 samples assessed by 38 participants, our method was preferred in 51.8% of cases, significantly outperforming previous approaches (25.8% and 22.4% for the next best methods). The framework’s key innovations include: (i) leveraging pre-trained scene graph models that transfer knowledge from other domains, eliminating the need to train feature extraction parameters from scratch, (ii) edge-aware graph neural networks that process scene semantics through structured relationships, (iii) a cross-modal attention mechanism that aligns categorical embeddings with enhanced scene features, and (iv) a multiobjective training strategy incorporating semantic consistency constraints. Extensive experiments demonstrate GraPLUS’s superior performance in both placement plausibility and spatial precision, with particular strengths in maintaining object proportions and contextual relationships across diverse scene types.
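The abstract describes a pipeline of three architectural components: GPT-2 embeddings of categorical node and edge labels, an edge-aware graph neural network over the scene graph, and a cross-modal attention step that conditions the target object's embedding on the enhanced scene features. The sketch below shows one plausible way these stages could fit together in PyTorch. It is a minimal illustration under stated assumptions: the module names, hidden dimension (768, matching GPT-2), mean-pooling of label tokens, and GRU-based node update are all hypothetical choices, not the authors' released implementation.

```python
# Hypothetical sketch of the GraPLUS stages named in the abstract.
# All module names, dimensions, and design choices here are assumptions.
import torch
import torch.nn as nn
from transformers import GPT2Tokenizer, GPT2Model

# (i) Turn categorical node/edge labels into semantic embeddings with GPT-2.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2Model.from_pretrained("gpt2").eval()

@torch.no_grad()
def embed_label(label: str) -> torch.Tensor:
    """Mean-pool GPT-2 hidden states over the label's tokens (768-d vector)."""
    ids = tokenizer(label, return_tensors="pt")
    hidden = gpt2(**ids).last_hidden_state  # (1, T, 768)
    return hidden.mean(dim=1).squeeze(0)    # (768,)

# (ii) Edge-aware message passing: each message is conditioned on the
# relationship (edge) embedding as well as both endpoint nodes.
class EdgeAwareGNNLayer(nn.Module):
    def __init__(self, dim: int = 768):
        super().__init__()
        self.msg = nn.Linear(3 * dim, dim)   # [src_node, dst_node, edge] -> message
        self.upd = nn.GRUCell(dim, dim)      # aggregated messages update node state

    def forward(self, x, edge_index, edge_attr):
        # x: (N, dim) node features; edge_index: (2, E); edge_attr: (E, dim)
        src, dst = edge_index
        m = torch.relu(self.msg(torch.cat([x[src], x[dst], edge_attr], dim=-1)))
        agg = torch.zeros_like(x).index_add_(0, dst, m)  # sum messages per target node
        return self.upd(agg, x)

# (iii) Cross-modal attention: the object's category embedding (query)
# attends over the GNN-enhanced scene features (keys/values).
class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, obj_emb, scene_feats):
        # obj_emb: (1, 1, dim) query; scene_feats: (1, N, dim) keys/values
        out, _ = self.attn(obj_emb, scene_feats, scene_feats)
        return out  # context-conditioned object representation

# Toy usage: a two-node scene graph ("person" -> "bench" via "sitting on"),
# then a query asking where a "backpack" would fit in that context.
nodes = torch.stack([embed_label("person"), embed_label("bench")])  # (2, 768)
edge_index = torch.tensor([[0], [1]])                               # person -> bench
edge_attr = embed_label("sitting on").unsqueeze(0)                  # (1, 768)

scene = EdgeAwareGNNLayer()(nodes, edge_index, edge_attr)           # (2, 768)
query = embed_label("backpack").view(1, 1, -1)                      # (1, 1, 768)
context = CrossModalAttention()(query, scene.unsqueeze(0))          # (1, 1, 768)
```

In a full system, the context-conditioned representation would feed a placement head that regresses position and scale; the abstract's fourth component, the multi-objective training strategy with semantic consistency constraints, is a loss-level design and is not sketched here.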
Source Journal

Computer Vision and Image Understanding (Engineering Technology – Engineering: Electrical & Electronic)
CiteScore: 7.80
Self-citation rate: 4.40%
Articles per year: 112
Review time: 79 days
Journal Introduction: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis, from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.

Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems