GroupRF: Panoptic Scene Graph Generation with group relation tokens
Hongyun Wang, Jiachen Li, Xiang Xiang, Qing Xie, Yanchun Ma, Yongjian Liu
Journal of Visual Communication and Image Representation, Volume 107, Article 104405
DOI: 10.1016/j.jvcir.2025.104405
Published: 2025-02-04 | JCR: Q2 (Computer Science, Information Systems) | Impact Factor: 2.6
Citations: 0
Abstract
Panoptic Scene Graph Generation (PSG) aims to predict a variety of relations between pairs of objects within an image, and indicates the objects by panoptic segmentation masks instead of bounding boxes. Existing PSG methods attempt to straightforwardly fuse the object tokens for relation prediction, and thus fail to fully utilize the interaction between the pairwise objects. To address this problem, we propose a novel framework named Group RelationFormer (GroupRF) to capture the fine-grained inter-dependency among all instances. Our method introduces a set of learnable tokens, termed group rln tokens, which exploit fine-grained contextual interaction between object tokens through multiple attentive relations. In the process of relation prediction, we adopt multiple triplets to take advantage of the fine-grained interaction encoded in the group rln tokens. We conduct comprehensive experiments on the OpenPSG dataset, which show that our method outperforms the previous state-of-the-art method. Furthermore, ablation studies confirm the effectiveness of our framework. Our code is available at https://github.com/WHY-student/GroupRF.
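The core mechanism described above — a fixed set of learnable group relation tokens that attend over all object tokens to aggregate pairwise context — can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `group_token_attention`, the single-head scaled dot-product attention, and all shapes are illustrative assumptions; the actual model presumably uses multi-head attention inside a transformer decoder.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def group_token_attention(group_tokens, object_tokens):
    """Hypothetical sketch: each learnable group relation token attends
    over all object tokens, pooling contextual interaction between
    objects into a small, fixed set of relation slots.

    group_tokens:  (G, d) learnable tokens (G relation slots)
    object_tokens: (N, d) per-object features from the segmenter
    returns:       (G, d) updated group tokens
    """
    d = group_tokens.shape[1]
    scores = group_tokens @ object_tokens.T / np.sqrt(d)  # (G, N) affinities
    attn = softmax(scores, axis=-1)                       # each row sums to 1
    return attn @ object_tokens                           # context-aggregated slots

rng = np.random.default_rng(0)
G, N, d = 4, 6, 8  # illustrative sizes only
updated = group_token_attention(rng.normal(size=(G, d)),
                                rng.normal(size=(N, d)))
print(updated.shape)  # → (4, 8)
```

In a full model, the updated group tokens would then be combined with subject–object token pairs to score relation triplets; here the sketch only shows the attention-based context pooling step.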
Journal overview:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.