Semantic-enhanced panoptic scene graph generation through hybrid and axial attentions
Xinhe Kuang, Yuxin Che, Huiyan Han, Yimin Liu
Complex & Intelligent Systems, published 2024-12-30
DOI: 10.1007/s40747-024-01746-z
Abstract
Panoptic scene graph generation is a cutting-edge challenge in image scene understanding, requiring sophisticated prediction of both inter-object relationships and interactions between objects and their backgrounds. This complexity tests the limits of current models' ability to discern nuanced relationships within images. Conventional approaches often fail to combine visual and semantic data effectively, leading to predictions that are semantically impoverished. To address these issues, we propose a novel method for semantic-enhanced panoptic scene graph generation through hybrid and axial attentions (PSGAtten). Specifically, a series of hybrid attention networks is stacked within both the object context encoding and relationship context encoding modules, enhancing the refinement and fusion of visual and semantic information. Within the hybrid attention networks, self-attention mechanisms refine features within each modality, while cross-attention mechanisms fuse features across modalities. An axial attention model is further applied to strengthen the integration of global information. Experimental validation on the PSG dataset confirms that our approach not only surpasses existing methods in generating detailed panoptic scene graphs but also significantly improves recall rates, thereby enhancing relationship prediction in scene graph generation.
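The abstract names three mechanisms: self-attention for intra-modal refinement, cross-attention for visual-semantic fusion, and axial attention for global context. The sketch below illustrates those generic mechanisms in plain numpy; it is not the authors' implementation, and all function names, the residual connections, and the feature dimensions are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # q: (n_q, d); k, v: (n_k, d). Standard softmax attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def hybrid_attention_block(visual, semantic):
    """One hypothetical hybrid step: self-attention refines each modality,
    cross-attention fuses information across modalities (with residuals)."""
    visual = visual + scaled_dot_product_attention(visual, visual, visual)
    semantic = semantic + scaled_dot_product_attention(semantic, semantic, semantic)
    # Cross-modal fusion: each modality queries the other.
    v_fused = visual + scaled_dot_product_attention(visual, semantic, semantic)
    s_fused = semantic + scaled_dot_product_attention(semantic, visual, visual)
    return v_fused, s_fused

def axial_attention(feat):
    """Axial attention over a (H, W, d) feature map: attend along the height
    axis, then the width axis, so each position gathers global context at
    O(H*W*(H+W)) cost rather than O((H*W)^2) for full 2D attention."""
    H, W, _ = feat.shape
    col = np.stack([scaled_dot_product_attention(feat[:, w], feat[:, w], feat[:, w])
                    for w in range(W)], axis=1)
    feat = feat + col
    row = np.stack([scaled_dot_product_attention(feat[h], feat[h], feat[h])
                    for h in range(H)], axis=0)
    return feat + row

rng = np.random.default_rng(0)
vis = rng.standard_normal((5, 16))   # 5 object visual features, dim 16
sem = rng.standard_normal((5, 16))   # matching semantic (label-embedding) features
v_out, s_out = hybrid_attention_block(vis, sem)
g_out = axial_attention(rng.standard_normal((4, 6, 16)))
print(v_out.shape, s_out.shape, g_out.shape)
```

In the paper's design such blocks are reportedly stacked inside the object and relationship context encoders; the sketch shows a single block only.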
About the journal
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.