{"title":"A Multimodal Fusion Scene Graph Generation Method Based on Semantic Description","authors":"Liwen Ma, Weifeng Liu, Yaning Wang","doi":"10.1109/CCIS57298.2022.10016416","DOIUrl":null,"url":null,"abstract":"For the scene graph generation task, a multimodal fusion scene graph generation method based on semantic description is proposed considering the problems of long-tail distribution and low frequency of high-level semantic interactions in the dataset. Firstly, target detection and relationship inference are performed on the image to construct an image scene graph. Second, the semantic descriptions are transformed into semantic graphs, which are fed into a pre-trained scene graph parser to construct semantic scene graphs. Finally, the two scene graphs are aligned for display and the information of nodes and edges are updated to obtain a fused scene graph with more comprehensive coverage and more accurate semantic interaction information.","PeriodicalId":374660,"journal":{"name":"2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS)","volume":"191 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCIS57298.2022.10016416","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
For the scene graph generation task, a multimodal fusion scene graph generation method based on semantic description is proposed considering the problems of long-tail distribution and low frequency of high-level semantic interactions in the dataset. Firstly, target detection and relationship inference are performed on the image to construct an image scene graph. Second, the semantic descriptions are transformed into semantic graphs, which are fed into a pre-trained scene graph parser to construct semantic scene graphs. Finally, the two scene graphs are aligned for display and the information of nodes and edges are updated to obtain a fused scene graph with more comprehensive coverage and more accurate semantic interaction information.