A Multimodal Fusion Scene Graph Generation Method Based on Semantic Description

2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS). Pub Date: 2022-11-26. DOI: 10.1109/CCIS57298.2022.10016416

Liwen Ma, Weifeng Liu, Yaning Wang

Citations: 0

Abstract

To address the long-tail distribution and the low frequency of high-level semantic interactions in scene graph generation datasets, a multimodal fusion scene graph generation method based on semantic description is proposed. First, object detection and relationship inference are performed on the image to construct an image scene graph. Second, the semantic descriptions are transformed into semantic graphs, which are fed into a pre-trained scene graph parser to construct semantic scene graphs. Finally, the two scene graphs are aligned for display, and the node and edge information is updated to obtain a fused scene graph with more comprehensive coverage and more accurate semantic interaction information.
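The final fusion step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the alignment criterion or the update rule, so the sketch assumes that nodes are aligned by exact (case-insensitive) label match and that a predicate from the semantic graph replaces the image-graph predicate for the same node pair. The names `SceneGraph` and `fuse` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Node id -> object label, plus (subject_id, predicate, object_id) edges."""
    nodes: dict[str, str] = field(default_factory=dict)
    edges: set[tuple[str, str, str]] = field(default_factory=set)


def fuse(image_graph: SceneGraph, semantic_graph: SceneGraph) -> SceneGraph:
    """Merge semantic_graph into a copy of image_graph.

    Assumption: nodes align when their labels match (case-insensitive).
    If several image nodes share a label, the last one is used.
    """
    fused = SceneGraph(dict(image_graph.nodes), set(image_graph.edges))
    label_to_id = {label.lower(): nid for nid, label in fused.nodes.items()}

    # Map each semantic node onto an aligned image node, or add it as new.
    mapping: dict[str, str] = {}
    for nid, label in semantic_graph.nodes.items():
        key = label.lower()
        if key not in label_to_id:
            new_id = f"s_{nid}"  # hypothetical id scheme for unmatched nodes
            fused.nodes[new_id] = label
            label_to_id[key] = new_id
        mapping[nid] = label_to_id[key]

    # Assumption: the semantic predicate overrides the image predicate
    # whenever both graphs relate the same (subject, object) pair.
    for subj, pred, obj in semantic_graph.edges:
        s, o = mapping[subj], mapping[obj]
        fused.edges = {e for e in fused.edges if (e[0], e[2]) != (s, o)}
        fused.edges.add((s, pred, o))
    return fused


# Image graph with a generic predicate; caption graph with a richer one.
image = SceneGraph({"o1": "man", "o2": "horse"}, {("o1", "on", "o2")})
caption = SceneGraph(
    {"a": "man", "b": "horse", "c": "hat"},
    {("a", "riding", "b"), ("a", "wearing", "c")},
)
fused = fuse(image, caption)
```

In this toy example the generic image predicate "on" is replaced by the less frequent "riding" from the description, and the undetected "hat" node is added, which mirrors the abstract's stated aim of broader coverage and more accurate semantic interactions.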


