Graph Structure Guided Transformer for Semantic Segmentation

2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI). Pub Date: 2022-10-01. DOI: 10.1109/ICTAI56018.2022.00140

Luyang Qian, Canlong Zhang, Zhixin Li, Zhiwen Wang



Abstract

Segmentation is an essential operation in image processing, and exploiting long-range context is key to pixel-wise prediction tasks such as semantic segmentation. Convolutional Neural Networks (CNNs) model local relationships well through convolution, but they capture global relationships between distant regions inefficiently, since doing so requires stacking many convolutional layers. Building on the transformer's strength in modeling long-range dependencies, this paper proposes a novel Graph Structure Guided Transformer (GSGT) for semantic segmentation. Unlike previous methods that hard-divide the image into a regular grid, our graph projection method maps the two-dimensional feature map into a graph structure according to semantic relevance, which yields the input form the transformer requires. To make full use of the graph structure, we also propose a graph embedding attention module, which uses the local topology of the graph to complement the global context of the transformer. Moreover, GSGT can be readily combined with various CNN backbones and transformer variants to significantly improve segmentation accuracy and convergence speed. Experiments on the Cityscapes, VOC and ADE20K datasets demonstrate that the proposed method performs well on the semantic segmentation task.
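The abstract gives no equations, so the following is only a minimal sketch of the two ideas it names, under stated assumptions, and not the authors' implementation. It assumes graph projection can be approximated by softly assigning pixels to a small set of node prototypes by feature similarity, and that the graph's local topology can enter attention as an additive log-adjacency bias. The names `project_to_graph`, `graph_biased_attention`, and `anchors` are hypothetical and do not come from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def project_to_graph(feat, anchors):
    """Softly assign pixels to graph nodes by feature similarity.

    feat:    (H, W, C) feature map, e.g. from a CNN backbone.
    anchors: (K, C) node prototypes (an assumption of this sketch).
    Returns node features (K, C) and a row-normalised adjacency (K, K).
    """
    h, w, c = feat.shape
    pixels = feat.reshape(h * w, c)
    # Each pixel distributes its weight over the K nodes (rows sum to 1).
    assign = softmax(pixels @ anchors.T, axis=1)            # (HW, K)
    # Normalise per node so each node is a weighted mean of its pixels.
    weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-8)
    nodes = weights.T @ pixels                              # (K, C)
    # Edges from pairwise node similarity (assumed, not from the paper).
    adj = softmax(nodes @ nodes.T, axis=1)                  # (K, K)
    return nodes, adj


def graph_biased_attention(nodes, adj):
    """Self-attention over nodes with log-adjacency added as a bias,
    so strongly connected nodes attend to each other more."""
    c = nodes.shape[1]
    scores = nodes @ nodes.T / np.sqrt(c)                   # global context
    attn = softmax(scores + np.log(adj + 1e-8), axis=1)     # local topology
    return attn @ nodes


rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 16))      # toy 8x8 feature map, 16 channels
anchors = rng.normal(size=(4, 16))      # 4 hypothetical graph nodes
nodes, adj = project_to_graph(feat, anchors)
out = graph_biased_attention(nodes, adj)
print(nodes.shape, adj.shape, out.shape)
```

In this sketch the number of nodes is fixed by `anchors`, whereas a regular-grid split fixes the token count by patch size; how GSGT actually selects nodes and edges is not specified in the abstract.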

