Graph Structure Guided Transformer for Semantic Segmentation
Luyang Qian, Canlong Zhang, Zhixin Li, Zhiwen Wang
2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), published 2022-10-01
DOI: https://doi.org/10.1109/ICTAI56018.2022.00140
Citations: 0
Abstract
Segmentation is an essential operation in image processing, and exploiting long-range context is key to pixel-wise prediction tasks such as semantic segmentation. Convolutional Neural Networks (CNNs) model local relationships well through convolution, but they capture global relationships between distant regions inefficiently, since doing so requires stacking many convolutional layers. Drawing on the transformer's strength in modeling long-range dependencies, this paper proposes a novel Graph Structure Guided Transformer (GSGT) for semantic segmentation. Unlike previous methods that hard-divide the image into a regular grid, our graph projection method maps the two-dimensional feature map into a graph structure according to semantic relevance, which provides the input data structure the transformer requires. To make full use of the graph structure information, we also propose a graph embedding attention module, which uses the local topology of the graph to complement the global context of the transformer. Moreover, GSGT is easy to incorporate into various CNN backbones and transformer variants, and it significantly improves segmentation accuracy and convergence speed. Experiments on the Cityscapes, VOC, and ADE20K datasets demonstrate that the proposed method performs well on the semantic segmentation task.
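The abstract does not state how the graph projection or the graph embedding attention module is computed, so the following is only an illustrative sketch and not the authors' implementation. It assumes that pixels are softly assigned to graph nodes by similarity to a set of hypothetical anchor vectors, that edges come from k-nearest-neighbor similarity between nodes, and that local (neighbor-masked) attention is simply added to global self-attention. The names `project_to_graph`, `knn_adjacency`, `graph_guided_attention`, and `anchors` are all invented for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries of -inf receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def project_to_graph(feat, anchors):
    """Assumed projection: soft-assign each pixel to K nodes by similarity.

    feat: (H, W, C) feature map; anchors: (K, C) hypothetical node prototypes.
    Returns node features (K, C) and the assignment matrix (H*W, K).
    """
    pixels = feat.reshape(-1, feat.shape[-1])              # (N, C)
    assign = softmax(pixels @ anchors.T, axis=1)           # rows sum to 1
    weights = assign / assign.sum(axis=0, keepdims=True)   # normalize per node
    nodes = weights.T @ pixels                             # weighted pixel means
    return nodes, assign


def knn_adjacency(nodes, k):
    """Assumed topology: link each node to itself and its k most similar nodes."""
    unit = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    np.fill_diagonal(sim, np.inf)                # always keep the self-loop
    idx = np.argsort(-sim, axis=1)[:, : k + 1]   # self plus k neighbors
    adj = np.zeros(sim.shape, dtype=bool)
    np.put_along_axis(adj, idx, True, axis=1)
    return adj | adj.T                           # make the graph undirected


def graph_guided_attention(nodes, adj):
    """Global self-attention plus attention restricted to graph neighbors."""
    scores = nodes @ nodes.T / np.sqrt(nodes.shape[1])
    global_ctx = softmax(scores, axis=1) @ nodes
    local_ctx = softmax(np.where(adj, scores, -np.inf), axis=1) @ nodes
    return global_ctx + local_ctx                # additive fusion is an assumption


# Toy usage with random data standing in for CNN backbone features.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))
anchors = rng.standard_normal((6, 16))
nodes, assign = project_to_graph(feat, anchors)
adj = knn_adjacency(nodes, k=2)
out = graph_guided_attention(nodes, adj)
print(nodes.shape, adj.shape, out.shape)
```

In this sketch the node sequence, rather than a regular grid of patches, is what the transformer attends over, and the adjacency mask is one simple way to let local graph topology complement the global attention term.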