Fusion Attention Network for Autonomous Cars Semantic Segmentation
Chuyao Wang, N. Aouf
2022 IEEE Intelligent Vehicles Symposium (IV), 2022-06-05
DOI: 10.1109/iv51971.2022.9827377 (https://doi.org/10.1109/iv51971.2022.9827377)
Semantic segmentation is vital for scene understanding in autonomous cars. It provides more precise subject information than raw RGB images, which in turn boosts the performance of autonomous driving. Recently, self-attention methods have shown great improvement in image semantic segmentation. Attention maps help scene parsing by capturing abundant relationships among all pixels in an image. However, computing them is computationally demanding. Moreover, existing works focus either on channel attention, ignoring pixel position, or on spatial attention, disregarding the influence of channels on one another. To address these problems, we present the Fusion Attention Network, which builds on the self-attention mechanism to harvest rich contextual dependencies. The model consists of two chains: pyramid fusion spatial attention and fusion channel attention. We apply pyramid sampling in the spatial attention module to reduce the computation required for spatial attention maps. The channel attention module has a structure similar to that of the spatial attention module. We also introduce a fusion technique that calculates contextual dependencies using features from both attention chains. We concatenate the results of the spatial and channel attention modules to form the enhanced attention map, leading to better semantic segmentation results. We conduct extensive experiments on popular datasets with different settings, together with an ablation study, to demonstrate the efficiency of our approach. Our model achieves better results on Cityscapes [7] than state-of-the-art methods and also shows good generalization capability on PASCAL VOC 2012 [9].
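The two attention chains and the final concatenation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer details, so the pyramid bin sizes `(1, 3, 6, 8)`, the scaling of the similarity scores, the absence of learned projections, and all function names are assumptions made for illustration. The fusion step that mixes features between the two chains is not modeled; only the concatenation of the two outputs is shown.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_avg_pool(feat, out_size):
    """Average-pool a (C, H, W) map into (C, out_size, out_size) bins."""
    c, h, w = feat.shape
    pooled = np.empty((c, out_size, out_size))
    for i in range(out_size):
        r0 = (i * h) // out_size
        r1 = ((i + 1) * h + out_size - 1) // out_size  # ceiling division
        for j in range(out_size):
            c0 = (j * w) // out_size
            c1 = ((j + 1) * w + out_size - 1) // out_size
            pooled[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return pooled

def pyramid_sample(feat, bins=(1, 3, 6, 8)):
    """Pool at several scales and flatten to (C, S), S = sum(b * b).

    The bin sizes are an assumption, not taken from the paper.
    """
    c = feat.shape[0]
    return np.concatenate(
        [adaptive_avg_pool(feat, b).reshape(c, -1) for b in bins], axis=1
    )

def pyramid_spatial_attention(feat, bins=(1, 3, 6, 8)):
    """Each pixel attends to S pooled positions instead of all H*W pixels."""
    c, h, w = feat.shape
    query = feat.reshape(c, h * w).T            # (N, C), N = H * W
    key_value = pyramid_sample(feat, bins)      # (C, S)
    attn = softmax(query @ key_value / np.sqrt(c), axis=-1)  # (N, S)
    out = (attn @ key_value.T).T.reshape(c, h, w)
    return out, attn

def channel_attention(feat):
    """Each channel attends to every other channel via a (C, C) map."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)               # (C, N)
    attn = softmax(flat @ flat.T / np.sqrt(h * w), axis=-1)  # (C, C)
    out = (attn @ flat).reshape(c, h, w)
    return out, attn

def fuse_attention(feat, bins=(1, 3, 6, 8)):
    """Concatenate both attention outputs along the channel axis."""
    spatial, _ = pyramid_spatial_attention(feat, bins)
    channel, _ = channel_attention(feat)
    return np.concatenate([spatial, channel], axis=0)  # (2C, H, W)

# Usage on a random feature map with 16 channels and 12 x 20 pixels.
rng = np.random.default_rng(0)
features = rng.standard_normal((16, 12, 20))
print(fuse_attention(features).shape)  # (32, 12, 20)
```

With these assumed bins, the spatial attention map for the 12 x 20 example has shape 240 x 110 rather than 240 x 240, which is the kind of reduction that pyramid sampling is meant to provide; the saving grows with image size because the number of pooled positions stays fixed.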