Tanmay Singha, Moritz Bergemann, Duc-Son Pham, A. Krishna
Title: SCMNet: Shared Context Mining Network for Real-time Semantic Segmentation
DOI: 10.1109/DICTA52665.2021.9647401 (https://doi.org/10.1109/DICTA52665.2021.9647401)
Published in: 2021 Digital Image Computing: Techniques and Applications (DICTA), November 2021
Citations: 6
Abstract
Different architectures have been adopted for real-time scene segmentation. A popular design is the multi-branch approach, in which multiple independent branches are deployed on the encoder side to process input images at different resolutions. The main purpose is to reduce computational cost and handle high-resolution input. However, the independent branches do not contribute to one another's learning. To address this issue, we introduce a novel approach in which the two encoder branches share their knowledge whilst generating the global feature map. At each sharing point, the shared features pass through a new, effective feature scaling module, called the Context Mining Module (CMM), which refines the shared knowledge before passing it to the next stage. Finally, we introduce a new multi-directional feature fusion module that successively fuses deep contextual features with shallow features, enabling accurate object localization. Our novel scene parsing model, termed SCMNet, achieves 66.5% validation mIoU on the Cityscapes dataset and 78.6% on the CamVid dataset whilst having only 1.2 million parameters. Furthermore, the proposed model efficiently handles higher-resolution input images whilst keeping computational cost low, and produces state-of-the-art results on CamVid.
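The shared two-branch idea in the abstract can be illustrated with a toy NumPy sketch. This is a minimal, hypothetical illustration only: the per-branch "convolutions" are stand-in scalings, and `context_mining_module` is a placeholder for the paper's CMM, whose internals the abstract does not specify. The point it shows is structural: the high- and low-resolution branches exchange features at every stage instead of running independently.

```python
import numpy as np

def context_mining_module(shared, w):
    # Hypothetical stand-in for the paper's CMM: a simple scaling
    # that "refines" the shared features before the next stage.
    return shared * w

def shared_two_branch_encoder(x, stages=3):
    # Toy sketch of the shared-context encoder: a high-resolution
    # branch (hi) and a low-resolution branch (lo) share knowledge
    # at each stage rather than processing the input independently.
    hi = x                          # full-resolution branch
    lo = x[:, ::2, ::2]             # half-resolution branch
    for _ in range(stages):
        hi = hi * 0.9               # placeholder per-branch "conv"
        lo = lo * 1.1
        # Sharing point: upsample lo, merge into hi, refine via CMM.
        lo_up = lo.repeat(2, axis=1).repeat(2, axis=2)
        shared = hi + lo_up[:, :hi.shape[1], :hi.shape[2]]
        hi = context_mining_module(shared, w=0.5)
        lo = hi[:, ::2, ::2]        # share refined features back
    return hi

feat = shared_two_branch_encoder(np.ones((8, 16, 16)))
print(feat.shape)  # (8, 16, 16)
```

The output retains the input's spatial resolution because the low-resolution branch is upsampled back at each sharing point; only the branch topology, not any real CMM design, is taken from the abstract.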