Multi-branch reverse attention semantic segmentation network for building extraction
Wenxiang Jiang, Yan Chen, Xiaofeng Wang, Menglei Kang, Mengyuan Wang, Xuejun Zhang, Lixiang Xu, Cheng Zhang
DOI: 10.1016/j.ejrs.2023.12.003
Published: 2023-12-16
https://www.sciencedirect.com/science/article/pii/S1110982323001035
Abstract
Extracting color and texture features of buildings from high-resolution remote sensing images often suffers from interference by background information and from varying target scales. In addition, most current attention mechanisms focus on selecting key building features to optimize building extraction, but ignore the influence of the complex background. Hence, we propose incorporating a novel reverse attention module into the network. This module enables the model to selectively extract crucial building features while suppressing the impact of intricate background noise. It mitigates the influence of heterogeneous background targets with similar spectra and structures on building segmentation and extraction, thereby improving the overall generalizability of the model. The reverse attention also emphasizes and amplifies details pertaining to the boundaries of the target. Furthermore, we couple a new multi-branch convolution block into the encoder, integrating dilated convolutions with multiple dilation rates. Compared with methods that use a single multi-scale module to extract multi-scale information from high-level features only, we use convolutions with different receptive fields to capture multi-scale targets from multi-level features simultaneously, effectively improving the model's ability to extract multi-scale building features. The experimental findings demonstrate that our proposed multi-branch reverse attention semantic segmentation network achieves IoU of 90.59% and 81.79% on the well-known WHU building and Inria aerial image datasets, respectively.
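The abstract does not give implementation details, but the two components it names (a reverse attention module and a multi-branch block of dilated convolutions) can be illustrated with a minimal PyTorch sketch of how such modules are commonly built. The class names, channel counts, dilation rates, and the residual fusion step below are assumptions for illustration, not the authors' actual design.

```python
# Minimal sketch of the two components described in the abstract.
# All names, channel counts, and dilation rates are illustrative assumptions,
# not the paper's actual implementation.
import torch
import torch.nn as nn


class ReverseAttention(nn.Module):
    """Down-weights regions a coarse prediction already marks as building,
    so the branch focuses on suppressed background and missed boundary detail."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)

    def forward(self, features: torch.Tensor, coarse_logits: torch.Tensor) -> torch.Tensor:
        # Reverse map: high where the coarse prediction says "background".
        reverse_map = 1.0 - torch.sigmoid(coarse_logits)   # (N, 1, H, W), broadcasts over channels
        attended = features * reverse_map                  # re-weight encoder features
        return self.conv(attended) + features              # residual refinement


class MultiBranchDilatedConv(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates capture
    buildings at multiple scales from the same feature map."""

    def __init__(self, in_channels: int, out_channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # Fuse the concatenated branch outputs back to out_channels.
        self.fuse = nn.Conv2d(out_channels * len(dilations), out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))


if __name__ == "__main__":
    feats = torch.randn(1, 64, 128, 128)    # encoder feature map
    coarse = torch.randn(1, 1, 128, 128)    # coarse building logits
    print(ReverseAttention(64)(feats, coarse).shape)    # torch.Size([1, 64, 128, 128])
    print(MultiBranchDilatedConv(64, 64)(feats).shape)  # torch.Size([1, 64, 128, 128])
```

In this sketch the reverse map `1 - sigmoid(coarse_logits)` plays the role the abstract attributes to reverse attention (suppressing background interference and highlighting boundary regions), while the parallel dilated branches stand in for the multi-branch encoder block that gathers multi-scale context; how the paper actually wires these into its multi-level encoder-decoder is not specified here.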