Swin-ResUNet: A Swin-Topology Module for Road Extraction from Remote Sensing Images
2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), published 2022-11-30
DOI: 10.1109/DICTA56598.2022.10034582 (https://doi.org/10.1109/DICTA56598.2022.10034582)
Citations: 1
Abstract
Road extraction from remote sensing images plays a crucial role in navigation, traffic management, urban construction, and other fields. As deep learning has advanced in computer vision, extracting roads from remote sensing images with deep learning models has become an active research topic. Convolution-based U-shaped road extraction models suffer from high extraction error rates and poor continuity of the extracted road topology. Transformer-based road extraction methods, in turn, suffer from low extraction accuracy and high GPU memory usage. To address these issues, we propose Swin-ResUNet, a network structure that uses the Swin Transformer to extract roads from remote sensing images. Specifically, we construct a Swin-Topology module by adding a residual-connection-based Sobel layer to the Swin Transformer block. Building on the Swin-Topology module, we propose the Swin-ResUNet network structure to better capture road topology. Experimental results show mIOU and mDC values of 64.1% and 76.6%, respectively, on the Massachusetts dataset, and 66.69% and 75.86%, respectively, on the DeepGlobe2018 dataset. With a batch size of 8, Swin-ResUNet uses about 9 GB of GPU memory, significantly less than other Transformer-based networks. Compared with convolution-based U-shaped structures, Swin-ResUNet better captures road topology in remote sensing images and improves road extraction accuracy. Compared with other Transformer-based road extraction methods, Swin-ResUNet improves extraction accuracy and reduces GPU memory usage.
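For readers unfamiliar with the two metrics reported in the abstract, the sketch below shows one common way to compute them on a binary road mask. It rests on assumptions that the abstract does not confirm: that mIOU is the mean intersection over union, that mDC is the mean Dice coefficient, and that both are averaged over the road and background classes. The function names (`confusion_counts`, `iou`, `dice`, `mean_iou_and_dice`) are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

# A mask is a 2D grid of class labels: 1 = road, 0 = background (assumed encoding).
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, target: Mask, cls: int) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) pixels for one class by comparing the masks pixel by pixel."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == cls and t == cls:
                tp += 1
            elif p == cls:
                fp += 1
            elif t == cls:
                fn += 1
    return tp, fp, fn


def iou(tp: int, fp: int, fn: int) -> float:
    """Intersection over union: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    # Convention chosen here: a class absent from both masks scores 1.0.
    return tp / denom if denom else 1.0


def dice(tp: int, fp: int, fn: int) -> float:
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mean_iou_and_dice(
    pred: Mask, target: Mask, classes: Sequence[int] = (0, 1)
) -> Tuple[float, float]:
    """Average IoU and Dice over the given classes (assumed: background and road)."""
    ious = []
    dices = []
    for cls in classes:
        counts = confusion_counts(pred, target, cls)
        ious.append(iou(*counts))
        dices.append(dice(*counts))
    return sum(ious) / len(ious), sum(dices) / len(dices)


if __name__ == "__main__":
    # Toy 2x2 example: the prediction marks one extra road pixel.
    predicted = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [0, 0]]
    m_iou, m_dice = mean_iou_and_dice(predicted, ground_truth)
    print(f"mean IoU = {m_iou:.4f}, mean Dice = {m_dice:.4f}")
```

For a single class the two scores are tied by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. This is consistent with the reported mDC values exceeding the mIOU values on both datasets.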