Authors: Changzhong Zou, Ziyuan Wang
Journal: Image and Vision Computing (Q2, Computer Science, Artificial Intelligence; Impact Factor 4.2)
DOI: 10.1016/j.imavis.2024.105157
Publication date: 2024-07-04
Article URL: https://www.sciencedirect.com/science/article/pii/S0262885624002622
A semi-parallel CNN-transformer fusion network for semantic change detection
Semantic change detection (SCD) recognizes both the regions and the types of change in remote sensing images. Existing methods are based on either transformers or convolutional neural networks (CNNs); however, because ground objects vary widely in size, a detector needs global modeling ability and local information extraction ability at the same time. We therefore propose a fusion semantic change detection network (FSCD) that offers both abilities by fusing a transformer with a CNN. We also propose a semi-parallel fusion block to construct FSCD: it not only keeps global and local features in parallel, but also fuses them as deeply as a serial design would. To adaptively decide which mechanism is applied to which pixel, we design a self-attention and convolution selection module (ACSM), a self-attention mechanism that selectively combines the transformer and CNN branches. Specifically, the importance of each mechanism is learned automatically, and the mechanism best suited to each pixel is selected according to that importance, which performs better than using either mechanism alone. We evaluate the proposed FSCD on two datasets, and it achieves a significant improvement over state-of-the-art networks.
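The abstract describes ACSM as learning a per-pixel importance score for each branch and selecting the mechanism accordingly. The paper's exact parameterisation is not given here, so the following is only a minimal NumPy sketch of one plausible reading: each branch's feature map is projected to a per-pixel logit (the projection vectors `w_conv` and `w_attn` are hypothetical), a softmax turns the logits into gates, and the fused output is the gate-weighted sum of the two branches.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def acsm_fuse(f_conv, f_attn, w_conv, w_attn):
    """Per-pixel gated fusion of CNN and transformer features (sketch).

    f_conv, f_attn: (H, W, C) feature maps from the CNN and transformer branches.
    w_conv, w_attn: (C,) learned projection vectors scoring each branch's
    importance at every pixel -- a hypothetical parameterisation, not
    necessarily the one used in the paper.
    """
    # One importance logit per pixel per branch.
    logit_conv = f_conv @ w_conv                          # (H, W)
    logit_attn = f_attn @ w_attn                          # (H, W)
    logits = np.stack([logit_conv, logit_attn], axis=-1)  # (H, W, 2)
    gates = softmax(logits, axis=-1)                      # per-pixel branch weights
    # Soft selection: each pixel leans toward whichever mechanism scores higher.
    fused = gates[..., :1] * f_conv + gates[..., 1:] * f_attn
    return fused, gates

H, W, C = 4, 4, 8
rng = np.random.default_rng(0)
fused, gates = acsm_fuse(rng.standard_normal((H, W, C)),
                         rng.standard_normal((H, W, C)),
                         rng.standard_normal(C),
                         rng.standard_normal(C))
print(fused.shape, bool(np.allclose(gates.sum(-1), 1.0)))  # (4, 4, 8) True
```

A soft (softmax) gate keeps the selection differentiable so the importance weights can be learned end-to-end; a hard argmax choice per pixel would match the "selection" reading more literally but would block gradients.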
Journal introduction:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.