Controlled-SAM and Context Promoting Network for Fine-Grained Semantic Segmentation
Jinglin Zhang; Yuxia Li; Lei He; Bowei Zhang; Zhenye Niu; Yonghui Zhang; Shiyu Luo
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 15707-15724, published 2025-06-20
DOI: 10.1109/JSTARS.2025.3581620
https://ieeexplore.ieee.org/document/11045311/
Citations: 0
Abstract
Fine-grained semantic segmentation of remote sensing imagery is critical for applications such as land use analysis and agricultural monitoring. However, it remains challenging because the inter-class differences between visually similar objects are subtle, which often leads to misclassification. The difficulty is particularly evident when distinguishing classes such as rivers, ponds, and fishponds, which share similar spectral and spatial characteristics. To address these challenges, we propose CSCPNet, a novel framework optimized for fine-grained feature extraction and segmentation accuracy. CSCPNet consists of a controlled Segment Anything Model (SAM) encoder and a context promoting decoder. The controlled SAM encoder uses shallow and deep feature fusion modules to integrate multiscale features from both a pretrained SAM encoder and a lightweight encoder, which helps it capture detailed fine-grained features. The context promoting decoder, equipped with context attention, iteratively refines feature maps through multistep decoding, effectively incorporating contextual information. Extensive experiments on the FBP and ShengTeng datasets, both of which contain fine-grained classes, demonstrate that CSCPNet achieves state-of-the-art performance in fine-grained semantic segmentation. On the FBP dataset with 24 fine-grained classes, CSCPNet improves overall accuracy (OA), mean intersection over union (mIoU), and mean F1 score (mF1) by 4.4%, 6.7%, and 9.3%, respectively. Similarly, on the ShengTeng dataset with 47 fine-grained classes, it achieves gains of 5.5% in OA, 7.3% in mIoU, and 7.9% in mF1. CSCPNet also maintains competitive accuracy on conventional segmentation datasets such as the Potsdam and CZWZ datasets. These results demonstrate that CSCPNet excels at capturing fine-grained details and distinguishing visually similar classes, making it a robust and efficient solution for fine-grained semantic segmentation of remote sensing images.
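The three metrics reported in the abstract (OA, mIoU, mF1) are conventionally derived from a per-class confusion matrix. The following is a minimal sketch of those standard definitions in plain Python, not the authors' evaluation code: the function names, the toy labels, and the choice to skip classes that appear in neither the labels nor the predictions are all assumptions made for illustration, since the abstract does not state the exact averaging conventions used.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return (OA, mIoU, mF1) from a square confusion matrix.

    Assumption: classes absent from both labels and predictions are
    excluded from the class averages.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    ious, f1s = [], []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                          # true class i, predicted otherwise
        fp = sum(cm[r][i] for r in range(n)) - tp     # predicted i, true class otherwise
        if tp + fp + fn == 0:
            continue
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    oa = correct / total
    return oa, sum(ious) / len(ious), sum(f1s) / len(f1s)


# Hypothetical toy example: six pixels, three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
oa, miou, mf1 = segmentation_metrics(cm)
print(f"OA={oa:.3f}  mIoU={miou:.3f}  mF1={mf1:.3f}")
```

Because mIoU and mF1 average over classes rather than pixels, they are more sensitive than OA to errors on rare or easily confused classes, which is consistent with the larger mIoU and mF1 gains reported above for the fine-grained datasets.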
About the journal:
The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing addresses the growing field of applications in Earth observations and remote sensing, and also provides a venue for the rapidly expanding special issues that are being sponsored by the IEEE Geoscience and Remote Sensing Society. The journal draws upon the experience of the highly successful “IEEE Transactions on Geoscience and Remote Sensing” and provides a complementary medium for the wide range of topics in applied Earth observations. The ‘Applications’ areas encompass the societal benefit areas of the Global Earth Observation System of Systems (GEOSS) program. Through deliberations over two years, ministers from 50 countries agreed to identify nine areas where Earth observation could positively impact the quality of life and health of their respective countries. Some of these areas, including biodiversity, health, and climate, are not traditionally addressed in the IEEE context. Yet it is the skill sets of IEEE members, in areas such as observations, communications, computers, signal processing, standards, and ocean engineering, that form the technical underpinnings of GEOSS. Thus, the journal attracts a broad range of interests, both serving present members in new ways and expanding IEEE visibility into new areas.