{"title":"Coarse-to-fine semantic segmentation of satellite images","authors":"","doi":"10.1016/j.isprsjprs.2024.07.028","DOIUrl":null,"url":null,"abstract":"<div><p>Training deep neural networks for semantic segmentation of aerial images relies heavily on obtaining a large number of precise pixel-level annotations, which can cause significant annotation expenses. Given the fact that acquiring fine-class annotations is considerably more challenging than obtaining coarse-class annotations, we present a novel semi-supervised learning framework, which utilizes high spatial resolution images annotated with coarse-class labels alongside a very small set of fine-grained annotated images as the training set, thereby achieving classification results that are refined in both spatial resolution and categorical granularity. Specifically, this framework adopts Mix Transformer (MiT) as the backbone architecture to accommodate both local feature extraction and long-range dependency modeling capabilities and utilizes multi-prototype learning to model each class as multiple sub-prototypes, preserving the intrinsic variance characteristics within classes. We propose a dedicated co-training approach tailored for extracting fine-grained pseudo-labels from coarse-grained samples. In this approach, a <em>local-softmax</em> pseudo-labeling strategy is developed to ensure a harmonious balance between the efficiency and accuracy of the pseudo-labeling, and four losses are formulated for both single-level class and cross-category granularity supervised learning. We evaluate the proposed framework on the Gaofen Image Dataset (GID) and Five-Billion-Pixels (FBP) dataset, confirming its feasibility and superior results. In particular, based on coarse-class annotations, the performance achieved using only 5% of fine-class labels, in terms of the four metrics, namely mIoU, mean UA, mean F1-score, and OA, reached 91%, 96%, 89%, and 93% of the fully-supervised baseline performance respectively. The code is available at <span><span>https://github.com/chenhaocs/C2F</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":null,"pages":null},"PeriodicalIF":10.6000,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ISPRS Journal of Photogrammetry and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0924271624002958","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GEOGRAPHY, PHYSICAL","Score":null,"Total":0}
引用次数: 0
Abstract
Training deep neural networks for semantic segmentation of aerial images relies heavily on obtaining a large number of precise pixel-level annotations, which can cause significant annotation expenses. Given the fact that acquiring fine-class annotations is considerably more challenging than obtaining coarse-class annotations, we present a novel semi-supervised learning framework, which utilizes high spatial resolution images annotated with coarse-class labels alongside a very small set of fine-grained annotated images as the training set, thereby achieving classification results that are refined in both spatial resolution and categorical granularity. Specifically, this framework adopts Mix Transformer (MiT) as the backbone architecture to accommodate both local feature extraction and long-range dependency modeling capabilities and utilizes multi-prototype learning to model each class as multiple sub-prototypes, preserving the intrinsic variance characteristics within classes. We propose a dedicated co-training approach tailored for extracting fine-grained pseudo-labels from coarse-grained samples. In this approach, a local-softmax pseudo-labeling strategy is developed to ensure a harmonious balance between the efficiency and accuracy of the pseudo-labeling, and four losses are formulated for both single-level class and cross-category granularity supervised learning. We evaluate the proposed framework on the Gaofen Image Dataset (GID) and Five-Billion-Pixels (FBP) dataset, confirming its feasibility and superior results. In particular, based on coarse-class annotations, the performance achieved using only 5% of fine-class labels, in terms of the four metrics, namely mIoU, mean UA, mean F1-score, and OA, reached 91%, 96%, 89%, and 93% of the fully-supervised baseline performance respectively. The code is available at https://github.com/chenhaocs/C2F.
期刊介绍:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.