Coarse-to-fine semantic segmentation of satellite images

IF 10.6 | Zone 1, Earth Sciences | JCR Q1, GEOGRAPHY, PHYSICAL

ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2024-08-16 | DOI: 10.1016/j.isprsjprs.2024.07.028

Hao Chen , Wen Yang , Li Liu , Gui-Song Xia

Volume 217, Pages 1-17. Article page: https://www.sciencedirect.com/science/article/pii/S0924271624002958

Citations: 0

Abstract

Training deep neural networks for semantic segmentation of aerial images relies heavily on large numbers of precise pixel-level annotations, which are expensive to produce. Because fine-class annotations are considerably harder to acquire than coarse-class annotations, we present a novel semi-supervised learning framework that trains on high-spatial-resolution images annotated with coarse-class labels together with a very small set of images annotated with fine-class labels, yielding classification results that are refined in both spatial resolution and categorical granularity. Specifically, the framework adopts Mix Transformer (MiT) as its backbone to combine local feature extraction with long-range dependency modeling, and uses multi-prototype learning to model each class as multiple sub-prototypes, preserving the intrinsic variance within classes. We propose a dedicated co-training approach for extracting fine-grained pseudo-labels from coarse-grained samples. In this approach, a local-softmax pseudo-labeling strategy balances the efficiency and accuracy of pseudo-labeling, and four losses are formulated for supervised learning both within a single class level and across category granularities. We evaluate the proposed framework on the Gaofen Image Dataset (GID) and the Five-Billion-Pixels (FBP) dataset, confirming its feasibility and superior results. In particular, given coarse-class annotations, using only 5% of the fine-class labels reaches 91%, 96%, 89%, and 93% of the fully supervised baseline performance in terms of mIoU, mean UA, mean F1-score, and OA, respectively. The code is available at https://github.com/chenhaocs/C2F.
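
For readers less familiar with the four metrics named in the abstract, the following is a minimal sketch of how mIoU, mean UA (user's accuracy, i.e. per-class precision), mean F1-score, and OA (overall accuracy) are commonly computed from a confusion matrix. It is not taken from the authors' code: the function name `segmentation_metrics`, the row/column convention, the choice to score classes with no pixels as 0, and the toy matrix are all assumptions made for illustration.

```python
def segmentation_metrics(conf):
    """Compute mIoU, mean UA, mean F1 and OA from a square confusion matrix.

    Assumed convention: conf[i][j] is the number of pixels whose reference
    class is i and whose predicted class is j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    ious, uas, f1s = [], [], []
    correct = 0
    for c in range(n):
        tp = conf[c][c]
        ref = sum(conf[c])                         # pixels labeled c in the reference
        pred = sum(conf[r][c] for r in range(n))   # pixels predicted as c
        union = ref + pred - tp
        ua = tp / pred if pred else 0.0            # user's accuracy (precision)
        pa = tp / ref if ref else 0.0              # producer's accuracy (recall)
        f1 = 2 * ua * pa / (ua + pa) if (ua + pa) else 0.0
        # Assumption: a class with no pixels at all scores 0; some tools skip it instead.
        ious.append(tp / union if union else 0.0)
        uas.append(ua)
        f1s.append(f1)
        correct += tp
    return {
        "mIoU": sum(ious) / n,
        "mean UA": sum(uas) / n,
        "mean F1": sum(f1s) / n,
        "OA": correct / total if total else 0.0,
    }


# Toy two-class confusion matrix (illustrative numbers, not from the paper).
toy = [[8, 2],
       [1, 9]]
metrics = segmentation_metrics(toy)
print(metrics)
```

The percentages quoted in the abstract are ratios of such metrics: the value obtained with 5% of the fine-class labels divided by the value obtained by the fully supervised baseline.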


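Source journal

ISPRS Journal of Photogrammetry and Remote Sensing (Engineering & Technology: Imaging Science & Photographic Technology)

CiteScore

21.00

Self-citation rate

6.30%

Articles published

273

Review time

40 days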

About the journal: The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) is the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It is a platform for scientists and professionals worldwide working in disciplines that use photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advances in these disciplines, and to serve as a comprehensive reference source and archive. P&RS publishes high-quality, peer-reviewed research papers that are preferably original and previously unpublished. These papers can cover scientific research, technological development, or practical applications. The journal also welcomes papers based on presentations at ISPRS meetings, provided they are significant contributions to these fields. In particular, P&RS encourages submissions that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. Theoretical papers should preferably include practical applications, and papers on systems and applications should include a theoretical background.