Title: High-Resolution Remote Sensing Image Segmentation With Global-Guided Normalization and Local Affinity Distillation
Authors: Peng Zhu; Xiangrong Zhang; Xiao Han; Puhua Chen; Xu Tang; Xina Cheng; Licheng Jiao
Journal: IEEE Transactions on Geoscience and Remote Sensing
Publication date: 2024-11-04
DOI: 10.1109/TGRS.2024.3482688
URL: https://ieeexplore.ieee.org/document/10741238/
Journal metrics: impact factor 7.5; JCR Q1 (Engineering, Electrical & Electronic)
Citations: 0
Abstract
In recent years, the segmentation of high-resolution (HR) remote sensing images (RSIs) has received growing attention. The huge number of pixels challenges semantic segmentation algorithms, which are limited by GPU memory, so current methods for processing HR RSIs fall into two main categories: global methods and local methods. The former downsample the original image and lose much feature detail. The latter crop the original image and fail to capture global contextual information. Both types of methods therefore yield limited segmentation accuracy. In this article, we propose an end-to-end framework, called global injection network (GINet), which explores two levels, feature distribution and feature relationship, to achieve a tradeoff between global context and local details. Concretely, we propose the global-guided normalization (GGN) module, which injects global context information into the local branch and modulates local features using global features to enhance the global perception of the local branch. In addition, to constrain the spatial consistency of the two branches, and inspired by knowledge distillation, we propose the local affinity distillation (LAD) loss, which distills the relations in local features into global features so that the relationships among corresponding patches in the two branches remain similar. Comprehensive experimental results on three large-scale land-cover classification datasets, DeepGlobe ($2448 \times 2448$), Inria Aerial ($5000 \times 5000$), and GID-15 ($7200 \times 6800$), confirm the effectiveness and superiority of our method in HR semantic segmentation tasks.
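The abstract gives no formulas for GGN or the LAD loss, so the following is only a rough sketch of the two ideas it names, not the authors' implementation. It assumes that "modulating local features using global features" resembles conditional normalization (normalize, then scale and shift), and that "relations" can be measured as pairwise cosine similarity between spatial positions. All function names, the scalar weights `w_gamma` and `w_beta`, and the toy feature values are hypothetical.

```python
import math


def normalize_channels(feats, eps=1e-5):
    """Normalize each channel to zero mean and unit variance over positions.

    feats is a list of N positions, each a list of C channel values.
    """
    n, c = len(feats), len(feats[0])
    means = [sum(f[k] for f in feats) / n for k in range(c)]
    variances = [sum((f[k] - means[k]) ** 2 for f in feats) / n for k in range(c)]
    return [
        [(f[k] - means[k]) / math.sqrt(variances[k] + eps) for k in range(c)]
        for f in feats
    ]


def global_guided_modulation(local_feats, global_feats, w_gamma=0.1, w_beta=0.1):
    """Scale and shift normalized local features with aligned global features.

    Assumption: w_gamma and w_beta are scalar stand-ins for learned projections
    that would turn global features into per-element scale and shift terms.
    """
    normed = normalize_channels(local_feats)
    return [
        [(1.0 + w_gamma * g) * x + w_beta * g for x, g in zip(xs, gs)]
        for xs, gs in zip(normed, global_feats)
    ]


def affinity(feats):
    """Return the N x N matrix of cosine similarities between positions."""
    # A zero vector gets norm 1.0 so the division below stays defined.
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in feats]
    return [
        [sum(a * b for a, b in zip(fi, fj)) / (ni * nj) for fj, nj in zip(feats, norms)]
        for fi, ni in zip(feats, norms)
    ]


def affinity_distillation_loss(local_feats, global_feats):
    """Mean squared difference between local and global affinity matrices.

    Assumption: in training, the local affinities would act as a fixed target
    and only the global branch would be updated to match them.
    """
    a_local = affinity(local_feats)
    a_global = affinity(global_feats)
    n = len(a_local)
    return sum(
        (a_local[i][j] - a_global[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)


# Toy data: three spatial positions with two channels each.
local_patch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_crop = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

modulated = global_guided_modulation(local_patch, global_crop)
loss = affinity_distillation_loss(local_patch, global_crop)
```

Cosine affinities ignore feature magnitude, so in this sketch the loss penalizes only differences in how positions relate to one another, which matches the abstract's emphasis on relationships rather than raw feature values.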
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.