REST: Holistic Learning for End-to-End Semantic Segmentation of Whole-Scene Remote Sensing Imagery

IF 18.6 · Zone 1, Computer Science · JCR Q1, Computer Science, Artificial Intelligence

IEEE Transactions on Pattern Analysis and Machine Intelligence, Pub Date: 2025-09-12, DOI: 10.1109/tpami.2025.3609767

Wei Chen, Lorenzo Bruzzone, Bo Dang, Yuan Gao, Youming Deng, Jin-Gang Yu, Liangqi Yuan, Yansheng Li

Citations: 0

Abstract

Semantic segmentation of remote sensing imagery (RSI) is a fundamental task that assigns a category label to each pixel. Precise segmentation of one or more fine-grained categories often requires holistic segmentation of whole-scene RSI (WRI), which is typically very large. However, conventional deep learning methods struggle with holistic segmentation of WRI because of the memory limitations of the graphics processing unit (GPU), and must therefore fall back on suboptimal strategies such as cropping or fusion, which degrade performance. Here, we introduce the Robust End-to-end semantic Segmentation architecture for whole-scene remoTe sensing imagery (REST). REST is the first intrinsically end-to-end framework for truly holistic segmentation of WRI, and it supports a wide range of encoders and decoders in a plug-and-play fashion. It integrates seamlessly with mainstream semantic segmentation methods and with more advanced foundation models. Specifically, we propose a novel spatial parallel interaction mechanism (SPIM) within REST to overcome GPU memory constraints and achieve global context awareness. Unlike traditional parallel methods, SPIM enables REST to process a WRI effectively and efficiently by combining parallel computation with a divide-and-conquer strategy. Both theoretical analysis and experiments demonstrate that REST attains near-linear throughput scalability as additional GPUs are employed. Extensive experiments demonstrate that REST consistently outperforms existing cropping-based and fusion-based methods across a variety of scenarios, ranging from single-class to multi-class segmentation, from multispectral to hyperspectral imagery, and from satellite to drone platforms. The robustness and versatility of REST are expected to offer a promising solution for the holistic segmentation of WRI, with the potential for further extension to the segmentation of large medical images. The source code will be released at https://weichenrs.github.io/REST.
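For readers unfamiliar with the cropping strategy that the abstract contrasts REST against, the following is a minimal sketch of how a large scene is commonly cut into fixed-size tiles that are then segmented independently. It is not REST or SPIM, whose internals the abstract does not describe. The function names `axis_starts` and `crop_windows` and the parameters `tile` and `stride` are hypothetical, chosen only for illustration.

```python
from typing import Iterator, List, Tuple

# (top, left, bottom, right), with bottom/right exclusive.
Window = Tuple[int, int, int, int]


def axis_starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the tiles cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    # If the regular grid stops short of the border, add one tile flush with it.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def crop_windows(height: int, width: int, tile: int, stride: int) -> Iterator[Window]:
    """Yield tile windows covering a height x width image (hypothetical helper)."""
    if not 0 < stride <= tile:
        raise ValueError("stride must satisfy 0 < stride <= tile")
    for top in axis_starts(height, tile, stride):
        for left in axis_starts(width, tile, stride):
            yield (top, left, min(top + tile, height), min(left + tile, width))


if __name__ == "__main__":
    # Each window would be passed to a segmentation model on its own,
    # so the model never sees context outside the tile.
    for window in crop_windows(10, 10, tile=4, stride=4):
        print(window)
```

Because every tile is predicted in isolation, objects that span tile borders lose their surrounding context, which is the kind of degradation the abstract attributes to cropping-based methods.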

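Source journal

IEEE Transactions on Pattern Analysis and Machine Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 28.40

Self-citation rate: 3.00%

Articles published: 885

Review time: 8.5 months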

Journal description: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. It also covers areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition, and relevant specialized hardware and/or software architectures.