{"title":"分辨率不匹配:面向泛锐化的模态感知特征对齐网络","authors":"Man Zhou;Xuanhua He;Danfeng Hong","doi":"10.1109/TPAMI.2025.3594898","DOIUrl":null,"url":null,"abstract":"Panchromatic (PAN) and multi-spectral (MS) remote satellite image fusion, known as pan-sharpening, aims to produce high-resolution MS images by combining the complementary information from the high-resolution, texture-rich PAN and the low-resolution but high spectral-resolution MS counterparts. Despite notable advancements in this field, the current state-of-the-art pan-sharpening techniques do not <italic>explicitly</i> address the spatial resolution mismatching problem between the two modalities of PAN and MS images. This mismatching issue can lead to misalignment in feature representation and the creation of blurry artifacts in the model output, ultimately hindering the generation of high-frequency textures and impeding the performance improvement of such methods. To address the aforementioned spatial resolution mismatching problem in pan-sharpening, we propose a novel modality-aware feature-aligned pan-sharpening framework in this paper. The framework comprises three primary stages: modality-aware feature extraction, modality-aware feature aligning, and context integrated image reconstruction. First, we introduce the half-instance normalization strategy as the backbone to filter out the inconsistent features and promote the learning of consistent features between the PAN and MS modalities. Second, a learnable modality-aware feature interpolation is devised to effectively address the misalignment issue. Specifically, the extracted features from the backbone are integrated to predict the transformation offsets of each pixel, which allows for the adaptive selection of custom contextual information and enables the modality-aware features to be more aligned. Finally, within the context of the interactive offset correction, multi-stage information is aggregated to generate the feasible pan-sharpened model output. Extensive experimental results over multiple satellite datasets demonstrate that the proposed algorithm outperforms other state-of-the-art methods both qualitatively and quantitatively, exhibiting great generalization ability to real-world scenes.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 11","pages":"10753-10769"},"PeriodicalIF":18.6000,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Toward Resolution Mismatching: Modality-Aware Feature-Aligned Network for Pan-Sharpening\",\"authors\":\"Man Zhou;Xuanhua He;Danfeng Hong\",\"doi\":\"10.1109/TPAMI.2025.3594898\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Panchromatic (PAN) and multi-spectral (MS) remote satellite image fusion, known as pan-sharpening, aims to produce high-resolution MS images by combining the complementary information from the high-resolution, texture-rich PAN and the low-resolution but high spectral-resolution MS counterparts. Despite notable advancements in this field, the current state-of-the-art pan-sharpening techniques do not <italic>explicitly</i> address the spatial resolution mismatching problem between the two modalities of PAN and MS images. 
This mismatching issue can lead to misalignment in feature representation and the creation of blurry artifacts in the model output, ultimately hindering the generation of high-frequency textures and impeding the performance improvement of such methods. To address the aforementioned spatial resolution mismatching problem in pan-sharpening, we propose a novel modality-aware feature-aligned pan-sharpening framework in this paper. The framework comprises three primary stages: modality-aware feature extraction, modality-aware feature aligning, and context integrated image reconstruction. First, we introduce the half-instance normalization strategy as the backbone to filter out the inconsistent features and promote the learning of consistent features between the PAN and MS modalities. Second, a learnable modality-aware feature interpolation is devised to effectively address the misalignment issue. Specifically, the extracted features from the backbone are integrated to predict the transformation offsets of each pixel, which allows for the adaptive selection of custom contextual information and enables the modality-aware features to be more aligned. Finally, within the context of the interactive offset correction, multi-stage information is aggregated to generate the feasible pan-sharpened model output. Extensive experimental results over multiple satellite datasets demonstrate that the proposed algorithm outperforms other state-of-the-art methods both qualitatively and quantitatively, exhibiting great generalization ability to real-world scenes.\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":\"47 11\",\"pages\":\"10753-10769\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11106767/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11106767/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Toward Resolution Mismatching: Modality-Aware Feature-Aligned Network for Pan-Sharpening
Panchromatic (PAN) and multi-spectral (MS) remote sensing satellite image fusion, known as pan-sharpening, aims to produce high-resolution MS images by combining the complementary information from the high-resolution, texture-rich PAN images and their low-resolution but high-spectral-resolution MS counterparts. Despite notable advancements in this field, current state-of-the-art pan-sharpening techniques do not explicitly address the spatial resolution mismatch between the PAN and MS modalities. This mismatch can lead to misaligned feature representations and blurry artifacts in the model output, ultimately hindering the generation of high-frequency textures and limiting further performance gains. To address this spatial resolution mismatch, we propose a novel modality-aware feature-aligned pan-sharpening framework. The framework comprises three primary stages: modality-aware feature extraction, modality-aware feature alignment, and context-integrated image reconstruction. First, we introduce a half-instance normalization strategy as the backbone to filter out inconsistent features and promote the learning of features that are consistent across the PAN and MS modalities. Second, a learnable modality-aware feature interpolation is devised to address the misalignment issue. Specifically, the features extracted by the backbone are integrated to predict a transformation offset for each pixel, which allows contextual information to be selected adaptively and brings the modality-aware features into closer alignment. Finally, guided by the interactive offset correction, multi-stage information is aggregated to generate the pan-sharpened output. Extensive experiments on multiple satellite datasets demonstrate that the proposed algorithm outperforms other state-of-the-art methods both qualitatively and quantitatively and generalizes well to real-world scenes.
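
To make the described pipeline more concrete, below is a minimal PyTorch sketch of two building blocks suggested by the abstract: a half-instance normalization block for the modality-aware backbone, and an offset-predicting alignment step that warps MS features toward the PAN features. All module names, channel widths, and the grid_sample-based warping are illustrative assumptions made for exposition; they are not taken from the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class HalfInstanceNormBlock(nn.Module):
    """Half-instance normalization (assumed HINet-style block): instance-normalize
    half of the channels, leave the other half untouched, then fuse with a residual."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm = nn.InstanceNorm2d(channels // 2, affine=True)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(x)
        normed, identity = torch.chunk(out, 2, dim=1)   # split channels in half
        out = torch.cat([self.norm(normed), identity], dim=1)
        return F.relu(self.conv2(F.relu(out)) + x)      # simplified residual fusion


class OffsetAlign(nn.Module):
    """Predict a per-pixel (dx, dy) offset from concatenated PAN/MS features and
    resample the MS features with grid_sample, as a stand-in for the learnable
    modality-aware feature interpolation described in the abstract."""

    def __init__(self, channels: int):
        super().__init__()
        self.offset_head = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, 3, padding=1),        # 2 channels: dx, dy in pixels
        )

    def forward(self, pan_feat: torch.Tensor, ms_feat: torch.Tensor) -> torch.Tensor:
        b, _, h, w = pan_feat.shape
        # Bring MS features to PAN resolution first, then refine with learned offsets.
        ms_up = F.interpolate(ms_feat, size=(h, w), mode="bilinear", align_corners=False)
        offset = self.offset_head(torch.cat([pan_feat, ms_up], dim=1))   # (B, 2, H, W)

        # Base sampling grid in the normalized [-1, 1] coordinates used by grid_sample.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=pan_feat.device),
            torch.linspace(-1, 1, w, device=pan_feat.device),
            indexing="ij",
        )
        base_grid = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)

        # Convert pixel offsets to normalized units and warp the MS features.
        scale = torch.tensor([w - 1.0, h - 1.0], device=pan_feat.device)
        norm_offset = offset.permute(0, 2, 3, 1) * 2.0 / scale
        return F.grid_sample(ms_up, base_grid + norm_offset, mode="bilinear",
                             padding_mode="border", align_corners=False)


if __name__ == "__main__":
    pan = torch.randn(1, 32, 64, 64)   # high-resolution PAN features
    ms = torch.randn(1, 32, 16, 16)    # low-resolution MS features
    backbone = HalfInstanceNormBlock(32)
    align = OffsetAlign(32)
    fused = align(backbone(pan), backbone(ms))
    print(fused.shape)                 # torch.Size([1, 32, 64, 64])

The offset head mirrors the abstract's idea of predicting a per-pixel transformation offset from both modalities and resampling the MS features accordingly; the paper's actual interpolation and multi-stage interactive offset correction may be realized differently (for example with deformable convolutions or iterative refinement).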