{"title":"分辨率不匹配:面向泛锐化的模态感知特征对齐网络","authors":"Man Zhou;Xuanhua He;Danfeng Hong","doi":"10.1109/TPAMI.2025.3594898","DOIUrl":null,"url":null,"abstract":"Panchromatic (PAN) and multi-spectral (MS) remote satellite image fusion, known as pan-sharpening, aims to produce high-resolution MS images by combining the complementary information from the high-resolution, texture-rich PAN and the low-resolution but high spectral-resolution MS counterparts. Despite notable advancements in this field, the current state-of-the-art pan-sharpening techniques do not <italic>explicitly</i> address the spatial resolution mismatching problem between the two modalities of PAN and MS images. This mismatching issue can lead to misalignment in feature representation and the creation of blurry artifacts in the model output, ultimately hindering the generation of high-frequency textures and impeding the performance improvement of such methods. To address the aforementioned spatial resolution mismatching problem in pan-sharpening, we propose a novel modality-aware feature-aligned pan-sharpening framework in this paper. The framework comprises three primary stages: modality-aware feature extraction, modality-aware feature aligning, and context integrated image reconstruction. First, we introduce the half-instance normalization strategy as the backbone to filter out the inconsistent features and promote the learning of consistent features between the PAN and MS modalities. Second, a learnable modality-aware feature interpolation is devised to effectively address the misalignment issue. Specifically, the extracted features from the backbone are integrated to predict the transformation offsets of each pixel, which allows for the adaptive selection of custom contextual information and enables the modality-aware features to be more aligned. Finally, within the context of the interactive offset correction, multi-stage information is aggregated to generate the feasible pan-sharpened model output. Extensive experimental results over multiple satellite datasets demonstrate that the proposed algorithm outperforms other state-of-the-art methods both qualitatively and quantitatively, exhibiting great generalization ability to real-world scenes.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 11","pages":"10753-10769"},"PeriodicalIF":18.6000,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Toward Resolution Mismatching: Modality-Aware Feature-Aligned Network for Pan-Sharpening\",\"authors\":\"Man Zhou;Xuanhua He;Danfeng Hong\",\"doi\":\"10.1109/TPAMI.2025.3594898\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Panchromatic (PAN) and multi-spectral (MS) remote satellite image fusion, known as pan-sharpening, aims to produce high-resolution MS images by combining the complementary information from the high-resolution, texture-rich PAN and the low-resolution but high spectral-resolution MS counterparts. Despite notable advancements in this field, the current state-of-the-art pan-sharpening techniques do not <italic>explicitly</i> address the spatial resolution mismatching problem between the two modalities of PAN and MS images. 
This mismatching issue can lead to misalignment in feature representation and the creation of blurry artifacts in the model output, ultimately hindering the generation of high-frequency textures and impeding the performance improvement of such methods. To address the aforementioned spatial resolution mismatching problem in pan-sharpening, we propose a novel modality-aware feature-aligned pan-sharpening framework in this paper. The framework comprises three primary stages: modality-aware feature extraction, modality-aware feature aligning, and context integrated image reconstruction. First, we introduce the half-instance normalization strategy as the backbone to filter out the inconsistent features and promote the learning of consistent features between the PAN and MS modalities. Second, a learnable modality-aware feature interpolation is devised to effectively address the misalignment issue. Specifically, the extracted features from the backbone are integrated to predict the transformation offsets of each pixel, which allows for the adaptive selection of custom contextual information and enables the modality-aware features to be more aligned. Finally, within the context of the interactive offset correction, multi-stage information is aggregated to generate the feasible pan-sharpened model output. Extensive experimental results over multiple satellite datasets demonstrate that the proposed algorithm outperforms other state-of-the-art methods both qualitatively and quantitatively, exhibiting great generalization ability to real-world scenes.\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":\"47 11\",\"pages\":\"10753-10769\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11106767/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11106767/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Toward Resolution Mismatching: Modality-Aware Feature-Aligned Network for Pan-Sharpening
Panchromatic (PAN) and multi-spectral (MS) remote sensing satellite image fusion, known as pan-sharpening, aims to produce high-resolution MS images by combining the complementary information from the high-resolution, texture-rich PAN images and their low-resolution but high-spectral-resolution MS counterparts. Despite notable advancements in this field, current state-of-the-art pan-sharpening techniques do not explicitly address the spatial resolution mismatch between the PAN and MS modalities. This mismatch can lead to misaligned feature representations and blurry artifacts in the model output, ultimately hindering the generation of high-frequency textures and limiting further performance gains. To address this spatial resolution mismatch, we propose a novel modality-aware feature-aligned pan-sharpening framework. The framework comprises three primary stages: modality-aware feature extraction, modality-aware feature alignment, and context-integrated image reconstruction. First, we introduce a half-instance normalization strategy as the backbone to filter out inconsistent features and promote the learning of features that are consistent across the PAN and MS modalities. Second, a learnable modality-aware feature interpolation is devised to address the misalignment issue. Specifically, the features extracted by the backbone are integrated to predict a transformation offset for each pixel, which allows contextual information to be selected adaptively and brings the modality-aware features into closer alignment. Finally, guided by the interactive offset correction, multi-stage information is aggregated to generate the pan-sharpened output. Extensive experiments on multiple satellite datasets demonstrate that the proposed algorithm outperforms other state-of-the-art methods both qualitatively and quantitatively and generalizes well to real-world scenes.
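
To make the described pipeline more concrete, below is a minimal PyTorch sketch of two building blocks suggested by the abstract: a half-instance normalization block for the modality-aware backbone, and an offset-predicting alignment step that warps MS features toward the PAN features. All module names, channel widths, and the grid_sample-based warping are illustrative assumptions made for exposition; they are not taken from the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class HalfInstanceNormBlock(nn.Module):
    """Half-instance normalization (assumed HINet-style block): instance-normalize
    half of the channels, leave the other half untouched, then fuse with a residual."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm = nn.InstanceNorm2d(channels // 2, affine=True)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(x)
        normed, identity = torch.chunk(out, 2, dim=1)   # split channels in half
        out = torch.cat([self.norm(normed), identity], dim=1)
        return F.relu(self.conv2(F.relu(out)) + x)      # simplified residual fusion


class OffsetAlign(nn.Module):
    """Predict a per-pixel (dx, dy) offset from concatenated PAN/MS features and
    resample the MS features with grid_sample, as a stand-in for the learnable
    modality-aware feature interpolation described in the abstract."""

    def __init__(self, channels: int):
        super().__init__()
        self.offset_head = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, 3, padding=1),        # 2 channels: dx, dy in pixels
        )

    def forward(self, pan_feat: torch.Tensor, ms_feat: torch.Tensor) -> torch.Tensor:
        b, _, h, w = pan_feat.shape
        # Bring MS features to PAN resolution first, then refine with learned offsets.
        ms_up = F.interpolate(ms_feat, size=(h, w), mode="bilinear", align_corners=False)
        offset = self.offset_head(torch.cat([pan_feat, ms_up], dim=1))   # (B, 2, H, W)

        # Base sampling grid in the normalized [-1, 1] coordinates used by grid_sample.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=pan_feat.device),
            torch.linspace(-1, 1, w, device=pan_feat.device),
            indexing="ij",
        )
        base_grid = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)

        # Convert pixel offsets to normalized units and warp the MS features.
        scale = torch.tensor([w - 1.0, h - 1.0], device=pan_feat.device)
        norm_offset = offset.permute(0, 2, 3, 1) * 2.0 / scale
        return F.grid_sample(ms_up, base_grid + norm_offset, mode="bilinear",
                             padding_mode="border", align_corners=False)


if __name__ == "__main__":
    pan = torch.randn(1, 32, 64, 64)   # high-resolution PAN features
    ms = torch.randn(1, 32, 16, 16)    # low-resolution MS features
    backbone = HalfInstanceNormBlock(32)
    align = OffsetAlign(32)
    fused = align(backbone(pan), backbone(ms))
    print(fused.shape)                 # torch.Size([1, 32, 64, 64])

The offset head mirrors the abstract's idea of predicting a per-pixel transformation offset from both modalities and resampling the MS features accordingly; the paper's actual interpolation and multi-stage interactive offset correction may be realized differently (for example with deformable convolutions or iterative refinement).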