Multimodal prototypical networks with Co-metric fusion for few-shot hyperspectral image classification

IF 5.5 | CAS Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yuhang Li, Jinrong He, Hanchi Liu, Yurong Zhang, Zhaokui Li
{"title":"基于共度量融合的多模态原型网络多镜头高光谱图像分类","authors":"Yuhang Li ,&nbsp;Jinrong He ,&nbsp;Hanchi Liu ,&nbsp;Yurong Zhang ,&nbsp;Zhaokui Li","doi":"10.1016/j.neucom.2025.130782","DOIUrl":null,"url":null,"abstract":"<div><div>In the field of Hyperspectral image (HSI) classification, prototype-based network methods have achieved significant research progress. These methods utilize pixel-level information from images to construct central prototypes for each class, providing effective solutions for few-shot learning. However, traditional prototype networks have some inherent flaws; they primarily rely on a single image modality and fail to fully leverage the potential complementarity between different modalities, using only a single modality to generate class prototypes, which limits the model's performance in representing class prototypes and enhancing discriminative capabilities. And subtle inter-class differences are also a challenging task in cross-domain scenarios. To overcome these challenges, this study proposes an innovative Multimodal Prototypical Networks with Co-metric Fusion (MPCF). By integrating prototype information from both image and text modalities, MPCF significantly enhances the performance of few-shot learning. The method not only captures the spectral and spatial features of images to construct image prototypes but also extracts textual features from category descriptions to generate text prototypes. Furthermore, by integrating contrastive learning strategies with the Co-metric fusion mechanism, the method effectively harnesses the information from different modalities. This integration allows for the capture of category information across multiple dimensions, significantly boosting the model's discriminative power among various classes and enhancing its capacity to address few-shot learning scenarios. Experiments conducted on several public benchmark HSI datasets (Indian Pines-84.06 %, Houston-80.41 %, Salinas-92.63 %) demonstrate that MPCF exhibits excellent performance under few-shot and cross-domain conditions, achieving higher classification accuracy and stronger robustness compared to state-of-the-art methods. The related code will be made publicly available at the following URL: <span><span>https://github.com/AIYAU/MPCF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"648 ","pages":"Article 130782"},"PeriodicalIF":5.5000,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multimodal prototypical networks with Co-metric fusion for few-shot hyperspectral image classification\",\"authors\":\"Yuhang Li ,&nbsp;Jinrong He ,&nbsp;Hanchi Liu ,&nbsp;Yurong Zhang ,&nbsp;Zhaokui Li\",\"doi\":\"10.1016/j.neucom.2025.130782\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In the field of Hyperspectral image (HSI) classification, prototype-based network methods have achieved significant research progress. These methods utilize pixel-level information from images to construct central prototypes for each class, providing effective solutions for few-shot learning. 
However, traditional prototype networks have some inherent flaws; they primarily rely on a single image modality and fail to fully leverage the potential complementarity between different modalities, using only a single modality to generate class prototypes, which limits the model's performance in representing class prototypes and enhancing discriminative capabilities. And subtle inter-class differences are also a challenging task in cross-domain scenarios. To overcome these challenges, this study proposes an innovative Multimodal Prototypical Networks with Co-metric Fusion (MPCF). By integrating prototype information from both image and text modalities, MPCF significantly enhances the performance of few-shot learning. The method not only captures the spectral and spatial features of images to construct image prototypes but also extracts textual features from category descriptions to generate text prototypes. Furthermore, by integrating contrastive learning strategies with the Co-metric fusion mechanism, the method effectively harnesses the information from different modalities. This integration allows for the capture of category information across multiple dimensions, significantly boosting the model's discriminative power among various classes and enhancing its capacity to address few-shot learning scenarios. Experiments conducted on several public benchmark HSI datasets (Indian Pines-84.06 %, Houston-80.41 %, Salinas-92.63 %) demonstrate that MPCF exhibits excellent performance under few-shot and cross-domain conditions, achieving higher classification accuracy and stronger robustness compared to state-of-the-art methods. The related code will be made publicly available at the following URL: <span><span>https://github.com/AIYAU/MPCF</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"648 \",\"pages\":\"Article 130782\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225014547\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225014547","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

In the field of hyperspectral image (HSI) classification, prototype-based network methods have achieved significant research progress. These methods use pixel-level information from images to construct a central prototype for each class, providing effective solutions for few-shot learning. However, traditional prototypical networks have an inherent flaw: they rely on a single image modality to generate class prototypes and fail to exploit the potential complementarity between modalities, which limits how well the prototypes represent each class and constrains the model's discriminative capability. Subtle inter-class differences also pose a challenge in cross-domain scenarios. To overcome these challenges, this study proposes Multimodal Prototypical Networks with Co-metric Fusion (MPCF). By integrating prototype information from both image and text modalities, MPCF significantly enhances few-shot learning performance. The method not only captures the spectral and spatial features of images to construct image prototypes but also extracts textual features from category descriptions to generate text prototypes. Furthermore, by combining contrastive learning strategies with the Co-metric fusion mechanism, the method effectively harnesses information from different modalities. This integration captures category information across multiple dimensions, significantly boosting the model's discriminative power among classes and strengthening its capacity for few-shot learning scenarios. Experiments on several public benchmark HSI datasets (Indian Pines: 84.06%, Houston: 80.41%, Salinas: 92.63%) demonstrate that MPCF performs well under few-shot and cross-domain conditions, achieving higher classification accuracy and stronger robustness than state-of-the-art methods. The related code will be made publicly available at https://github.com/AIYAU/MPCF.
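To make the mechanics concrete, the sketch below shows how a prototypical network can fuse distances to image and text prototypes into a single score, and how a contrastive term can align the two modalities. It is a minimal illustration of the general techniques the abstract names, not the authors' released implementation (see the GitHub repository above); the fusion weight `alpha`, temperature `tau`, and all function names are hypothetical, and random tensors stand in for the outputs of the spectral-spatial and text encoders.

```python
import torch
import torch.nn.functional as F

def class_prototypes(embeddings, labels, num_classes):
    # Standard prototypical-network step: each class prototype is the mean
    # of that class's support-set embeddings.
    return torch.stack([embeddings[labels == c].mean(dim=0)
                        for c in range(num_classes)])

def co_metric_logits(queries, image_protos, text_protos, alpha=0.5):
    # Squared Euclidean distance from each query to each prototype, computed
    # separately per modality, then fused as a convex combination. A smaller
    # fused distance yields a higher logit.
    d_img = torch.cdist(queries, image_protos) ** 2
    d_txt = torch.cdist(queries, text_protos) ** 2
    return -(alpha * d_img + (1.0 - alpha) * d_txt)

def prototype_alignment_loss(image_protos, text_protos, tau=0.1):
    # InfoNCE-style contrastive term: pull each class's image prototype toward
    # its own text prototype, push it away from other classes' text prototypes.
    img = F.normalize(image_protos, dim=1)
    txt = F.normalize(text_protos, dim=1)
    logits = img @ txt.t() / tau
    return F.cross_entropy(logits, torch.arange(img.size(0)))

# Toy 3-way, 5-shot episode with 64-dim embeddings.
num_classes, shots, dim = 3, 5, 64
support = torch.randn(num_classes * shots, dim)      # encoder output (placeholder)
support_labels = torch.arange(num_classes).repeat_interleave(shots)
text_protos = torch.randn(num_classes, dim)          # encoded class descriptions (placeholder)
queries = torch.randn(10, dim)

image_protos = class_prototypes(support, support_labels, num_classes)
logits = co_metric_logits(queries, image_protos, text_protos)
predictions = logits.argmax(dim=1)                   # predicted class per query
align_loss = prototype_alignment_loss(image_protos, text_protos)
```

In a real pipeline, `alpha` could be fixed, tuned on validation data, or learned, and the alignment loss would be added to the episode's classification loss during training.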
Source journal: Neurocomputing
Category: Engineering & Technology – Computer Science: Artificial Intelligence
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.