Multimodal prototypical networks with Co-metric fusion for few-shot hyperspectral image classification
Yuhang Li, Jinrong He, Hanchi Liu, Yurong Zhang, Zhaokui Li
Neurocomputing, Volume 648, Article 130782. Published 2025-06-18. DOI: 10.1016/j.neucom.2025.130782
Cited by: 0
Abstract
In hyperspectral image (HSI) classification, prototype-based network methods have made significant progress. These methods use pixel-level information from images to construct a central prototype for each class, providing effective solutions for few-shot learning. However, traditional prototypical networks have inherent limitations: they rely on a single image modality to generate class prototypes and fail to exploit the potential complementarity between modalities, which limits both the representational quality of the class prototypes and the model's discriminative capability. Subtle inter-class differences also remain challenging in cross-domain scenarios. To overcome these challenges, this study proposes Multimodal Prototypical Networks with Co-metric Fusion (MPCF). By integrating prototype information from both image and text modalities, MPCF significantly enhances few-shot learning performance. The method not only captures the spectral and spatial features of images to construct image prototypes but also extracts textual features from category descriptions to generate text prototypes. Furthermore, by combining contrastive learning strategies with the Co-metric fusion mechanism, the method effectively exploits information from both modalities, capturing category information across multiple dimensions, boosting the model's discriminative power among classes, and strengthening its capacity for few-shot learning. Experiments on several public benchmark HSI datasets (Indian Pines: 84.06%, Houston: 80.41%, Salinas: 92.63%) demonstrate that MPCF performs strongly under few-shot and cross-domain conditions, achieving higher classification accuracy and greater robustness than state-of-the-art methods. The code will be made publicly available at https://github.com/AIYAU/MPCF.
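The abstract describes the core mechanism at a high level: per-class image prototypes built from spectral-spatial features, text prototypes built from category descriptions, and a Co-metric fusion of the two. As a rough illustration only, the following PyTorch sketch shows one plausible realization of that idea. It is not the authors' implementation (see the linked repository for that); the class name MultimodalProtoClassifier, the learned scalar fusion weight, and the assumption of a shared image-text embedding space are all illustrative choices made here.

import torch
import torch.nn as nn


class MultimodalProtoClassifier(nn.Module):
    """Hedged sketch of multimodal prototypes with a co-metric fusion.

    Not the MPCF implementation; a minimal stand-in that fuses an
    image-prototype metric and a text-prototype metric.
    """

    def __init__(self):
        super().__init__()
        # Learnable scalar balancing the two modality-specific metrics
        # (one simple way to realize a "co-metric" fusion; an assumption).
        self.alpha = nn.Parameter(torch.tensor(0.5))

    @staticmethod
    def image_prototypes(support_feats, support_labels, n_classes):
        # Classic prototypical-network prototype: the mean of the
        # support embeddings belonging to each class.
        return torch.stack([
            support_feats[support_labels == c].mean(dim=0)
            for c in range(n_classes)
        ])  # shape: (n_classes, feat_dim)

    def forward(self, query_feats, img_protos, txt_protos):
        # Squared Euclidean distance from each query to the image
        # prototypes and to the text prototypes (assumes both live in
        # the same embedding space).
        d_img = torch.cdist(query_feats, img_protos) ** 2
        d_txt = torch.cdist(query_feats, txt_protos) ** 2
        # Co-metric fusion as a convex combination of the two metrics.
        a = torch.sigmoid(self.alpha)
        fused = a * d_img + (1.0 - a) * d_txt
        # Smaller fused distance means a higher class score.
        return -fused  # logits of shape (n_queries, n_classes)


# Toy episode: 3 classes, 5 support samples each, 64-d embeddings.
model = MultimodalProtoClassifier()
support = torch.randn(15, 64)
labels = torch.repeat_interleave(torch.arange(3), 5)
img_protos = model.image_prototypes(support, labels, n_classes=3)
txt_protos = torch.randn(3, 64)  # stand-in for encoded category descriptions
logits = model(torch.randn(10, 64), img_protos, txt_protos)  # (10, 3)

A convex combination of two squared-Euclidean metrics is only one simple fusion choice; the paper's actual Co-metric fusion mechanism and its contrastive-learning component are not reproduced here.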
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.