Category Name Expansion and an Enhanced Multimodal Fusion Framework for Few-Shot Learning.

Impact Factor: 2.0 | Zone 3 (Physics and Astrophysics) | JCR Q2 (PHYSICS, MULTIDISCIPLINARY)

Entropy | Pub Date: 2025-09-22 | DOI: 10.3390/e27090991

Tianlei Gao, Lei Lyu, Xiaoyun Xie, Nuo Wei, Yushui Geng, Minglei Shu

Entropy, volume 27, issue 9. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470245/pdf/

Citations: 0

Abstract

With the advancement of image processing techniques, few-shot learning (FSL) has gradually become a key approach to addressing the problem of data scarcity. However, existing FSL methods often rely on unimodal information under limited sample conditions, making it difficult to capture fine-grained differences between categories. To address this issue, we propose a multimodal few-shot learning method based on category name expansion and image feature enhancement. By integrating the expanded category text with image features, the proposed method enriches the semantic representation of categories and enhances the model's sensitivity to detailed features. To further improve the quality of cross-modal information transfer, we introduce a cross-modal residual connection strategy that aligns features across layers through progressive fusion. This approach enables the fused representations to maximize mutual information while reducing redundancy, effectively alleviating the information bottleneck caused by uneven entropy distribution between modalities and enhancing the model's generalization ability. Experimental results demonstrate that our method achieves superior performance on both natural image datasets (CIFAR-FS and FC100) and a medical image dataset.
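The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of residual cross-modal fusion for few-shot classification, not the authors' method. It assumes class prototypes are mean image features, that embeddings of the expanded category names are already available (random vectors stand in for a text encoder here), and that fusion adds a projected text embedding to the image prototype in several stages. The names `fuse_prototypes`, `classify`, `projections`, and `alpha` are hypothetical, and the projection matrices would be learned in practice rather than sampled.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit Euclidean norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fuse_prototypes(image_protos, text_embeds, projections, alpha=0.5):
    """Residual fusion: keep the image prototype and add projected text features.

    Each matrix in `projections` is one fusion stage (an assumed stand-in for
    the paper's progressive, layer-wise fusion). `alpha` weights the text branch.
    """
    fused = image_protos
    for w in projections:
        # Residual connection: the image-side representation is preserved,
        # and the text-side contribution is added on top of it.
        fused = l2_normalize(fused + alpha * (text_embeds @ w))
    return fused

def classify(queries, prototypes):
    """Assign each query to the prototype with the highest cosine similarity."""
    sims = l2_normalize(queries) @ prototypes.T
    return sims.argmax(axis=1)

# Toy 5-way 1-shot episode with made-up dimensions.
rng = np.random.default_rng(0)
n_way, k_shot, d_img, d_txt = 5, 1, 64, 32

support = rng.normal(size=(n_way, k_shot, d_img))   # image features per class
image_protos = support.mean(axis=1)                 # one prototype per class
text_embeds = rng.normal(size=(n_way, d_txt))       # placeholder text embeddings
projections = [0.1 * rng.normal(size=(d_txt, d_img)) for _ in range(2)]

fused = fuse_prototypes(image_protos, text_embeds, projections)
queries = fused + 0.01 * rng.normal(size=fused.shape)  # slightly perturbed queries
preds = classify(queries, fused)
print(preds)
```

Setting `alpha=0.0` reduces the sketch to an image-only prototype classifier, which makes it easy to compare the unimodal and fused variants on the same episode.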


Source journal: Entropy (PHYSICS, MULTIDISCIPLINARY)

CiteScore: 4.90

Self-citation rate: 11.10%

Articles published: 1580

Review time: 21.05 days

About the journal: Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers, and short notes. Its aim is to encourage scientists to publish their theoretical and experimental details as fully as possible. There is no restriction on the length of papers. Where a paper involves computation or experiments, the details must be provided so that the results can be reproduced.