Category Name Expansion and an Enhanced Multimodal Fusion Framework for Few-Shot Learning.

Entropy · IF 2.0 · JCR Q2 (PHYSICS, MULTIDISCIPLINARY) · CAS Tier 3, Physics and Astrophysics
Pub Date: 2025-09-22 · DOI: 10.3390/e27090991
Tianlei Gao, Lei Lyu, Xiaoyun Xie, Nuo Wei, Yushui Geng, Minglei Shu
Entropy, vol. 27, no. 9, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470245/pdf/
Citations: 0

Abstract

With the advancement of image processing techniques, few-shot learning (FSL) has gradually become a key approach to addressing the problem of data scarcity. However, existing FSL methods often rely on unimodal information under limited sample conditions, making it difficult to capture fine-grained differences between categories. To address this issue, we propose a multimodal few-shot learning method based on category name expansion and image feature enhancement. By integrating the expanded category text with image features, the proposed method enriches the semantic representation of categories and enhances the model's sensitivity to detailed features. To further improve the quality of cross-modal information transfer, we introduce a cross-modal residual connection strategy that aligns features across layers through progressive fusion. This approach enables the fused representations to maximize mutual information while reducing redundancy, effectively alleviating the information bottleneck caused by uneven entropy distribution between modalities and enhancing the model's generalization ability. Experimental results demonstrate that our method achieves superior performance on both natural image datasets (CIFAR-FS and FC100) and a medical image dataset.
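The cross-modal residual connection described above can be illustrated with a minimal sketch. The code below is a hypothetical toy implementation, not the authors' actual architecture: it assumes a text feature from the expanded category name and an image feature of the same dimension, and applies a few progressive fusion steps in which the concatenated features are projected and added back to the image stream as a residual. The layer count, projection shapes, and `tanh` nonlinearity are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # feature dimension (illustrative)

def fuse_step(img, txt, W):
    """One progressive-fusion step: project the concatenated
    image/text features and add the result back to the image
    stream via a residual connection."""
    joint = np.concatenate([img, txt])   # shape (2d,)
    return img + np.tanh(W @ joint)      # residual update, shape (d,)

# toy image feature and expanded-category text feature
img_feat = rng.normal(size=d)
txt_feat = rng.normal(size=d)

# three fusion layers, each with its own (hypothetical) projection matrix
fused = img_feat
for _ in range(3):
    W = rng.normal(size=(d, 2 * d)) / np.sqrt(2 * d)
    fused = fuse_step(fused, txt_feat, W)

print(fused.shape)  # the fused representation keeps the image-feature shape
```

Because each step only adds a projected joint term to the image features, text information is injected gradually across layers rather than in a single concatenation, which is one plausible reading of the "progressive fusion" strategy.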

Source journal: Entropy (PHYSICS, MULTIDISCIPLINARY)
CiteScore: 4.90
Self-citation rate: 11.10%
Articles per year: 1580
Review time: 21.05 days
Journal description: Entropy (ISSN 1099-4300) is an international and interdisciplinary journal of entropy and information studies that publishes reviews, regular research papers, and short notes. Its aim is to encourage scientists to publish their theoretical and experimental work in as much detail as possible; there is no restriction on paper length. When computations or experiments are reported, sufficient detail must be provided so that the results can be reproduced.