Visual–Semantic Fuzzy Interaction Network for Zero-Shot Learning

Xuemeng Hui;Zhunga Liu;Jiaxiang Liu;Zuowei Zhang;Longfei Wang
{"title":"零学习的视觉语义模糊交互网络","authors":"Xuemeng Hui;Zhunga Liu;Jiaxiang Liu;Zuowei Zhang;Longfei Wang","doi":"10.1109/TAI.2024.3524955","DOIUrl":null,"url":null,"abstract":"Zero-shot learning (ZSL) aims to recognize unseen class image objects using manually defined semantic knowledge corresponding to both seen and unseen images. The key of ZSL lies in building the interaction between precise image data and fuzzy semantic knowledge. The fuzziness is attributed to the difficulty in quantifying human knowledge. However, the existing ZSL methods ignore the inherent fuzziness of semantic knowledge and treat it as precise data during building the visual–semantic interaction. This is not good for transferring semantic knowledge from seen classes to unseen classes. In order to solve this problem, we propose a visual–semantic fuzzy interaction network (VSFIN) for ZSL. VSFIN utilize an effective encoder–decoder structure, including a semantic prototype encoder (SPE) and visual feature decoder (VFD). The SPE and VFD enable the visual features to interact with semantic knowledge via cross-attention. To achieve visual–semantic fuzzy interaction in SPE and VFD, we introduce the concept of membership function in fuzzy set theory and design a membership loss function. This loss function allows for a certain degree of imprecision in visual–semantic interaction, thereby enabling VSFIN to becomingly utilize the given semantic knowledge. Moreover, we introduce the concept of rank sum test and propose a distribution alignment loss to alleviate the bias towards seen classes. Extensive experiments on three widely used benchmarks have demonstrated that VSFIN outperforms current state-of-the-art methods under both conventional ZSL (CZSL) and generalized ZSL (GZSL) settings.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 5","pages":"1345-1359"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Visual–Semantic Fuzzy Interaction Network for Zero-Shot Learning\",\"authors\":\"Xuemeng Hui;Zhunga Liu;Jiaxiang Liu;Zuowei Zhang;Longfei Wang\",\"doi\":\"10.1109/TAI.2024.3524955\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Zero-shot learning (ZSL) aims to recognize unseen class image objects using manually defined semantic knowledge corresponding to both seen and unseen images. The key of ZSL lies in building the interaction between precise image data and fuzzy semantic knowledge. The fuzziness is attributed to the difficulty in quantifying human knowledge. However, the existing ZSL methods ignore the inherent fuzziness of semantic knowledge and treat it as precise data during building the visual–semantic interaction. This is not good for transferring semantic knowledge from seen classes to unseen classes. In order to solve this problem, we propose a visual–semantic fuzzy interaction network (VSFIN) for ZSL. VSFIN utilize an effective encoder–decoder structure, including a semantic prototype encoder (SPE) and visual feature decoder (VFD). The SPE and VFD enable the visual features to interact with semantic knowledge via cross-attention. To achieve visual–semantic fuzzy interaction in SPE and VFD, we introduce the concept of membership function in fuzzy set theory and design a membership loss function. 
This loss function allows for a certain degree of imprecision in visual–semantic interaction, thereby enabling VSFIN to becomingly utilize the given semantic knowledge. Moreover, we introduce the concept of rank sum test and propose a distribution alignment loss to alleviate the bias towards seen classes. Extensive experiments on three widely used benchmarks have demonstrated that VSFIN outperforms current state-of-the-art methods under both conventional ZSL (CZSL) and generalized ZSL (GZSL) settings.\",\"PeriodicalId\":73305,\"journal\":{\"name\":\"IEEE transactions on artificial intelligence\",\"volume\":\"6 5\",\"pages\":\"1345-1359\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-01-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on artificial intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10820830/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10820830/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Zero-shot learning (ZSL) aims to recognize images of unseen classes using manually defined semantic knowledge that describes both seen and unseen classes. The key to ZSL lies in building the interaction between precise image data and fuzzy semantic knowledge, where the fuzziness stems from the difficulty of quantifying human knowledge. However, existing ZSL methods ignore the inherent fuzziness of semantic knowledge and treat it as precise data when building the visual–semantic interaction, which hinders the transfer of semantic knowledge from seen classes to unseen classes. To address this problem, we propose a visual–semantic fuzzy interaction network (VSFIN) for ZSL. VSFIN uses an effective encoder–decoder structure consisting of a semantic prototype encoder (SPE) and a visual feature decoder (VFD), which let visual features interact with semantic knowledge via cross-attention. To achieve visual–semantic fuzzy interaction in the SPE and VFD, we introduce the concept of the membership function from fuzzy set theory and design a membership loss function. This loss allows a certain degree of imprecision in the visual–semantic interaction, enabling VSFIN to make appropriate use of the given semantic knowledge. Moreover, we draw on the rank sum test and propose a distribution alignment loss to alleviate the bias toward seen classes. Extensive experiments on three widely used benchmarks demonstrate that VSFIN outperforms current state-of-the-art methods under both conventional ZSL (CZSL) and generalized ZSL (GZSL) settings.
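The abstract names two technical ideas: cross-attention between semantic prototypes and visual features inside the SPE/VFD encoder–decoder, and a membership loss that tolerates a bounded amount of imprecision in the visual–semantic match. The paper's exact formulation is not reproduced on this page, so the following is a minimal, hypothetical PyTorch sketch of those ideas; the module and function names, tensor shapes, and the exponential membership function are illustrative assumptions, not the authors' VSFIN implementation.

```python
# Minimal, hypothetical sketch of (1) cross-attention that lets semantic
# prototypes attend over regional visual features and (2) a fuzzy
# membership-style loss that tolerates a bounded degree of imprecision.
# Names, shapes, and the exponential membership function are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualSemanticCrossAttention(nn.Module):
    """Semantic prototypes (queries) attend over regional visual features."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, semantic_prototypes, visual_features):
        # semantic_prototypes: (B, num_attributes, dim) -- queries
        # visual_features:     (B, num_regions, dim)    -- keys and values
        attended, _ = self.attn(semantic_prototypes, visual_features, visual_features)
        return attended  # attribute-grounded visual representation


def membership_loss(pred_attr, class_attr, slack: float = 0.1):
    # Map the distance between predicted and class attribute vectors to a
    # membership degree in (0, 1], then penalize a sample only when its
    # membership falls below 1 - slack, i.e. allow some imprecision.
    distance = torch.norm(pred_attr - class_attr, dim=-1)
    membership = torch.exp(-distance)  # 1 at a perfect match, -> 0 far away
    return F.relu((1.0 - slack) - membership).mean()


if __name__ == "__main__":
    interact = VisualSemanticCrossAttention(dim=512, num_heads=8)
    prototypes = torch.randn(4, 85, 512)      # e.g. 85 attributes per class
    regions = torch.randn(4, 49, 512)         # 7x7 feature map, flattened
    grounded = interact(prototypes, regions)  # (4, 85, 512)
    loss = membership_loss(grounded.mean(dim=-1), torch.rand(4, 85))
    print(grounded.shape, loss.item())
```

The abstract also motivates its distribution alignment loss with the rank sum test. As a rough, hypothetical illustration of that statistic (not the paper's differentiable loss), a Wilcoxon rank-sum test can flag whether a model's scores for seen-class test images systematically dominate its scores for unseen-class images:

```python
# Hypothetical diagnostic for seen-class bias using the Wilcoxon rank-sum
# test; this only illustrates the rank-sum concept the abstract refers to.
from scipy.stats import ranksums

def seen_class_bias(seen_scores, unseen_scores):
    stat, p_value = ranksums(seen_scores, unseen_scores)
    # A large positive statistic with a small p-value suggests the model
    # ranks seen-class scores systematically higher than unseen-class ones.
    return stat, p_value
```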