Interactive Fusion and Correlation Network for Three-Modal Images Few-Shot Semantic Segmentation

IF 3.2 | Region 2, Engineering & Technology | JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
Haolan He;Xianguo Dong;Xiaofei Zhou;Bo Wang;Jiyong Zhang
DOI: 10.1109/LSP.2024.3456634
Journal: IEEE Signal Processing Letters
Publication date: 2024-09-09
Publication type: Journal Article
Open access: No
Source: https://ieeexplore.ieee.org/document/10669915/
Citations: 0

Abstract

Interactive Fusion and Correlation Network for Three-Modal Images Few-Shot Semantic Segmentation
This letter presents a novel method for few-shot semantic segmentation of three-modal images. Some previous efforts fuse multiple modalities before feature correlation, which alters the original visual information that is useful for subsequent feature matching. Others are built on early correlation learning, which can cause loss of detail and thereby impair multi-modal integration. To address these challenges, we build a novel interactive fusion and correlation network (IFCNet). Specifically, the proposed fusing and correlating (FC) module performs feature correlation and attention-based multi-modal fusion interactively, which establishes effective inter-modal complementarity and benefits intra-modal query-support correlation. Furthermore, we add a multi-modal correlation (MC) module, which leverages multi-layer cosine similarity maps to enrich multi-modal visual correspondence. Experiments on the VDT-2048-5$^{i}$ dataset demonstrate the network's superior performance: it outperforms existing state-of-the-art methods in both 1-shot and 5-shot settings. The study also includes an ablation analysis to validate the contributions of the FC module and the MC module to the overall segmentation accuracy.
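As a rough illustration of what "multi-layer cosine similarity maps" between query and support features can look like, here is a minimal NumPy sketch. It is not the authors' MC module: the function names, the modality keys, the masking of support features, and the clipping of negative similarities are all assumptions made for the example.

```python
import numpy as np


def cosine_similarity_map(query_feat, support_feat, support_mask, eps=1e-8):
    """Cosine similarity between every query and every support position.

    query_feat, support_feat: arrays of shape (C, H, W) from one layer.
    support_mask: binary array of shape (H, W) marking the target class.
    Returns an array of shape (H*W, H*W); rows index query positions.
    (Hypothetical helper, not taken from the paper.)
    """
    c = query_feat.shape[0]
    q = query_feat.reshape(c, -1)
    # Assumption: background support positions are zeroed before matching.
    s = (support_feat * support_mask[None]).reshape(c, -1)
    # L2-normalise each position's channel vector.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    sim = q.T @ s
    # Assumption: negative similarities are discarded.
    return np.clip(sim, 0.0, None)


def multi_modal_correlation(query, support, masks):
    """Build one similarity map per layer for each modality.

    query, support: dict mapping a modality name to a list of (C, H, W)
    feature arrays, one per backbone layer.
    masks: list of (H, W) support masks, one per layer resolution.
    """
    return {
        name: [
            cosine_similarity_map(q, s, m)
            for q, s, m in zip(query[name], support[name], masks)
        ]
        for name in query
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical modality names and feature sizes, for illustration only.
    names = ["visible", "depth", "thermal"]
    shapes = [(8, 4, 4), (16, 2, 2)]
    query = {n: [rng.standard_normal(sh) for sh in shapes] for n in names}
    support = {n: [rng.standard_normal(sh) for sh in shapes] for n in names}
    masks = [np.ones(sh[1:]) for sh in shapes]
    maps = multi_modal_correlation(query, support, masks)
    for n in names:
        print(n, [m.shape for m in maps[n]])
```

Each modality is correlated with itself only (intra-modal query-support matching), and one map is produced per layer. How such maps are combined across layers and modalities is a design decision the abstract does not specify.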
Source journal
IEEE Signal Processing Letters
Category: Engineering & Technology - Engineering, Electrical & Electronic
CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months
About the journal: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.