Multidimensional Fusion Network for Multispectral Object Detection

IF 8.3 | Zone 1, Engineering & Technology | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC
Fan Yang; Binbin Liang; Wei Li; Jianwei Zhang
{"title":"Multidimensional Fusion Network for Multispectral Object Detection","authors":"Fan Yang;Binbin Liang;Wei Li;Jianwei Zhang","doi":"10.1109/TCSVT.2024.3454631","DOIUrl":null,"url":null,"abstract":"Multispectral object detection has attracted increasing attention recently due to its superior detection capacity under various illumination conditions. The key challenge lies in the effective aggregation of multi-spectral features to derive highly discriminative representations. To address this challenge, we propose a novel Multidimensional Fusion Network (MMFN) to explore multi-modal information from local, global, and channel perspectives. Specifically, at the local level, local features of different modalities and their inter-relationships are captured by a window-shifted fusion. As a complement to the local information, we designed a global interaction module that facilitates the fusion of holistic, high-level semantic information spanning the entire image. We distillate the channel dependencies and complementarities between different modalities through cross-channel learning and generate the final fused representation. Comprehensive experiments conducted on three publicly available datasets provide compelling evidence validating the superiority of the proposed methodology. The results exhibit notable performance gains over state-of-the-art multispectral object detectors. Our code will be released.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 1","pages":"547-560"},"PeriodicalIF":8.3000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10666754/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Multispectral object detection has attracted increasing attention recently due to its superior detection capacity under various illumination conditions. The key challenge lies in the effective aggregation of multispectral features to derive highly discriminative representations. To address this challenge, we propose a novel Multidimensional Fusion Network (MMFN) to explore multi-modal information from local, global, and channel perspectives. Specifically, at the local level, local features of different modalities and their inter-relationships are captured by window-shifted fusion. As a complement to the local information, we design a global interaction module that facilitates the fusion of holistic, high-level semantic information spanning the entire image. We distill the channel dependencies and complementarities between different modalities through cross-channel learning and generate the final fused representation. Comprehensive experiments conducted on three publicly available datasets provide compelling evidence validating the superiority of the proposed methodology. The results exhibit notable performance gains over state-of-the-art multispectral object detectors. Our code will be released.
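
The following is a minimal PyTorch sketch of the three-perspective fusion idea described above. It is not the authors' MMFN implementation: the class names, the specific layer choices (a 3x3 convolution standing in for the window-shifted local fusion, multi-head attention for the global interaction module, and a squeeze-and-excitation-style gate for the cross-channel learning), and the tensor shapes are illustrative assumptions only.

import torch
import torch.nn as nn


class ChannelFusion(nn.Module):
    # Cross-channel view: gate the channels of each modality with weights
    # computed from their pooled descriptors (squeeze-and-excitation style).
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = rgb.shape
        desc = torch.cat([rgb.mean(dim=(2, 3)), thermal.mean(dim=(2, 3))], dim=1)
        w = self.gate(desc).view(b, 2 * c, 1, 1)
        return w[:, :c] * rgb + w[:, c:] * thermal


class MultiPerspectiveFusion(nn.Module):
    # Fuse RGB and thermal feature maps from local, global, and channel
    # perspectives, then merge the three partial results with a 1x1 conv.
    def __init__(self, channels: int):
        super().__init__()
        # Local view: a 3x3 conv over the concatenated modalities captures
        # neighbourhood-level cross-modal relations.
        self.local = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        # Global view: cross-attention between flattened feature maps models
        # image-wide interactions between the two modalities.
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        self.channel = ChannelFusion(channels)
        self.merge = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb.shape
        local = self.local(torch.cat([rgb, thermal], dim=1))
        q = rgb.flatten(2).transpose(1, 2)        # (B, H*W, C) queries from RGB
        kv = thermal.flatten(2).transpose(1, 2)   # keys/values from thermal
        glob, _ = self.attn(q, kv, kv)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        chan = self.channel(rgb, thermal)
        return self.merge(torch.cat([local, glob, chan], dim=1))


if __name__ == "__main__":
    rgb = torch.randn(2, 64, 32, 32)      # RGB feature map from a backbone
    thermal = torch.randn(2, 64, 32, 32)  # thermal feature map, same shape
    fused = MultiPerspectiveFusion(64)(rgb, thermal)
    print(fused.shape)  # torch.Size([2, 64, 32, 32])

Merging the three partial results with a 1x1 convolution keeps the fused map at the original channel width, so the output could be fed to a standard detection head in place of a single-modality feature map.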
Source journal: IEEE Transactions on Circuits and Systems for Video Technology
CiteScore: 13.80
Self-citation rate: 27.40%
Articles published: 660
Review time: 5 months
Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.