{"title":"UAMFDet:用于水下多模态目标检测的声光融合技术","authors":"Haojie Chen, Zhuo Wang, Hongde Qin, Xiaokai Mu","doi":"10.1002/rob.22432","DOIUrl":null,"url":null,"abstract":"Underwater object detection serves as a crucial means for autonomous underwater vehicles (AUVs) to gain awareness of their surroundings. Currently, AUVs predominantly depend on underwater optical cameras or sonar sensing techniques to furnish vital information sources for subsequent tasks such as underwater rescue and mining exploration. However, the influence of underwater light attenuation or significant background noise often leads to the failure of either the acoustic or optical sensor. Consequently, the traditional single‐modal object detection network, which relies exclusively on either the optical or acoustic modality, struggles to adapt to the varying complexities of underwater environments. To address this challenge, this paper proposes a novel underwater acoustic‐optical fusion‐based underwater multi‐modal object detection paradigm termed UAMFDet, which fuses highly misaligned acoustic‐optical features in the spatial dimension at both the fine‐grained level and the instance level. First, we propose a multi‐modal deformable self‐aligned feature fusion module to adaptively capture feature dependencies between multi‐modal targets, and perform self‐aligned multi‐modal fine‐grained feature fusion by differential fusion. Then a multi‐modal instance‐level feature matching network is designed. It matches multi‐modal instance features by a lightweight cross‐attention mechanism and performs differential fusion to achieve instance‐level feature fusion. 
In addition, we establish a data set dedicated to underwater acoustic‐optical fusion object detection tasks called UAOF, and conduct a large number of experiments on the UAOF data set to verify the effectiveness of UAMFDet.","PeriodicalId":192,"journal":{"name":"Journal of Field Robotics","volume":"15 1","pages":""},"PeriodicalIF":4.2000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"UAMFDet: Acoustic‐Optical Fusion for Underwater Multi‐Modal Object Detection\",\"authors\":\"Haojie Chen, Zhuo Wang, Hongde Qin, Xiaokai Mu\",\"doi\":\"10.1002/rob.22432\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Underwater object detection serves as a crucial means for autonomous underwater vehicles (AUVs) to gain awareness of their surroundings. Currently, AUVs predominantly depend on underwater optical cameras or sonar sensing techniques to furnish vital information sources for subsequent tasks such as underwater rescue and mining exploration. However, the influence of underwater light attenuation or significant background noise often leads to the failure of either the acoustic or optical sensor. Consequently, the traditional single‐modal object detection network, which relies exclusively on either the optical or acoustic modality, struggles to adapt to the varying complexities of underwater environments. To address this challenge, this paper proposes a novel underwater acoustic‐optical fusion‐based underwater multi‐modal object detection paradigm termed UAMFDet, which fuses highly misaligned acoustic‐optical features in the spatial dimension at both the fine‐grained level and the instance level. First, we propose a multi‐modal deformable self‐aligned feature fusion module to adaptively capture feature dependencies between multi‐modal targets, and perform self‐aligned multi‐modal fine‐grained feature fusion by differential fusion. 
Then a multi‐modal instance‐level feature matching network is designed. It matches multi‐modal instance features by a lightweight cross‐attention mechanism and performs differential fusion to achieve instance‐level feature fusion. In addition, we establish a data set dedicated to underwater acoustic‐optical fusion object detection tasks called UAOF, and conduct a large number of experiments on the UAOF data set to verify the effectiveness of UAMFDet.\",\"PeriodicalId\":192,\"journal\":{\"name\":\"Journal of Field Robotics\",\"volume\":\"15 1\",\"pages\":\"\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2024-09-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Field Robotics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1002/rob.22432\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Field Robotics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1002/rob.22432","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Underwater object detection serves as a crucial means for autonomous underwater vehicles (AUVs) to gain awareness of their surroundings. Currently, AUVs predominantly depend on underwater optical cameras or sonar sensing techniques to furnish vital information for subsequent tasks such as underwater rescue and mining exploration. However, underwater light attenuation or significant background noise often causes either the optical or the acoustic sensor to fail. Consequently, a traditional single‐modal object detection network, which relies exclusively on one modality, struggles to adapt to the varying complexities of underwater environments. To address this challenge, this paper proposes a novel acoustic‐optical fusion‐based underwater multi‐modal object detection paradigm termed UAMFDet, which fuses highly misaligned acoustic‐optical features in the spatial dimension at both the fine‐grained level and the instance level. First, we propose a multi‐modal deformable self‐aligned feature fusion module to adaptively capture feature dependencies between multi‐modal targets and perform self‐aligned multi‐modal fine‐grained feature fusion via differential fusion. Then a multi‐modal instance‐level feature matching network is designed: it matches multi‐modal instance features with a lightweight cross‐attention mechanism and applies differential fusion to achieve instance‐level feature fusion. In addition, we establish UAOF, a data set dedicated to underwater acoustic‐optical fusion object detection, and conduct extensive experiments on it to verify the effectiveness of UAMFDet.
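The abstract does not specify how the lightweight cross‐attention matching or the differential fusion is implemented, so the following is only an illustrative sketch of the general idea: optical instance features query acoustic instance features through single‐head cross‐attention, and the aligned result is fused by differencing. All function names, the single‐head simplification, and the particular differencing formula are assumptions, not the paper's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(optical, acoustic):
    """Illustrative instance-level fusion (hypothetical, not UAMFDet's exact scheme).

    optical:  (N, d) optical instance features
    acoustic: (M, d) acoustic instance features
    Returns (N, d) fused features aligned to the optical instances.
    """
    d = optical.shape[1]
    # Optical features act as queries over the acoustic features,
    # so each optical instance is matched to a soft combination
    # of acoustic instances despite spatial misalignment.
    attn = softmax(optical @ acoustic.T / np.sqrt(d), axis=-1)  # (N, M)
    aligned = attn @ acoustic                                    # (N, d)
    # "Differential" fusion sketch: inject the modality difference
    # back into the optical features.
    return optical + (optical - aligned)

rng = np.random.default_rng(0)
fused = cross_attention_fuse(rng.normal(size=(4, 8)), rng.normal(size=(6, 8)))
print(fused.shape)
```

A dot-product cross-attention like this is a common lightweight choice for matching unordered instance sets of different sizes, since it needs no explicit one-to-one assignment between detections from the two modalities.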
About the journal:
The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments.
The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.