MSSF-DCNet: multi-scale selective fusion with dense connectivity network for sonar image object detection

Yu Dong, Jianlei Zhang, Chun-yan Zhang
{"title":"MSSF-DCNet:利用密集连接网络进行多尺度选择性融合,用于声纳图像目标检测","authors":"Yu Dong, Jianlei Zhang, Chun-yan Zhang","doi":"10.1117/12.3032084","DOIUrl":null,"url":null,"abstract":"In the field of underwater target recognition, forward-looking sonar images are widely applied in underwater rescue operations. The emergence of object detection technologies powered by deep learning has significantly enhanced the ability to recognize underwater targets. In object detection, the neck network, serving as a critical intermediary component, plays a vital role. However, traditional Feature Pyramid Networks (FPN) have two main problems: 1) During the feature fusion process, FPN does not modify the importance of features across various levels, resulting in imbalanced features at different scales and loss of scale information. 2) Lack of effective information transmission between features of different scales. In this article, we propose a novel neck network architecture, Multi Scale Selective Fusion with Dense Connectivity Network (MSSF-DCNet), which encompasses two components to tackle the previously mentioned challenges. The first one is the Multi Scale Selection Module, which effectively balances the weights of features at different levels during the feature fusion process by calculating and weighting weights for different scales, better preserving scale information. The second one is the Cross Scale Dense Connection module, which exchanges information between different feature layer levels. The model is capable of capturing global context information at every layer. thereby improving the detection capability of the neck network. By replacing the FPN with MSSF-DCNet in the Faster R-CNN framework, our model achieves an increase in Average Precision (AP) by 1.2, 4.0, and 2.6 points using MobileNet-v2, ResNet50, and SwinTransformer backbones, respectively. Furthermore, when employing ResNet50 as the backbone, MSSF-DCNet enhances the RetinaNet by 3.4 AP and ATSS by 4.1 AP. At the same time, we compared different neck networks with MSSF-DCNet on the Faster R-CNN baseline network, and MSSF-DCNet achieved the best performance in all metrics.","PeriodicalId":342847,"journal":{"name":"International Conference on Algorithms, Microchips and Network Applications","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MSSF-DCNet: multi-scale selective fusion with dense connectivity network for sonar image object detection\",\"authors\":\"Yu Dong, Jianlei Zhang, Chun-yan Zhang\",\"doi\":\"10.1117/12.3032084\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the field of underwater target recognition, forward-looking sonar images are widely applied in underwater rescue operations. The emergence of object detection technologies powered by deep learning has significantly enhanced the ability to recognize underwater targets. In object detection, the neck network, serving as a critical intermediary component, plays a vital role. However, traditional Feature Pyramid Networks (FPN) have two main problems: 1) During the feature fusion process, FPN does not modify the importance of features across various levels, resulting in imbalanced features at different scales and loss of scale information. 2) Lack of effective information transmission between features of different scales. 
In this article, we propose a novel neck network architecture, Multi Scale Selective Fusion with Dense Connectivity Network (MSSF-DCNet), which encompasses two components to tackle the previously mentioned challenges. The first one is the Multi Scale Selection Module, which effectively balances the weights of features at different levels during the feature fusion process by calculating and weighting weights for different scales, better preserving scale information. The second one is the Cross Scale Dense Connection module, which exchanges information between different feature layer levels. The model is capable of capturing global context information at every layer. thereby improving the detection capability of the neck network. By replacing the FPN with MSSF-DCNet in the Faster R-CNN framework, our model achieves an increase in Average Precision (AP) by 1.2, 4.0, and 2.6 points using MobileNet-v2, ResNet50, and SwinTransformer backbones, respectively. Furthermore, when employing ResNet50 as the backbone, MSSF-DCNet enhances the RetinaNet by 3.4 AP and ATSS by 4.1 AP. At the same time, we compared different neck networks with MSSF-DCNet on the Faster R-CNN baseline network, and MSSF-DCNet achieved the best performance in all metrics.\",\"PeriodicalId\":342847,\"journal\":{\"name\":\"International Conference on Algorithms, Microchips and Network Applications\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Algorithms, Microchips and Network Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1117/12.3032084\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Algorithms, Microchips and Network Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.3032084","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

In the field of underwater target recognition, forward-looking sonar images are widely used in underwater rescue operations. The emergence of object detection technologies powered by deep learning has significantly enhanced the ability to recognize underwater targets. In object detection, the neck network serves as a critical intermediary component and plays a vital role. However, traditional Feature Pyramid Networks (FPN) have two main problems: 1) during feature fusion, FPN does not adjust the importance of features across levels, resulting in imbalanced features at different scales and loss of scale information; 2) there is no effective information exchange between features of different scales. In this article, we propose a novel neck network architecture, Multi-Scale Selective Fusion with Dense Connectivity Network (MSSF-DCNet), which comprises two components to tackle these challenges. The first is the Multi-Scale Selection Module, which balances features from different levels during fusion by computing weights for each scale and using them to reweight the corresponding features, better preserving scale information. The second is the Cross-Scale Dense Connection module, which exchanges information between different feature levels so that the model can capture global context information at every layer, thereby improving the detection capability of the neck network. By replacing the FPN with MSSF-DCNet in the Faster R-CNN framework, our model improves Average Precision (AP) by 1.2, 4.0, and 2.6 points with MobileNet-v2, ResNet50, and Swin Transformer backbones, respectively. Furthermore, with ResNet50 as the backbone, MSSF-DCNet improves RetinaNet by 3.4 AP and ATSS by 4.1 AP. We also compared MSSF-DCNet against other neck networks on the Faster R-CNN baseline, and it achieved the best performance on all metrics.
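The abstract does not give implementation details, so the following is only a minimal PyTorch-style sketch of the two ideas it describes: softmax-weighted balancing of feature levels during fusion, and dense cross-scale exchange in which every output level gathers contributions from every input level. The class name `MultiScaleSelectiveFusion` and all hyperparameters below are hypothetical and are not taken from the paper.

```python
# A minimal, illustrative sketch of softmax-weighted multi-scale fusion with
# dense cross-scale connections. Names and hyperparameters are assumptions,
# not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleSelectiveFusion(nn.Module):
    """Fuse FPN-style features with learned per-level weights.

    Each output level resamples all input levels to its own resolution
    (dense cross-scale connectivity), scores each one with a small gating
    branch, and blends them with softmax weights so no single scale
    dominates the fusion.
    """

    def __init__(self, channels: int = 256, num_levels: int = 3):
        super().__init__()
        self.num_levels = num_levels
        # One lightweight gate per (output level, input level) pair.
        self.gates = nn.ModuleList(
            nn.ModuleList(nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_levels))
            for _ in range(num_levels)
        )
        # Post-fusion smoothing conv per output level.
        self.fuse_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in range(num_levels)
        )

    def forward(self, feats):
        assert len(feats) == self.num_levels
        outputs = []
        for i, target in enumerate(feats):
            size = target.shape[-2:]
            # Resample every input level to the target resolution.
            resampled = [
                f if f.shape[-2:] == size else F.interpolate(f, size=size, mode="nearest")
                for f in feats
            ]
            # Scalar score per input level from globally pooled features.
            scores = torch.cat(
                [self.gates[i][j](F.adaptive_avg_pool2d(r, 1)) for j, r in enumerate(resampled)],
                dim=1,
            )  # shape (N, num_levels, 1, 1)
            weights = scores.softmax(dim=1)  # balance levels instead of plain summation
            fused = sum(weights[:, j : j + 1] * r for j, r in enumerate(resampled))
            outputs.append(self.fuse_convs[i](fused))
        return outputs


if __name__ == "__main__":
    # Three pyramid levels at strides 8/16/32 for a 256x256 input.
    feats = [torch.randn(2, 256, 32, 32), torch.randn(2, 256, 16, 16), torch.randn(2, 256, 8, 8)]
    neck = MultiScaleSelectiveFusion(channels=256, num_levels=3)
    outs = neck(feats)
    print([o.shape for o in outs])  # output shapes match the inputs
```

Because the sketch keeps the input and output shapes identical to an FPN neck, a module of this kind could, in principle, be dropped into a Faster R-CNN, RetinaNet, or ATSS configuration in place of the FPN neck, which is the substitution the abstract evaluates; how the paper actually wires its modules may differ.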