Side-Scan Sonar Underwater Target Detection: Combining the Diffusion Model With an Improved YOLOv7 Model

IF 3.8 | Zone 2, Engineering and Technology | Q1 ENGINEERING, CIVIL

IEEE Journal of Oceanic Engineering Pub Date : 2024-03-20 DOI:10.1109/JOE.2024.3379481

Xin Wen;Feihu Zhang;Chensheng Cheng;Xujia Hou;Guang Pan

Vol. 49 (3), pp. 976-991 | https://ieeexplore.ieee.org/document/10534346/

Citations: 0

Abstract

Side-scan sonar (SSS) plays a crucial role in underwater exploration, and autonomous analysis of SSS images is vital for detecting unknown targets in underwater environments. However, high-precision autonomous target recognition in SSS images is challenging because the underwater environment is complex, targets show few highlighted areas, feature details are blurred, and SSS data are difficult to collect. This article addresses the problem by improving the You Only Look Once v7 (YOLOv7) model to achieve high-precision object detection in SSS images. First, we use a denoising diffusion model to enhance and enlarge the set of real experimental images, building a self-made SSS image data set from the target images obtained in real experiments. Because SSS images contain large areas without targets, we then introduce a vision transformer (ViT) for dynamic attention and global modeling, which increases the weight the model assigns to target regions. Second, the convolutional block attention module (CBAM) is adopted to further improve feature expression and reduce floating-point operations. Finally, Scylla-Intersection over Union (SIoU) is used as the loss function to increase the accuracy of the model's inference. Experiments on the SSS image data set show that the improved YOLOv7 model outperforms other methods, reaching a mean average precision of 78.00% (mAP0.5) and 48.11% (mAP0.5:0.95). These results are 3.47% and 2.9% higher, respectively, than those of the YOLOv7 model. The improved YOLOv7 algorithm proposed in this article has great potential for object detection and recognition in SSS images.
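For readers unfamiliar with the two reported metrics, the sketch below illustrates the usual convention behind them: mAP0.5 counts a predicted box as a match when its Intersection over Union (IoU) with the ground-truth box is at least 0.5, while mAP0.5:0.95 averages over the ten thresholds 0.50, 0.55, ..., 0.95. This is not the authors' evaluation code; the function names `iou` and `matched_thresholds` and the example boxes are hypothetical, and the SIoU loss used in the paper adds further terms that are not shown here.

```python
def iou(box_a, box_b):
    """Plain Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 used by the mAP0.5:0.95 convention.
THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def matched_thresholds(pred_box, truth_box):
    """Return the thresholds at which the prediction counts as a match."""
    score = iou(pred_box, truth_box)
    return [t for t in THRESHOLDS if score >= t]


if __name__ == "__main__":
    pred = (0.0, 0.0, 10.0, 10.0)   # hypothetical predicted box
    truth = (2.0, 0.0, 12.0, 10.0)  # hypothetical ground-truth box
    print(iou(pred, truth))
    print(matched_thresholds(pred, truth))
```

In this example the overlap is 80 square units and the union is 120, so the IoU is 2/3: the prediction counts as a match at the 0.5 threshold but fails the stricter thresholds from 0.70 upward, which is why mAP0.5:0.95 is typically well below mAP0.5.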


Source journal

IEEE Journal of Oceanic Engineering (Engineering and Technology: Engineering, Ocean)

CiteScore

9.60

Self-citation rate

12.20%

Articles published

Review time

12 months

About the journal: The IEEE Journal of Oceanic Engineering (ISSN 0364-9059) is the online-only quarterly publication of the IEEE Oceanic Engineering Society (IEEE OES). The scope of the Journal is the field of interest of the IEEE OES, which encompasses all aspects of science, engineering, and technology that address research, development, and operations pertaining to all bodies of water. This includes the creation of new capabilities and technologies from concept design through prototypes, testing, and operational systems to sense, explore, understand, develop, use, and responsibly manage natural resources.