{"title":"将卷积神经网络与变压器相结合,改进YOLOv7在多波束水柱图像中气体羽流检测与分割。","authors":"Wenguang Chen, Xiao Wang, Junjie Chen, Jialong Sun, Guozhen Zha","doi":"10.7717/peerj-cs.2923","DOIUrl":null,"url":null,"abstract":"<p><p>Multibeam bathymetry has become an effective underwater target detection method by using echo signals to generate a high-resolution water column image (WCI). However, the gas plume in the image is often affected by the seafloor environment and exhibits sparse texture and changing motion, making traditional detection and segmentation methods more time-consuming and labor-intensive. The emergence of convolutional neural networks (CNNs) alleviates this problem, but the local feature extraction of the convolutional operations, while capturing detailed information well, cannot adapt to the elongated morphology of the gas plume target, limiting the improvement of the detection and segmentation accuracy. Inspired by the transformer's ability to achieve global modeling through self-attention, we combine CNN with the transformer to improve the existing YOLOv7 (You Only Look Once version 7) model. First, we sequentially reduce the ELAN (Efficient Layer Aggregation Networks) structure in the backbone network and verify that using the enhanced feature extraction module only in the deep network is more effective in recognising the gas plume targets. Then, the C-BiFormer module is proposed, which can achieve effective collaboration between local feature extraction and global semantic modeling while reducing computing resources, and enhance the multi-scale feature extraction capability of the model. Finally, two different depths of networks are designed by stacking C-BiFormer modules with different numbers of layers. This improves the receptive field so that the model's detection and segmentation accuracy achieve different levels of improvement. Experimental results show that the improved model is smaller in size and more accurate compared to the baseline.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e2923"},"PeriodicalIF":3.5000,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192642/pdf/","citationCount":"0","resultStr":"{\"title\":\"Combining convolutional neural network with transformer to improve YOLOv7 for gas plume detection and segmentation in multibeam water column images.\",\"authors\":\"Wenguang Chen, Xiao Wang, Junjie Chen, Jialong Sun, Guozhen Zha\",\"doi\":\"10.7717/peerj-cs.2923\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Multibeam bathymetry has become an effective underwater target detection method by using echo signals to generate a high-resolution water column image (WCI). However, the gas plume in the image is often affected by the seafloor environment and exhibits sparse texture and changing motion, making traditional detection and segmentation methods more time-consuming and labor-intensive. The emergence of convolutional neural networks (CNNs) alleviates this problem, but the local feature extraction of the convolutional operations, while capturing detailed information well, cannot adapt to the elongated morphology of the gas plume target, limiting the improvement of the detection and segmentation accuracy. Inspired by the transformer's ability to achieve global modeling through self-attention, we combine CNN with the transformer to improve the existing YOLOv7 (You Only Look Once version 7) model. 
First, we sequentially reduce the ELAN (Efficient Layer Aggregation Networks) structure in the backbone network and verify that using the enhanced feature extraction module only in the deep network is more effective in recognising the gas plume targets. Then, the C-BiFormer module is proposed, which can achieve effective collaboration between local feature extraction and global semantic modeling while reducing computing resources, and enhance the multi-scale feature extraction capability of the model. Finally, two different depths of networks are designed by stacking C-BiFormer modules with different numbers of layers. This improves the receptive field so that the model's detection and segmentation accuracy achieve different levels of improvement. Experimental results show that the improved model is smaller in size and more accurate compared to the baseline.</p>\",\"PeriodicalId\":54224,\"journal\":{\"name\":\"PeerJ Computer Science\",\"volume\":\"11 \",\"pages\":\"e2923\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2025-05-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192642/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PeerJ Computer Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.7717/peerj-cs.2923\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.7717/peerj-cs.2923","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Multibeam bathymetry has become an effective underwater target detection method, using echo signals to generate high-resolution water column images (WCIs). However, the gas plume in such images is often affected by the seafloor environment and exhibits sparse texture and changing motion, making traditional detection and segmentation methods time-consuming and labor-intensive. Convolutional neural networks (CNNs) alleviate this problem, but the local feature extraction of convolutional operations, while capturing detailed information well, cannot adapt to the elongated morphology of the gas plume target, limiting further gains in detection and segmentation accuracy. Inspired by the transformer's ability to achieve global modeling through self-attention, we combine a CNN with a transformer to improve the existing YOLOv7 (You Only Look Once version 7) model. First, we progressively reduce the ELAN (Efficient Layer Aggregation Networks) structures in the backbone network and verify that using the enhanced feature extraction module only in the deep layers of the network is more effective for recognizing gas plume targets. Then, we propose the C-BiFormer module, which achieves effective collaboration between local feature extraction and global semantic modeling while reducing computing resources, and enhances the model's multi-scale feature extraction capability. Finally, two networks of different depths are designed by stacking different numbers of C-BiFormer modules. This enlarges the receptive field so that the model's detection and segmentation accuracy improve to different degrees. Experimental results show that the improved model is smaller and more accurate than the baseline.
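The abstract does not include implementation details, so the following is a minimal, hypothetical sketch of how a hybrid block combining a local convolutional branch with a global attention branch, in the spirit of the C-BiFormer module described above, might look. Plain multi-head self-attention stands in for BiFormer's bi-level routing attention, and all names and parameters (CBiFormerBlock, dim, num_heads) are illustrative assumptions, not the authors' code.

# Hypothetical sketch: local depthwise-conv branch + global self-attention branch,
# fused and added back to the input. Not the published implementation.
import torch
import torch.nn as nn


class CBiFormerBlock(nn.Module):
    """Combine local feature extraction (conv) with global semantic modeling (attention)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Local branch: depthwise 3x3 convolution captures fine texture detail.
        self.local = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim),
            nn.BatchNorm2d(dim),
            nn.SiLU(),
        )
        # Global branch: multi-head self-attention over flattened spatial tokens
        # models long-range context (a stand-in for bi-level routing attention).
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # 1x1 convolution fuses the two branches back to the original channel width.
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local_feat = self.local(x)

        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        tokens = self.norm(tokens)
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)

        # Residual connection keeps the original features flowing through.
        return x + self.fuse(torch.cat([local_feat, global_feat], dim=1))


if __name__ == "__main__":
    # Example: a deep-stage feature map (channel width and grid size are assumed).
    feats = torch.randn(1, 256, 20, 20)
    block = CBiFormerBlock(dim=256, num_heads=4)
    print(block(feats).shape)  # torch.Size([1, 256, 20, 20])

Under this reading of the abstract, the two network depths would be obtained simply by stacking different numbers of such blocks in the deep stages of the backbone, trading a larger receptive field against model size.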
Journal Introduction:
PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.