{"title":"多尺度特征映射融合编码用于水下目标分割","authors":"Chengxiang Liu, Haoxin Yao, Wenhui Qiu, Hongyuan Cui, Yubin Fang, Anqi Xu","doi":"10.1007/s10489-024-05971-4","DOIUrl":null,"url":null,"abstract":"<div><p>Underwater object segmentation presents significant challenges due to the degradation of image quality and the complexity of underwater environments. In recent years, deep learning has provided an effective approach for object segmentation. However, DeepLabV3+, as a classical model for general scenes, shows limitations in achieving accurate and real-time segmentation in complex underwater conditions. To address this issue, we propose a DeepLab-FusionNet, an extended version of DeepLabV3+, specifically designed for underwater object segmentation. The model utilizes a multi-resolution parallel branch structure to extract multi-scale information and employs an improved inverted residual structure as the basic feature extraction module in the encoding network. Structural reparameterization technique is introduced to optimize inference speed and memory access costs during the inference stage. Additionally, a module for linking deep and shallow level information is constructed to reduce the loss of detail and spatial information during downsampling and convolution. Evaluation on the SUIM dataset shows a 3.3% increase in mean Intersection over Union (mIoU) and a speed improvement of 34 frames per second (FPS) compared to the baseline model DeepLabV3+. Further comparisons with other classic lightweight models and Transformer-based models on the UIIS and TrashCan datasets demonstrate that our model achieves good accuracy and balanced computational efficiency in challenging underwater environments. Although there is room for improvement due to overfitting and fixed convolution kernel limitations, future integration with Transformer methods is planned. 
Our model offers an effective solution for real-time target segmentation for underwater robots, with broad applications in human exploration and development of marine resources. Our codes are available at: https://github.com/sunmer1rain/deeplabv_fusionnet</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 2","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-scale feature map fusion encoding for underwater object segmentation\",\"authors\":\"Chengxiang Liu, Haoxin Yao, Wenhui Qiu, Hongyuan Cui, Yubin Fang, Anqi Xu\",\"doi\":\"10.1007/s10489-024-05971-4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Underwater object segmentation presents significant challenges due to the degradation of image quality and the complexity of underwater environments. In recent years, deep learning has provided an effective approach for object segmentation. However, DeepLabV3+, as a classical model for general scenes, shows limitations in achieving accurate and real-time segmentation in complex underwater conditions. To address this issue, we propose a DeepLab-FusionNet, an extended version of DeepLabV3+, specifically designed for underwater object segmentation. The model utilizes a multi-resolution parallel branch structure to extract multi-scale information and employs an improved inverted residual structure as the basic feature extraction module in the encoding network. Structural reparameterization technique is introduced to optimize inference speed and memory access costs during the inference stage. Additionally, a module for linking deep and shallow level information is constructed to reduce the loss of detail and spatial information during downsampling and convolution. 
Evaluation on the SUIM dataset shows a 3.3% increase in mean Intersection over Union (mIoU) and a speed improvement of 34 frames per second (FPS) compared to the baseline model DeepLabV3+. Further comparisons with other classic lightweight models and Transformer-based models on the UIIS and TrashCan datasets demonstrate that our model achieves good accuracy and balanced computational efficiency in challenging underwater environments. Although there is room for improvement due to overfitting and fixed convolution kernel limitations, future integration with Transformer methods is planned. Our model offers an effective solution for real-time target segmentation for underwater robots, with broad applications in human exploration and development of marine resources. Our codes are available at: https://github.com/sunmer1rain/deeplabv_fusionnet</p></div>\",\"PeriodicalId\":8041,\"journal\":{\"name\":\"Applied Intelligence\",\"volume\":\"55 2\",\"pages\":\"\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2024-12-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10489-024-05971-4\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-024-05971-4","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER 
SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Multi-scale feature map fusion encoding for underwater object segmentation
Underwater object segmentation presents significant challenges due to the degradation of image quality and the complexity of underwater environments. In recent years, deep learning has provided an effective approach to object segmentation. However, DeepLabV3+, a classical model for general scenes, has limitations in achieving accurate, real-time segmentation under complex underwater conditions. To address this, we propose DeepLab-FusionNet, an extension of DeepLabV3+ designed specifically for underwater object segmentation. The model uses a multi-resolution parallel branch structure to extract multi-scale information and employs an improved inverted residual structure as the basic feature extraction module in the encoding network. A structural reparameterization technique is introduced to improve inference speed and reduce memory access costs at inference time. Additionally, a module linking deep- and shallow-level information is constructed to reduce the loss of detail and spatial information during downsampling and convolution. Evaluation on the SUIM dataset shows a 3.3% increase in mean Intersection over Union (mIoU) and a speed improvement of 34 frames per second (FPS) over the baseline DeepLabV3+. Further comparisons with classic lightweight models and Transformer-based models on the UIIS and TrashCan datasets demonstrate that our model achieves good accuracy with balanced computational efficiency in challenging underwater environments. Although overfitting and the limitations of fixed convolution kernels leave room for improvement, we plan to integrate Transformer-based methods in future work. Our model offers an effective solution for real-time object segmentation on underwater robots, with broad applications in the exploration and development of marine resources. Our code is available at: https://github.com/sunmer1rain/deeplabv_fusionnet
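The structural reparameterization mentioned in the abstract generally works by training with several parallel branches (e.g. a 3x3 convolution, a 1x1 convolution, and an identity shortcut) and then folding them into a single 3x3 convolution for inference, which cuts both memory access and latency. The sketch below is a minimal single-channel NumPy illustration of why that fold is lossless; it is our own simplification (function names `conv2d_same` and `reparameterize` are ours), not the paper's implementation, and it omits multi-channel weights and batch-norm folding.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2D cross-correlation with 'same' zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def reparameterize(k3, k1):
    """Fold parallel 3x3, 1x1, and identity branches into one 3x3 kernel.

    Convolution is linear, so conv(x, k3) + conv(x, k1) + x equals
    conv(x, k3 + pad(k1) + I): the 1x1 weight and the identity shortcut
    both contribute only to the centre tap of the 3x3 kernel.
    """
    fused = k3.astype(float).copy()
    fused[1, 1] += k1[0, 0] + 1.0  # centre tap absorbs 1x1 branch and identity
    return fused

# The fused single-branch kernel reproduces the multi-branch output exactly.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))
k1 = rng.standard_normal((1, 1))
multi_branch = conv2d_same(x, k3) + conv2d_same(x, k1) + x
single_branch = conv2d_same(x, reparameterize(k3, k1))
assert np.allclose(multi_branch, single_branch)
```

The key point is that the transformation is exact, not an approximation: only the training-time graph changes, so the inference model keeps the trained accuracy while running as a plain stack of 3x3 convolutions.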
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses solutions to real-life problems in manufacturing, defense, management, government, and industry that are too complex for conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments that address real, complex, and difficult problems. It provides a medium for the exchange of scientific research and technological achievements within the international community.