Underwater Moving Object Detection using an End-to-End Encoder-Decoder Architecture and GraphSage with Aggregator and Refactoring

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Pub Date: 2023-06-01. DOI: 10.1109/CVPRW59228.2023.00597

Meghna Kapoor, Suvam Patra, B. Subudhi, V. Jakhetiya, Ankur Bansal


Citations: 1

Abstract

Underwater environments are affected by several factors, including low visibility, high turbidity, backscattering, and dynamic backgrounds, all of which make object detection challenging. Several algorithms use convolutional neural networks to extract deep features and then detect objects from those features. However, their dependence on kernel size and network depth weakens the relationships among latent-space features, and they fail to characterize the spatio-contextual bonding of the pixels. Hence, they do not produce satisfactory results in complex underwater scenarios. To re-establish this relationship, we propose an architecture for underwater object detection that uses a U-Net with a ResNet-50 backbone. The latent-space features from the encoder are fed to the decoder through a GraphSage model. The GraphSage-based model reweights the node relationships in non-Euclidean space using different aggregator functions and thereby characterizes the spatio-contextual bonding among the pixels. We also studied how the model's performance depends on the choice of aggregator function: mean, max, or LSTM. We evaluated the proposed model on two underwater benchmark databases: F4Knowledge and underwater change detection. Its performance is compared against eleven state-of-the-art techniques using both visual and quantitative evaluation measures.
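The aggregation step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each spatial cell of the encoder's latent feature map becomes a graph node connected to its 4 grid neighbors (the abstract does not say how the graph is built), it covers only the mean and max aggregators (the LSTM aggregator is omitted), and the names `grid_neighbors`, `sage_layer`, and the toy weight matrix are hypothetical. The update used is the standard GraphSage form, h_v' = ReLU(W · concat(h_v, AGG(neighbors of v))).

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def mean_aggregator(neighbors: Sequence[Vector]) -> Vector:
    """Element-wise mean of the neighbor feature vectors."""
    n = len(neighbors)
    return [sum(col) / n for col in zip(*neighbors)]


def max_aggregator(neighbors: Sequence[Vector]) -> Vector:
    """Element-wise maximum of the neighbor feature vectors."""
    return [max(col) for col in zip(*neighbors)]


def grid_neighbors(height: int, width: int) -> Dict[int, List[int]]:
    """4-connected adjacency for a height x width grid.

    Assumption: one node per latent-map cell, indexed row-major.
    """
    adj: Dict[int, List[int]] = {}
    for r in range(height):
        for c in range(width):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    nbrs.append(rr * width + cc)
            adj[r * width + c] = nbrs
    return adj


def sage_layer(
    features: Sequence[Vector],
    adj: Dict[int, List[int]],
    weight: Sequence[Vector],
    aggregate: Callable[[Sequence[Vector]], Vector],
) -> List[Vector]:
    """One GraphSage-style update for every node.

    h_v' = ReLU(W . concat(h_v, AGG({h_u : u in N(v)})))
    `weight` has shape (out_dim, 2 * in_dim).
    """
    out = []
    for v, h_v in enumerate(features):
        # Fall back to the node itself if it has no neighbors (1x1 grid).
        neighborhood = [features[u] for u in adj[v]] or [h_v]
        z = list(h_v) + aggregate(neighborhood)  # concatenation
        out.append([max(0.0, sum(w * x for w, x in zip(row, z))) for row in weight])
    return out


# Toy 2x2 latent map with 2 channels per cell (illustrative values only).
feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
adj = grid_neighbors(2, 2)
# Hypothetical weights: keep channel 0 of the node, channel 1 of the aggregate.
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
mean_out = sage_layer(feats, adj, W, mean_aggregator)
max_out = sage_layer(feats, adj, W, max_aggregator)
```

Swapping `mean_aggregator` for `max_aggregator` changes only how neighbor features are pooled, which is the dependency the abstract reports studying; in a real model `W` would be learned and the features would come from the ResNet-50 encoder.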
