Multimodal information fusion using pyramidal attention-based convolutions for underwater tri-dimensional scene reconstruction

IF 15.5 · JCR Q1, Computer Science, Artificial Intelligence · CAS Tier 1, Computer Science
Pedro Nuno Leite, Andry Maykol Pinto
DOI: 10.1016/j.inffus.2025.103339
Journal: Information Fusion, Volume 124, Article 103339
Published: 2025-05-30
URL: https://www.sciencedirect.com/science/article/pii/S1566253525004129
Citations: 0

Abstract

Underwater environments pose unique challenges to optical systems due to physical phenomena that induce severe data degradation. Current imaging sensors rarely address these effects comprehensively, resulting in the need to integrate complementary information sources. This article presents a multimodal data fusion approach to combine information from diverse sensing modalities into a single dense and accurate tri-dimensional representation. The proposed fusiNg tExture with apparent motion information for underwater Scene recOnstruction (NESO) encoder–decoder network leverages motion perception principles to extract relative depth cues, fusing them with textured information through an early fusion strategy. Evaluated on the FLSea-Stereo dataset, NESO outperforms state-of-the-art methods by 58.7%. Dense depth maps are achieved using multi-stage skip connections with attention mechanisms that ensure propagation of key features across network levels. This representation is further enhanced by incorporating sparse but millimeter-precise depth measurements from active imaging techniques. A regression-based algorithm maps depth displacements between these heterogeneous point clouds, using the estimated curves to refine the dense NESO prediction. This approach achieves relative errors as low as 0.41% when reconstructing submerged anode structures, accounting for metric improvements of up to 0.1124 m relative to the initial measurements. Validation at the ATLANTIS Coastal Testbed demonstrates the effectiveness of this multimodal fusion approach in obtaining robust tri-dimensional representations in real underwater conditions.
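The refinement step described above fits regression curves that map the dense (but potentially biased) NESO depth prediction onto the sparse, millimeter-precise measurements from active imaging, then applies those curves to correct the full depth map. The following is a minimal sketch of that idea, assuming a simple linear mapping and illustrative names; the paper's actual regression model and point-cloud alignment are not specified here.

```python
# Hedged sketch: refine a dense depth map using sparse, precise depth
# samples, in the spirit of the article's regression-based displacement
# mapping. All names are illustrative, not from the paper.

def fit_linear(xs, ys):
    """Ordinary least-squares fit y ~ a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b

def refine_depth(dense, sparse):
    """dense: {pixel: predicted_depth} for every pixel;
    sparse: {pixel: precise_depth} for a small subset of pixels.
    Fits a correction curve on the overlap and applies it everywhere."""
    common = [p for p in sparse if p in dense]
    a, b = fit_linear([dense[p] for p in common],
                      [sparse[p] for p in common])
    return {p: a * d + b for p, d in dense.items()}
```

Given a dense prediction with a systematic scale/offset error, the sparse ground-truth samples let the fitted curve correct even pixels with no precise measurement of their own.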
Source journal: Information Fusion (Engineering & Technology — Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles per year: 161
Review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.