Multimodal information fusion using pyramidal attention-based convolutions for underwater tri-dimensional scene reconstruction

IF 15.5 · JCR Q1, Computer Science, Artificial Intelligence · CAS Tier 1, Computer Science
Pedro Nuno Leite, Andry Maykol Pinto
DOI: 10.1016/j.inffus.2025.103339
Journal: Information Fusion, Volume 124, Article 103339
Published: 2025-05-30
URL: https://www.sciencedirect.com/science/article/pii/S1566253525004129
Citations: 0

Abstract

Underwater environments pose unique challenges to optical systems due to physical phenomena that induce severe data degradation. Current imaging sensors rarely address these effects comprehensively, resulting in the need to integrate complementary information sources. This article presents a multimodal data fusion approach to combine information from diverse sensing modalities into a single dense and accurate tri-dimensional representation. The proposed fusiNg tExture with apparent motion information for underwater Scene recOnstruction (NESO) encoder–decoder network leverages motion perception principles to extract relative depth cues, fusing them with textured information through an early fusion strategy. Evaluated on the FLSea-Stereo dataset, NESO outperforms state-of-the-art methods by 58.7%. Dense depth maps are achieved using multi-stage skip connections with attention mechanisms that ensure propagation of key features across network levels. This representation is further enhanced by incorporating sparse but millimeter-precise depth measurements from active imaging techniques. A regression-based algorithm maps depth displacements between these heterogeneous point clouds, using the estimated curves to refine the dense NESO prediction. This approach achieves relative errors as low as 0.41% when reconstructing submerged anode structures, accounting for metric improvements of up to 0.1124 m relative to the initial measurements. Validation at the ATLANTIS Coastal Testbed demonstrates the effectiveness of this multimodal fusion approach in obtaining robust tri-dimensional representations in real underwater conditions.
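The refinement step described above fits regression curves that map the dense (but potentially biased) NESO depth prediction onto the sparse, millimeter-precise measurements from active imaging, then applies those curves to correct the full depth map. The following is a minimal sketch of that idea, assuming a simple linear mapping and illustrative names; the paper's actual regression model and point-cloud alignment are not specified here.

```python
# Hedged sketch: refine a dense depth map using sparse, precise depth
# samples, in the spirit of the article's regression-based displacement
# mapping. All names are illustrative, not from the paper.

def fit_linear(xs, ys):
    """Ordinary least-squares fit y ~ a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b

def refine_depth(dense, sparse):
    """dense: {pixel: predicted_depth} for every pixel;
    sparse: {pixel: precise_depth} for a small subset of pixels.
    Fits a correction curve on the overlap and applies it everywhere."""
    common = [p for p in sparse if p in dense]
    a, b = fit_linear([dense[p] for p in common],
                      [sparse[p] for p in common])
    return {p: a * d + b for p, d in dense.items()}
```

Given a dense prediction with a systematic scale/offset error, the sparse ground-truth samples let the fitted curve correct even pixels with no precise measurement of their own.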
Source journal: Information Fusion (Engineering & Technology — Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles per year: 161
Review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.