Title: Multimodal information fusion using pyramidal attention-based convolutions for underwater tri-dimensional scene reconstruction
Authors: Pedro Nuno Leite, Andry Maykol Pinto
DOI: 10.1016/j.inffus.2025.103339
Journal: Information Fusion, Volume 124, Article 103339 (published 2025-05-30)
Citations: 0
Abstract
Underwater environments pose unique challenges to optical systems due to physical phenomena that induce severe data degradation. Current imaging sensors rarely address these effects comprehensively, resulting in the need to integrate complementary information sources. This article presents a multimodal data fusion approach to combine information from diverse sensing modalities into a single dense and accurate tri-dimensional representation. The proposed fusiNg tExture with apparent motion information for underwater Scene recOnstruction (NESO) encoder–decoder network leverages motion perception principles to extract relative depth cues, fusing them with texture information through an early fusion strategy. Evaluated on the FLSea-Stereo dataset, NESO outperforms state-of-the-art methods by 58.7%. Dense depth maps are achieved using multi-stage skip connections with attention mechanisms that ensure propagation of key features across network levels. This representation is further enhanced by incorporating sparse but millimeter-precise depth measurements from active imaging techniques. A regression-based algorithm maps depth displacements between these heterogeneous point clouds, using the estimated curves to refine the dense NESO prediction. This approach achieves relative errors as low as 0.41% when reconstructing submerged anode structures, accounting for metric improvements of up to 0.1124 m relative to the initial measurements. Validation at the ATLANTIS Coastal Testbed demonstrates the effectiveness of this multimodal fusion approach in obtaining robust tri-dimensional representations in real underwater conditions.
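The abstract describes a regression-based refinement step: depth displacements between the sparse, millimeter-precise point cloud and the dense NESO prediction are fitted with a curve, which is then used to correct the dense map. The paper does not specify the regression model or the point-matching scheme, so the following is a minimal sketch under the assumption of a polynomial fit on pre-matched pixel locations; the function name and signature are illustrative, not from the paper.

```python
import numpy as np

def refine_dense_depth(dense_depth, sparse_points, sparse_depths, degree=2):
    """Refine a dense depth map using sparse accurate measurements.

    dense_depth   : (H, W) array of network-predicted depths.
    sparse_points : (N, 2) integer array of (row, col) pixel locations where
                    accurate depth measurements are available.
    sparse_depths : (N,) array of accurate depths at those locations.
    degree        : polynomial degree for the displacement curve (assumption).
    """
    rows, cols = sparse_points[:, 0], sparse_points[:, 1]
    predicted = dense_depth[rows, cols]        # dense estimates at sparse locations
    displacement = sparse_depths - predicted   # per-point depth displacement
    # Fit displacement as a function of predicted depth ...
    coeffs = np.polyfit(predicted, displacement, deg=degree)
    # ... and evaluate the fitted curve over the whole dense map.
    correction = np.polyval(coeffs, dense_depth)
    return dense_depth + correction
```

In this sketch the correction is a global function of predicted depth; a spatially varying or per-region fit would be a natural extension if the displacement field is not well explained by depth alone.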
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.