High frequency domain enhancement and channel attention module for multi-view stereo

IF 4.0 | CAS Zone 3, Computer Science | JCR Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Computers & Electrical Engineering, Volume 121, Article 109855. Pub Date: 2024-11-20. DOI: 10.1016/j.compeleceng.2024.109855

Yongjuan Yang, Jie Cao, Hong Zhao, Zhaobin Chang, Weijie Wang


Citations: 0

Abstract

Deep-learning-based multi-view stereo (MVS) has become an increasingly popular approach to 3D reconstruction. Existing methods have made significant advances in pixel-level depth estimation. However, occlusions and non-Lambertian surfaces in images still hinder accurate confidence estimation, and cost volume regularization often over-smooths object boundaries. To address these problems, we integrate a High Frequency Information Compensator and a 3D Channel Attention Module into a multi-view stereo network, which we term HFCA-MVS. First, in the feature volume aggregation stage, the high-frequency information compensator module strengthens the correlation between 2D semantics and 3D space. Second, in the cost volume regularization stage, the 3D channel attention module enhances the representation of channel features by capturing relationships among channels. Finally, the 3D CNN uses the GELU activation function to boost the activation response and mitigate excessive smoothing at object boundaries. HFCA-MVS achieves competitive 3D reconstruction performance on three benchmark datasets: DTU, BlendedMVS, and Tanks&Temples. In particular, on the DTU benchmark HFCA-MVS improves completeness by 33%, 6.5%, and 0.4% relative to CasMVSNet, MVSTER, and Geo-MVSNet, respectively, and improves the overall score by 15% and 4.2% relative to CasMVSNet and MVSTER, respectively. Furthermore, our model yields reconstruction results comparable to those of existing models on the Tanks&Temples dataset.
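The abstract does not describe the internals of the 3D channel attention module. As a point of reference only, the sketch below assumes a Squeeze-and-Excitation (SE) style design (global average pooling per channel, a bottleneck MLP, and a sigmoid gate) applied to a cost volume of shape C x D x H x W. It also shows the exact GELU activation, x·Φ(x), which the abstract says the 3D CNN uses. The function names, the reduction ratio `r`, the toy sizes, and the plain nested-list representation are hypothetical choices for illustration; this is not the authors' HFCA-MVS implementation.

```python
import math
import random
from typing import List, Tuple

# Hypothetical representation: a cost volume as nested lists of shape [C][D][H][W].
Volume = List[List[List[List[float]]]]
Matrix = List[List[float]]


def gelu(x: float) -> float:
    """Exact GELU activation: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def linear(x: List[float], weight: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer y = W x + b, where weight has shape [out][in]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def channel_attention_3d(
    volume: Volume, w1: Matrix, b1: List[float], w2: Matrix, b2: List[float]
) -> Tuple[Volume, List[float]]:
    """SE-style channel attention (an assumed design, not the paper's module).

    w1 has shape [C // r][C] and w2 has shape [C][C // r] for a reduction ratio r.
    Returns the reweighted volume and the per-channel gates in [0, 1].
    """
    # Squeeze: average each channel over its depth, height and width.
    squeezed = []
    for channel in volume:
        values = [v for plane in channel for row in plane for v in row]
        squeezed.append(sum(values) / len(values))
    # Excitation: bottleneck MLP with ReLU, then one sigmoid gate per channel.
    hidden = [max(0.0, h) for h in linear(squeezed, w1, b1)]
    gates = [sigmoid(a) for a in linear(hidden, w2, b2)]
    # Scale: multiply every voxel in channel c by gates[c].
    reweighted = [
        [[[v * gates[c] for v in row] for row in plane] for plane in channel]
        for c, channel in enumerate(volume)
    ]
    return reweighted, gates


def random_matrix(rows: int, cols: int, std: float) -> Matrix:
    """Random Gaussian values standing in for learned parameters or features."""
    return [[random.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    C, D, H, W, r = 8, 4, 3, 3, 2  # toy sizes, chosen arbitrarily
    vol = [[random_matrix(H, W, 1.0) for _ in range(D)] for _ in range(C)]
    w1, b1 = random_matrix(C // r, C, 0.1), [0.0] * (C // r)
    w2, b2 = random_matrix(C, C // r, 0.1), [0.0] * C
    out, gates = channel_attention_3d(vol, w1, b1, w2, b2)
    print("channel gates:", [round(g, 3) for g in gates])
    print("GELU samples:", [round(gelu(x), 4) for x in (-2.0, -0.5, 0.0, 0.5, 2.0)])
```

A real network would hold these weights as learned parameters inside a deep learning framework and apply the gating to batched tensors; the nested lists here only keep the sketch dependency-free. Unlike ReLU, GELU passes small negative responses through with a smooth, nonzero gradient, which is one plausible reading of the abstract's claim that it "boosts the activation response".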


Source journal

Computers & Electrical Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.20
Self-citation rate: 7.00%
Articles published: 661
Review time: 47 days

About the journal: The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency. Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.