Yongjuan Yang, Jie Cao, Hong Zhao, Zhaobin Chang, Weijie Wang
Computers & Electrical Engineering, vol. 121, Article 109855. Published 2024-11-20. DOI: 10.1016/j.compeleceng.2024.109855
Impact factor 4.0; JCR Q1 (Computer Science, Hardware & Architecture).
https://www.sciencedirect.com/science/article/pii/S0045790624007821
High frequency domain enhancement and channel attention module for multi-view stereo
Deep-learning-based multi-view stereo (MVS) is an increasingly popular approach to 3D reconstruction, and existing methods have made significant advances in pixel-level depth estimation. However, occlusions and non-Lambertian surfaces hinder accurate confidence estimation, and cost volume regularization often over-smooths object boundaries. To address these problems, we integrate a High Frequency Information Compensator and a 3D Channel Attention Module into a multi-view stereo network, which we term HFCA-MVS. First, in the feature volume aggregation stage, the high-frequency information compensator strengthens the correlation between 2D semantics and 3D space. Second, in the cost volume regularization stage, the 3D channel attention module improves the representation of channel features by capturing relationships among channels. Finally, the 3D CNN uses the GELU activation function to boost the activation response and reduce over-smoothing at object boundaries. HFCA-MVS achieves competitive 3D reconstruction performance on three benchmark datasets: DTU, BlendedMVS, and Tanks&Temples. On the DTU benchmark, HFCA-MVS improves completeness by a relative 33%, 6.5%, and 0.4% over CasMVSNet, MVSTER, and GeoMVSNet, respectively, and improves the overall score by 15% and 4.2% over CasMVSNet and MVSTER. On Tanks&Temples, the model yields reconstruction results comparable to those of existing models.
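The abstract names GELU as the activation used in the 3D CNN but does not give its form or the network configuration. As a minimal sketch, the snippet below implements only the standard exact GELU definition, GELU(x) = x · Φ(x) with Φ the standard normal CDF, and prints it next to ReLU for comparison. The function names `gelu` and `relu` and the sample inputs are illustrative assumptions, not taken from the paper.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x: float) -> float:
    """ReLU, shown only as a reference point for comparison."""
    return max(0.0, x)


if __name__ == "__main__":
    # Illustrative inputs (assumed, not from the paper).
    for x in (-3.0, -1.0, -0.1, 0.0, 0.1, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):+.4f}")
```

Unlike ReLU, which maps every negative input to zero, GELU passes small negative values through with reduced magnitude and is smooth around zero. Whether this accounts for the boundary behavior reported in the abstract is the authors' claim; this sketch does not test it.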
About the journal:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.