Multiview stereo reconstruction based on context-aware transformer

Zhaoxu Tian
DOI: 10.1117/12.3032052 (https://doi.org/10.1117/12.3032052)
Published: 2024-06-06
Citations: 0

Abstract

This paper addresses a weakness of existing Multi-View Stereo (MVS) methods: in scenes with repetitive textures or otherwise complex content, they often produce reconstructions of poor quality, completeness, and accuracy. To address this, we introduce Clo-PatchmatchNet, a deep learning network built around context-aware Transformers. A feature extraction module first extracts features from the input images. These features are passed to a learnable PatchMatch algorithm, which produces an initial depth map, and this map is then refined into the final, detailed depth map. The key innovation of our approach is a context-aware Transformer block, called Cloblock, integrated into the feature extraction stage. It lets the network capture both global contextual information and high-frequency local details, which improves feature matching across views. Experiments on the Technical University of Denmark (DTU) dataset show that Clo-PatchmatchNet outperforms the baseline PatchmatchNet, improving reconstruction completeness by 2.5% and accuracy by 1.2%, for an overall improvement of 1.7%. Compared with other recent methods, the proposed approach also achieves better completeness and overall scores, a significant step forward for 3D reconstruction.
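The accuracy, completeness, and overall figures follow the DTU benchmark convention used by learning-based MVS papers. Accuracy is the mean distance from reconstructed points to the ground truth. Completeness is the mean distance from ground-truth points to the reconstruction. Overall is the average of the two. All three are distances, so lower is better. The sketch below illustrates these definitions on toy point clouds. It is a simplification and not the paper's or the benchmark's evaluation code: it omits the official observability masks, downsampling, and outlier thresholds, and it uses a brute-force nearest-neighbour search. The function names are hypothetical.

```python
import numpy as np


def mean_nn_distance(src: np.ndarray, dst: np.ndarray) -> float:
    """Mean distance from each point in src (N, 3) to its nearest neighbour in dst (M, 3).

    Brute force, O(N * M) memory and time: suitable only for small illustrative clouds.
    """
    # Pairwise difference vectors of shape (N, M, 3), then Euclidean norms of shape (N, M).
    diffs = src[:, None, :] - dst[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # For every source point keep the closest destination point, then average.
    return float(dists.min(axis=1).mean())


def dtu_style_scores(recon: np.ndarray, gt: np.ndarray) -> dict:
    """Simplified DTU-style scores (hypothetical helper; lower is better, same units as input).

    Assumption: no observability mask, downsampling, or outlier threshold is applied,
    unlike the official DTU evaluation.
    """
    acc = mean_nn_distance(recon, gt)   # accuracy: reconstruction -> ground truth
    comp = mean_nn_distance(gt, recon)  # completeness: ground truth -> reconstruction
    return {"accuracy": acc, "completeness": comp, "overall": (acc + comp) / 2}


if __name__ == "__main__":
    gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
    # The reconstruction covers only the first two ground-truth points, with a small offset.
    recon = np.array([[0.0, 0.1, 0.0], [1.0, 0.1, 0.0]])
    print(dtu_style_scores(recon, gt))
```

In this toy example, accuracy is small because every reconstructed point lies near the ground truth. Completeness is noticeably worse because the third ground-truth point has no nearby reconstructed point. This asymmetry is why the two metrics are reported separately.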