Multiview stereo reconstruction based on context-aware transformer

Other Conferences | Pub Date: 2024-06-06 | DOI: 10.1117/12.3032052

Zhaoxu Tian



Abstract

Existing Multi-View Stereo (MVS) methods often struggle with repetitive textures and complex scenes, producing reconstructions that fall short in quality, completeness, and accuracy. To address this, we introduce Clo-PatchmatchNet, a deep learning network that uses context-aware Transformers. A feature extraction module first computes image features. These features are fed into a learnable Patchmatch algorithm that produces an initial depth map, which is then refined into the final, detailed depth map. The key innovation is a context-aware Transformer block, called Cloblock, integrated into the feature extraction stage. It lets the network capture both global contextual information and high-frequency local details, which improves feature matching across views. In experiments on the Technical University of Denmark (DTU) dataset, Clo-PatchmatchNet outperforms the original PatchmatchNet, improving reconstruction completeness by 2.5% and accuracy by 1.2%, for an overall improvement of 1.7%. Compared with other contemporary methods, the proposed approach also performs better in completeness and overall quality.
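For readers unfamiliar with the three DTU figures, the sketch below shows how accuracy, completeness, and overall score are commonly defined for point clouds: accuracy is the mean distance from reconstructed points to the ground truth, completeness is the mean distance from ground-truth points to the reconstruction, and overall is the average of the two. This is an illustration under stated assumptions, not the paper's evaluation code. The function names `mean_nearest_distance` and `dtu_style_scores` are hypothetical, and the official DTU protocol additionally applies observability masks and outlier thresholds that are omitted here.

```python
import numpy as np


def mean_nearest_distance(src: np.ndarray, dst: np.ndarray) -> float:
    """Mean Euclidean distance from each point in src to its nearest point in dst.

    Brute-force O(N*M) version, only suitable for small clouds.
    """
    # Pairwise differences, shape (len(src), len(dst), 3)
    diff = src[:, None, :] - dst[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # Nearest neighbour in dst for every src point, then average
    return float(dist.min(axis=1).mean())


def dtu_style_scores(reconstructed: np.ndarray, ground_truth: np.ndarray) -> dict:
    """Simplified DTU-style metrics (lower is better for all three).

    Assumption: no visibility masks or distance cut-offs are applied.
    """
    accuracy = mean_nearest_distance(reconstructed, ground_truth)
    completeness = mean_nearest_distance(ground_truth, reconstructed)
    overall = 0.5 * (accuracy + completeness)
    return {"accuracy": accuracy, "completeness": completeness, "overall": overall}


if __name__ == "__main__":
    # Toy example: the reconstruction recovers only one of two ground-truth points.
    gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    rec = np.array([[0.0, 0.0, 0.0]])
    print(dtu_style_scores(rec, gt))
```

In the toy example the single reconstructed point lies exactly on the ground truth, so accuracy is perfect, while the missing second point is penalized only through completeness. This is why a method can improve the two metrics by different amounts, as reported in the abstract.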


