Multi-View Large Reconstruction Model via Geometry-Aware Positional Encoding and Attention

Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Wenhan Luo, Wenping Wang, Yike Guo

IEEE Transactions on Visualization and Computer Graphics (IF 6.5)
DOI: 10.1109/TVCG.2025.3572341 · Published 2025-05-23
Citations: 0

Abstract

Although the recent Large Reconstruction Model (LRM) has demonstrated impressive results, extending its input from a single image to multiple images exposes inefficiencies, subpar geometric and texture quality, and slower-than-expected convergence. This is because LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the strong 3D coherence among the input images. In this paper, we propose a Multi-view Large Reconstruction Model (M-LRM) designed to reconstruct high-quality 3D shapes from multiple views in a 3D-aware manner. Specifically, we introduce a multi-view consistent cross-attention scheme that enables M-LRM to accurately query information from the input images. Moreover, we employ the 3D priors of the input multi-view images to initialize the triplane tokens. Compared to previous methods, the proposed M-LRM can generate 3D shapes of high fidelity. Experimental studies demonstrate that our model achieves a significant performance gain and faster training convergence. Project page: https://murphylmf.github.io/M-LRM/.
