MAIR++: Improving Multi-View Attention Inverse Rendering With Implicit Lighting Representation

JunYong Choi, SeokYeong Lee, Haesol Park, Seung-Won Jung, Ig-Jae Kim, Junghyun Cho
{"title":"MAIR++: Improving Multi-View Attention Inverse Rendering With Implicit Lighting Representation","authors":"JunYong Choi;SeokYeong Lee;Haesol Park;Seung-Won Jung;Ig-Jae Kim;Junghyun Cho","doi":"10.1109/TPAMI.2025.3548679","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting. While multi-view images have been widely used for object-level inverse rendering, scene-level inverse rendering has primarily been studied using single-view images due to the lack of a dataset containing high dynamic range multi-view images with ground-truth geometry, material, and spatially-varying lighting. To improve the quality of scene-level inverse rendering, a novel framework called Multi-view Attention Inverse Rendering (MAIR) was recently introduced. MAIR performs scene-level multi-view inverse rendering by expanding the OpenRooms dataset, designing efficient pipelines to handle multi-view images, and splitting spatially-varying lighting. Although MAIR showed impressive results, its lighting representation is fixed to spherical Gaussians, which limits its ability to render images realistically. Consequently, MAIR cannot be directly used in applications such as material editing. Moreover, its multi-view aggregation networks have difficulties extracting rich features because they only focus on the mean and variance between multi-view features. In this paper, we propose its extended version, called MAIR++. MAIR++ addresses the aforementioned limitations by introducing an implicit lighting representation that accurately captures the lighting conditions of an image while facilitating realistic rendering. Furthermore, we design a directional attention-based multi-view aggregation network to infer more intricate relationships between views. Experimental results show that MAIR++ not only outperforms MAIR and single-view-based methods but also demonstrates robust performance on unseen real-world scenes.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 6","pages":"5076-5093"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10916587/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In this paper, we propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting. While multi-view images have been widely used for object-level inverse rendering, scene-level inverse rendering has primarily been studied using single-view images due to the lack of a dataset containing high dynamic range multi-view images with ground-truth geometry, material, and spatially-varying lighting. To improve the quality of scene-level inverse rendering, a novel framework called Multi-view Attention Inverse Rendering (MAIR) was recently introduced. MAIR performs scene-level multi-view inverse rendering by expanding the OpenRooms dataset, designing efficient pipelines to handle multi-view images, and splitting spatially-varying lighting. Although MAIR showed impressive results, its lighting representation is fixed to spherical Gaussians, which limits its ability to render images realistically. Consequently, MAIR cannot be directly used in applications such as material editing. Moreover, its multi-view aggregation networks have difficulties extracting rich features because they only focus on the mean and variance between multi-view features. In this paper, we propose its extended version, called MAIR++. MAIR++ addresses the aforementioned limitations by introducing an implicit lighting representation that accurately captures the lighting conditions of an image while facilitating realistic rendering. Furthermore, we design a directional attention-based multi-view aggregation network to infer more intricate relationships between views. Experimental results show that MAIR++ not only outperforms MAIR and single-view-based methods but also demonstrates robust performance on unseen real-world scenes.
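The abstract contrasts MAIR's aggregation, which summarizes multi-view features by their mean and variance, with MAIR++'s directional attention-based aggregation. The sketch below illustrates that contrast only; it is not the paper's architecture. The module names, feature dimensions, the way viewing directions are embedded, and the use of PyTorch's nn.MultiheadAttention are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class MeanVarAggregation(nn.Module):
    """Pools per-view features into their mean and variance, the kind of
    fixed statistic-based aggregation the abstract attributes to MAIR."""

    def forward(self, feats):                      # feats: (B, V, C)
        mean = feats.mean(dim=1)                   # (B, C)
        var = feats.var(dim=1, unbiased=False)     # (B, C)
        return torch.cat([mean, var], dim=-1)      # (B, 2C)


class DirectionalAttentionAggregation(nn.Module):
    """Illustrative attention-based aggregation: each view's feature is
    tagged with an embedding of its viewing direction and then attends to
    the other views, so the fused feature can capture inter-view
    relationships that mean/variance statistics discard."""

    def __init__(self, feat_dim=64, dir_dim=3, num_heads=4):
        super().__init__()
        self.dir_proj = nn.Linear(dir_dim, feat_dim)   # embed view directions
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, feats, view_dirs):           # feats: (B, V, C), view_dirs: (B, V, 3)
        tokens = feats + self.dir_proj(view_dirs)  # direction-aware view tokens
        fused, _ = self.attn(tokens, tokens, tokens)
        return fused.mean(dim=1)                   # one aggregated feature per sample


if __name__ == "__main__":
    B, V, C = 2, 5, 64                             # batch size, number of views, channels
    feats = torch.randn(B, V, C)
    dirs = torch.nn.functional.normalize(torch.randn(B, V, 3), dim=-1)
    print(MeanVarAggregation()(feats).shape)                        # torch.Size([2, 128])
    print(DirectionalAttentionAggregation(C)(feats, dirs).shape)    # torch.Size([2, 64])
```

The point of the sketch is that pooled statistics are permutation-invariant summaries that discard how views relate to one another, whereas an attention-based aggregator lets each view weight the others, here conditioned on viewing direction.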