Novel view synthesis with wide-baseline stereo pairs based on local–global information

IF 2.8 · JCR Q2 (Computer Science, Software Engineering) · CAS Region 4 (Computer Science)
Kai Song, Lei Zhang
{"title":"Novel view synthesis with wide-baseline stereo pairs based on local–global information","authors":"Kai Song,&nbsp;Lei Zhang","doi":"10.1016/j.cag.2024.104139","DOIUrl":null,"url":null,"abstract":"<div><div>Novel view synthesis generates images from new views using multiple images of a scene in known views. Using wide-baseline stereo image pairs for novel view synthesis allows scenes to be rendered from varied perspectives with only two images, significantly reducing image acquisition and storage costs and improving 3D scene reconstruction efficiency. However, the large geometry difference and severe occlusion between a pair of wide-baseline stereo images often cause artifacts and holes in the novel view images. To address these issues, we propose a method that integrates both local and global information for synthesizing novel view images from wide-baseline stereo image pairs. Initially, our method aggregates cost volume with local information using Convolutional Neural Network (CNN) and employs Transformer to capture global features. This process optimizes disparity prediction for improving the depth prediction and reconstruction quality of 3D scene representations with wide-baseline stereo image pairs. Subsequently, our method uses CNN to capture local semantic information and Transformer to model long-range contextual dependencies, generating high-quality novel view images. Extensive experiments demonstrate that our method can effectively reduce artifacts and holes, thereby enhancing the synthesis quality of novel views from wide-baseline stereo image pairs.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"126 ","pages":"Article 104139"},"PeriodicalIF":2.8000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Graphics-Uk","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0097849324002747","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

Novel view synthesis generates images from new viewpoints using multiple images of a scene captured from known viewpoints. Using wide-baseline stereo image pairs for novel view synthesis allows scenes to be rendered from varied perspectives with only two images, significantly reducing image acquisition and storage costs and improving the efficiency of 3D scene reconstruction. However, the large geometric differences and severe occlusions between a pair of wide-baseline stereo images often cause artifacts and holes in the synthesized images. To address these issues, we propose a method that integrates both local and global information for synthesizing novel view images from wide-baseline stereo image pairs. First, our method aggregates a cost volume of local information using a Convolutional Neural Network (CNN) and employs a Transformer to capture global features. This optimizes disparity prediction, improving depth estimation and the reconstruction quality of 3D scene representations built from wide-baseline stereo pairs. Then, our method uses a CNN to capture local semantic information and a Transformer to model long-range contextual dependencies, generating high-quality novel view images. Extensive experiments demonstrate that our method effectively reduces artifacts and holes, thereby enhancing the quality of novel views synthesized from wide-baseline stereo image pairs.
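To make the local–global pattern concrete, below is a minimal PyTorch sketch of the disparity-prediction stage along the lines the abstract describes: a shared CNN extracts local features, a Transformer encoder refines them with global context, a difference-based cost volume is aggregated by a 3D CNN, and disparity is read out via soft-argmin. All names (e.g., `LocalGlobalDisparityNet`) and design details (feature resolution, cost-volume construction) are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a local-global disparity network: CNN for local
# aggregation, Transformer for global context, soft-argmin disparity readout.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalGlobalDisparityNet(nn.Module):
    def __init__(self, feat_dim=32, max_disp=64, n_heads=4, n_layers=2):
        super().__init__()
        self.max_disp = max_disp
        # Shared 2D CNN feature extractor (local information), 1/4 resolution.
        self.feature = nn.Sequential(
            nn.Conv2d(3, feat_dim, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
        )
        # 3D CNN aggregates the cost volume over (disparity, height, width).
        self.aggregate = nn.Sequential(
            nn.Conv3d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(feat_dim, 1, 3, padding=1),
        )
        # Transformer encoder models long-range context across spatial tokens.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)

    def build_cost_volume(self, f_left, f_right):
        # Difference-based cost volume: for each candidate disparity d,
        # compare left features with right features shifted by d.
        b, c, h, w = f_left.shape
        d4 = self.max_disp // 4  # disparity bins at 1/4 feature resolution
        volume = f_left.new_zeros(b, c, d4, h, w)
        for d in range(d4):
            if d == 0:
                volume[:, :, d] = f_left - f_right
            else:
                volume[:, :, d, :, d:] = f_left[:, :, :, d:] - f_right[:, :, :, :-d]
        return volume

    def forward(self, left, right):
        f_left, f_right = self.feature(left), self.feature(right)
        b, c, h, w = f_left.shape
        # Global refinement: treat each spatial location as a token.
        tokens = f_left.flatten(2).transpose(1, 2)            # (B, H*W, C)
        f_left = self.transformer(tokens).transpose(1, 2).reshape(b, c, h, w)
        cost = self.aggregate(self.build_cost_volume(f_left, f_right))  # (B,1,D,H,W)
        prob = F.softmax(-cost.squeeze(1), dim=1)             # over disparity bins
        disps = torch.arange(prob.size(1), device=prob.device, dtype=prob.dtype)
        disp = (prob * disps.view(1, -1, 1, 1)).sum(1, keepdim=True)    # soft-argmin
        # Upsample to input resolution; disparity values scale with width.
        return 4 * F.interpolate(disp, scale_factor=4, mode='bilinear',
                                 align_corners=False)
```

Usage (hypothetical): `net = LocalGlobalDisparityNet(); disp = net(left, right)` with `left`/`right` tensors of shape (B, 3, H, W) where H and W are divisible by 4. The paper's second stage, view synthesis, would follow the same CNN-plus-Transformer pattern: a CNN for local semantics and a Transformer for long-range dependencies over the warped features.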


Source journal
Computers & Graphics (UK) — Engineering/Technology, Computer Science: Software Engineering
CiteScore: 5.30
Self-citation rate: 12.00%
Articles per year: 173
Review time: 38 days
Journal description: Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on:
1. Research and applications of interactive computer graphics. We are particularly interested in novel interaction techniques and applications of CG to problem domains.
2. State-of-the-art papers on late-breaking, cutting-edge research on CG.
3. Information on innovative uses of graphics principles and technologies.
4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.