Novel view synthesis with wide-baseline stereo pairs based on local–global information

IF 2.8 · JCR Q2 (Computer Science, Software Engineering) · CAS Region 4 (Computer Science)
Kai Song, Lei Zhang
{"title":"Novel view synthesis with wide-baseline stereo pairs based on local–global information","authors":"Kai Song,&nbsp;Lei Zhang","doi":"10.1016/j.cag.2024.104139","DOIUrl":null,"url":null,"abstract":"<div><div>Novel view synthesis generates images from new views using multiple images of a scene in known views. Using wide-baseline stereo image pairs for novel view synthesis allows scenes to be rendered from varied perspectives with only two images, significantly reducing image acquisition and storage costs and improving 3D scene reconstruction efficiency. However, the large geometry difference and severe occlusion between a pair of wide-baseline stereo images often cause artifacts and holes in the novel view images. To address these issues, we propose a method that integrates both local and global information for synthesizing novel view images from wide-baseline stereo image pairs. Initially, our method aggregates cost volume with local information using Convolutional Neural Network (CNN) and employs Transformer to capture global features. This process optimizes disparity prediction for improving the depth prediction and reconstruction quality of 3D scene representations with wide-baseline stereo image pairs. Subsequently, our method uses CNN to capture local semantic information and Transformer to model long-range contextual dependencies, generating high-quality novel view images. Extensive experiments demonstrate that our method can effectively reduce artifacts and holes, thereby enhancing the synthesis quality of novel views from wide-baseline stereo image pairs.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"126 ","pages":"Article 104139"},"PeriodicalIF":2.8000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Graphics-Uk","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0097849324002747","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

Novel view synthesis generates images from new viewpoints using multiple images of a scene captured from known viewpoints. Using wide-baseline stereo image pairs for novel view synthesis allows scenes to be rendered from varied perspectives with only two images, significantly reducing image acquisition and storage costs and improving the efficiency of 3D scene reconstruction. However, the large geometric differences and severe occlusions between a pair of wide-baseline stereo images often cause artifacts and holes in the synthesized images. To address these issues, we propose a method that integrates both local and global information for synthesizing novel view images from wide-baseline stereo image pairs. First, our method aggregates a cost volume of local information using a Convolutional Neural Network (CNN) and employs a Transformer to capture global features. This optimizes disparity prediction, improving depth estimation and the reconstruction quality of 3D scene representations built from wide-baseline stereo pairs. Then, our method uses a CNN to capture local semantic information and a Transformer to model long-range contextual dependencies, generating high-quality novel view images. Extensive experiments demonstrate that our method effectively reduces artifacts and holes, thereby enhancing the quality of novel views synthesized from wide-baseline stereo image pairs.
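To make the local–global pattern concrete, below is a minimal PyTorch sketch of the disparity-prediction stage along the lines the abstract describes: a shared CNN extracts local features, a Transformer encoder refines them with global context, a difference-based cost volume is aggregated by a 3D CNN, and disparity is read out via soft-argmin. All names (e.g., `LocalGlobalDisparityNet`) and design details (feature resolution, cost-volume construction) are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a local-global disparity network: CNN for local
# aggregation, Transformer for global context, soft-argmin disparity readout.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalGlobalDisparityNet(nn.Module):
    def __init__(self, feat_dim=32, max_disp=64, n_heads=4, n_layers=2):
        super().__init__()
        self.max_disp = max_disp
        # Shared 2D CNN feature extractor (local information), 1/4 resolution.
        self.feature = nn.Sequential(
            nn.Conv2d(3, feat_dim, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
        )
        # 3D CNN aggregates the cost volume over (disparity, height, width).
        self.aggregate = nn.Sequential(
            nn.Conv3d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(feat_dim, 1, 3, padding=1),
        )
        # Transformer encoder models long-range context across spatial tokens.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)

    def build_cost_volume(self, f_left, f_right):
        # Difference-based cost volume: for each candidate disparity d,
        # compare left features with right features shifted by d.
        b, c, h, w = f_left.shape
        d4 = self.max_disp // 4  # disparity bins at 1/4 feature resolution
        volume = f_left.new_zeros(b, c, d4, h, w)
        for d in range(d4):
            if d == 0:
                volume[:, :, d] = f_left - f_right
            else:
                volume[:, :, d, :, d:] = f_left[:, :, :, d:] - f_right[:, :, :, :-d]
        return volume

    def forward(self, left, right):
        f_left, f_right = self.feature(left), self.feature(right)
        b, c, h, w = f_left.shape
        # Global refinement: treat each spatial location as a token.
        tokens = f_left.flatten(2).transpose(1, 2)            # (B, H*W, C)
        f_left = self.transformer(tokens).transpose(1, 2).reshape(b, c, h, w)
        cost = self.aggregate(self.build_cost_volume(f_left, f_right))  # (B,1,D,H,W)
        prob = F.softmax(-cost.squeeze(1), dim=1)             # over disparity bins
        disps = torch.arange(prob.size(1), device=prob.device, dtype=prob.dtype)
        disp = (prob * disps.view(1, -1, 1, 1)).sum(1, keepdim=True)    # soft-argmin
        # Upsample to input resolution; disparity values scale with width.
        return 4 * F.interpolate(disp, scale_factor=4, mode='bilinear',
                                 align_corners=False)
```

Usage (hypothetical): `net = LocalGlobalDisparityNet(); disp = net(left, right)` with `left`/`right` tensors of shape (B, 3, H, W) where H and W are divisible by 4. The paper's second stage, view synthesis, would follow the same CNN-plus-Transformer pattern: a CNN for local semantics and a Transformer for long-range dependencies over the warped features.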


Source journal
Computers & Graphics (UK) — Engineering/Technology, Computer Science: Software Engineering
CiteScore: 5.30
Self-citation rate: 12.00%
Articles per year: 173
Review time: 38 days
Journal description: Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on:
1. Research and applications of interactive computer graphics. We are particularly interested in novel interaction techniques and applications of CG to problem domains.
2. State-of-the-art papers on late-breaking, cutting-edge research on CG.
3. Information on innovative uses of graphics principles and technologies.
4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.