Multiplane-based Cross-view Interaction Mechanism for Robust Light Field Angular Super-Resolution

Rongshan Chen, Hao Sheng, Da Yang, Ruixuan Cong, Zhenglong Cui, Sizhe Wang, Wei Ke

IEEE Transactions on Visualization and Computer Graphics, 28 April 2025. DOI: 10.1109/TVCG.2025.3564643 (https://doi.org/10.1109/TVCG.2025.3564643)
Abstract
Dense sampling of the light field (LF) is essential for various applications, such as virtual reality. However, the collection process is prohibitively expensive due to technological limitations in imaging. Synthesizing novel views from sparse LF data, known as LF Angular Super-Resolution (LFASR), offers an effective solution to this problem. Accurate cross-view interaction is crucial for this task, given the complementary information between LF views; previous methods, however, suffer from limited reconstruction quality due to inefficient view interaction. To address this, we propose a Multiplane-based Cross-view Interaction Mechanism (MCIM) for robust LFASR. Drawing inspiration from MultiPlane Images (MPI) in scene modeling, our mechanism incorporates a novel Multiplane Feature Fusion (MPFF) strategy that enables fast and accurate cross-view interaction, making the network robust to scene geometry and suitable for LF scenes with different baselines. Furthermore, to address information redundancy across planes, we leverage the transparency property of MPI and devise a plane selection strategy. Finally, we propose CSTNet, a Cross-Shaped Transformer-based network for LFASR, which employs a cross-shaped self-attention mechanism to enable low-cost training and inference. Extensive comparisons on various angular super-resolution tasks demonstrate that our method achieves state-of-the-art performance, both visually and quantitatively, on synthetic and real-world LF scenes.
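The abstract gives no implementation details, but the MPI machinery it builds on is standard. The sketch below shows, in plain NumPy, the two ideas the abstract leans on: back-to-front alpha compositing over depth planes, and a transparency-based pruning heuristic of the kind the plane selection strategy suggests. It is a minimal illustration under assumed conventions; `composite_mpi`, `select_planes`, and the `keep` threshold are hypothetical names and choices, not the authors' code.

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Back-to-front alpha compositing of a multiplane image (MPI).

    colors: (D, H, W, 3) per-plane RGB, ordered from far to near.
    alphas: (D, H, W) per-plane opacity in [0, 1].
    Returns the composited (H, W, 3) view.
    """
    out = np.zeros(colors.shape[1:])
    for c, a in zip(colors, alphas):            # far plane first
        out = out * (1.0 - a[..., None]) + c * a[..., None]
    return out

def select_planes(alphas, keep=0.95):
    """Hypothetical transparency-based plane pruning: keep the fewest
    planes whose summed opacity mass covers `keep` of the total,
    discarding near-transparent (redundant) planes.
    """
    mass = alphas.reshape(alphas.shape[0], -1).mean(axis=1)
    order = np.argsort(mass)[::-1]              # most opaque first
    cum = np.cumsum(mass[order]) / mass.sum()
    n = min(int(np.searchsorted(cum, keep)) + 1, len(mass))
    return np.sort(order[:n])                   # indices of retained planes
```

The intuition behind pruning is that a plane that is almost fully transparent contributes little to the composite, and by the same token carries little usable cross-view information, so dropping it reduces fusion cost with minimal quality loss.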
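Likewise, the cross-shaped self-attention in CSTNet is described only by name. The idea, familiar from CSWin-style Transformers, is to split the channels so that half attend within horizontal stripes and half within vertical stripes, covering a cross-shaped window at a fraction of the cost of global attention. Below is a minimal single-head PyTorch sketch (no QKV projections or positional encoding; stripe width, function names, and the channel split are assumptions, not the paper's architecture).

```python
import torch
import torch.nn.functional as F

def stripe_attention(x, stripe, axis):
    """Self-attention restricted to stripes of width `stripe` along one axis.

    x: (B, H, W, C) feature map; H and W must be divisible by `stripe`.
    axis=1 groups rows into horizontal stripes, axis=2 groups columns
    into vertical stripes (handled by transposing and reusing row logic).
    """
    if axis == 2:
        x = x.transpose(1, 2)
    B, H, W, C = x.shape
    x = x.reshape(B * (H // stripe), stripe * W, C)   # tokens within each stripe
    attn = F.softmax(x @ x.transpose(-2, -1) / C ** 0.5, dim=-1)
    x = (attn @ x).reshape(B, H, W, C)
    if axis == 2:
        x = x.transpose(1, 2)
    return x

def cross_shaped_attention(x, stripe=2):
    """Cross-shaped self-attention sketch: half the channels attend within
    horizontal stripes, half within vertical stripes (C must be even)."""
    C = x.shape[-1]
    h = stripe_attention(x[..., : C // 2], stripe, axis=1)
    v = stripe_attention(x[..., C // 2 :], stripe, axis=2)
    return torch.cat([h, v], dim=-1)
```

In CSTNet this would presumably operate on LF feature volumes rather than a single image grid; the sketch only illustrates the windowing pattern that makes training and inference low-cost, since each token attends to O(stripe x (H + W)) neighbors instead of all H x W positions.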