Da Yang, Hao Sheng, Sizhe Wang, Shuai Wang, Zhang Xiong, Wei Ke
IEEE Transactions on Computational Imaging, vol. 10, pp. 1317-1330, published 2024-08-29. DOI: 10.1109/TCI.2024.3451998. https://ieeexplore.ieee.org/document/10659219/
Boosting Light Field Spatial Super-Resolution via Masked Light Field Modeling
Light field (LF) imaging benefits a wide range of applications with the geometry information it captures. However, due to restricted sensor resolution, LF cameras sacrifice spatial resolution for sufficient angular resolution. Hence LF spatial super-resolution (LFSSR), which relies heavily on extracting inter- and intra-view correlations, is widely studied. In this paper, a self-supervised pre-training scheme, named masked LF modeling (MLFM), is proposed to boost the learning of inter- and intra-view correlations for better super-resolution performance. To this end, we first introduce a transformer structure, termed LFormer, to establish direct inter-view correlations inside the 4D LF. Compared with traditional disentangling operations for LF feature extraction, LFormer avoids unnecessary loss in the angular domain, and therefore performs better at learning the cross-view mapping among pixels with MLFM pre-training. Then, by cascading LFormers as the encoder, the LFSSR network LFormer-Net is designed, which comprehensively extracts inter- and intra-view high-frequency information. Finally, LFormer-Net is pre-trained with MLFM via a Spatially-Random Angularly-Consistent Masking (SRACM) module. With a high masking ratio, MLFM pre-training effectively improves the performance of LFormer-Net. Extensive experiments on public datasets demonstrate the effectiveness of MLFM pre-training and LFormer-Net: our approach outperforms state-of-the-art LFSSR methods numerically and visually on both small- and large-disparity datasets.
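The abstract does not spell out how the SRACM module works internally, but its name describes the core idea: a random spatial mask is drawn once and then applied identically to every angular view, so cross-view correspondence is preserved for the masked-modeling pretext task. A minimal NumPy sketch of that idea (the function name, patch-based masking, and the (U, V, H, W) tensor layout are illustrative assumptions, not the paper's implementation) might look like:

```python
import numpy as np

def sracm_mask(lf, mask_ratio=0.75, patch=4, seed=0):
    """Sketch of spatially-random, angularly-consistent masking.

    lf: 4D light field of shape (U, V, H, W) -- angular dims U, V and
        spatial dims H, W. A single random pattern of patch x patch
        spatial blocks is hidden, and the SAME pattern is applied to
        every angular view, keeping inter-view correspondence intact.
    Returns the masked light field and the boolean keep-map (H, W).
    """
    U, V, H, W = lf.shape
    gh, gw = H // patch, W // patch              # block grid size
    rng = np.random.default_rng(seed)
    n_masked = int(round(mask_ratio * gh * gw))  # blocks to hide
    grid = np.ones(gh * gw, dtype=bool)
    grid[rng.permutation(gh * gw)[:n_masked]] = False
    # Upsample the block grid to pixel resolution.
    keep = np.kron(grid.reshape(gh, gw), np.ones((patch, patch), dtype=bool))
    # Broadcast one spatial mask over all (U, V) views.
    return lf * keep[None, None, :, :], keep

# Toy usage: 5x5 angular views of 32x32 pixels, 75% of blocks hidden.
lf = np.random.rand(5, 5, 32, 32)
masked, keep = sracm_mask(lf, mask_ratio=0.75)
```

Because every view shares the same keep-map, a network reconstructing the hidden regions must exploit information from neighboring views, which is consistent with the paper's motivation of promoting inter-view correlation learning.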
Journal introduction:
The IEEE Transactions on Computational Imaging will publish articles where computation plays an integral role in the image formation process. Papers will cover all areas of computational imaging ranging from fundamental theoretical methods to the latest innovative computational imaging system designs. Topics of interest will include advanced algorithms and mathematical techniques, model-based data inversion, methods for image and signal recovery from sparse and incomplete data, techniques for non-traditional sensing of image data, methods for dynamic information acquisition and extraction from imaging sensors, software and hardware for efficient computation in imaging systems, and highly novel imaging system design.