Zhicheng Ma, Yuduo Guo, Zhaoxiang Liu, Shiguo Lian, Sen Wan
Image and Vision Computing, volume 158, Article 105542. Published 2025-04-10. Impact factor 4.2 (JCR Q2, Computer Science, Artificial Intelligence).
DOI: 10.1016/j.imavis.2025.105542
https://www.sciencedirect.com/science/article/pii/S0262885625001301
Hybrid Attention Transformers with fast Fourier convolution for light field image super-resolution
The limited spatial resolution of light field (LF) cameras has hindered their widespread adoption, emphasizing the critical need for super-resolution techniques to improve their practical use. Transformer-based methods, such as LF-DET, have shown potential in enhancing light field spatial super-resolution (LF-SR). However, LF-DET, which employs a spatial-angular separable transformer encoder with sub-sampling spatial modeling and multiscale angular modeling for global context interaction, struggles to capture both global context in its early layers and local details. In this work, we introduce LF-HATF, a novel network that builds on the LF-DET framework and incorporates Fast Fourier Convolution (FFC) and Hybrid Attention Transformers (HATs) to address these limitations. This integration enables LF-HATF to better capture both global and local information, significantly improving the restoration of edge details and textures, and providing a more comprehensive understanding of complex scenes. Additionally, we propose the Light Field Charbonnier loss function, designed to balance differential distributions across various LF views. This function minimizes errors both within the same perspective and across different views, further enhancing the model’s performance. Our evaluation on five public LF datasets demonstrates that LF-HATF outperforms existing methods, representing a significant advancement in LF-SR technology. This progress pushes the field forward and opens new avenues for research in light field imaging, unlocking the full potential of light field cameras.
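The abstract does not give the formula for the Light Field Charbonnier loss, so the sketch below shows only the generic Charbonnier penalty, sqrt(d^2 + eps^2), that such losses are usually built on, plus one simple way to weight every angular view equally. The function names, the `eps` default, and the per-view averaging are assumptions made for illustration; they are not taken from the paper.

```python
import math


def charbonnier(pred, target, eps=1e-3):
    """Mean generic Charbonnier penalty sqrt(d^2 + eps^2) over paired pixel values.

    `pred` and `target` are flat sequences of floats (one view's pixels).
    This is the standard penalty, not the paper's LF-specific variant.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


def per_view_charbonnier(pred_views, target_views, eps=1e-3):
    """Average the per-view losses so each angular view contributes equally.

    Hypothetical helper: equal weighting across views is an assumption here.
    """
    if len(pred_views) != len(target_views) or len(pred_views) == 0:
        raise ValueError("view lists must be non-empty and equally long")
    losses = [charbonnier(p, t, eps) for p, t in zip(pred_views, target_views)]
    return sum(losses) / len(losses)
```

Compared with a plain L1 loss, the `eps` term keeps the penalty smooth near zero error, which is the usual reason the Charbonnier form is preferred for super-resolution training.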
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.