Hybrid Attention Transformers with fast Fourier convolution for light field image super-resolution

IF 4.2 · CAS Tier 3 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Zhicheng Ma , Yuduo Guo , Zhaoxiang Liu , Shiguo Lian , Sen Wan
DOI: 10.1016/j.imavis.2025.105542
Journal: Image and Vision Computing, Volume 158, Article 105542
Publication Date: 2025-04-10
Publication Type: Journal Article
Citations: 0

Abstract

The limited spatial resolution of light field (LF) cameras has hindered their widespread adoption, emphasizing the critical need for super-resolution techniques to improve their practical use. Transformer-based methods, such as LF-DET, have shown potential in enhancing light field spatial super-resolution (LF-SR). However, LF-DET, which employs a spatial-angular separable transformer encoder with sub-sampling spatial and multiscale angular modeling for global context interaction, struggles to effectively capture global context in early layers and local details. In this work, we introduce LF-HATF, a novel network that builds on the LF-DET framework and incorporates Fast Fourier Convolution (FFC) and Hybrid Attention Transformers (HATs) to address these limitations. This integration enables LF-HATF to better capture both global and local information, significantly improving the restoration of edge details and textures, and providing a more comprehensive understanding of complex scenes. Additionally, we propose the Light Field Charbonnier loss function, designed to balance differential distributions across various LF views. This function minimizes errors both within the same perspective and across different views, further enhancing the model's performance. Our evaluation on five public LF datasets demonstrates that LF-HATF outperforms existing methods, representing a significant advancement in LF-SR technology. This progress pushes the field forward and opens new avenues for research in light field imaging, unlocking the full potential of light field cameras.
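The abstract describes the Light Field Charbonnier loss only at a high level. A minimal sketch of the general idea, assuming a plain per-view average of the standard Charbonnier penalty over the angular view grid (the paper's exact weighting across views may differ; `lf_charbonnier_loss` and its `(U, V, H, W)` layout are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def charbonnier_loss(pred, target, eps=1e-3):
    """Charbonnier penalty: mean of sqrt(diff^2 + eps^2).

    A smooth, differentiable variant of L1 that is less sensitive to
    outliers than L2, widely used in image super-resolution.
    """
    diff = pred - target
    return float(np.mean(np.sqrt(diff * diff + eps * eps)))

def lf_charbonnier_loss(pred_views, target_views, eps=1e-3):
    """Hypothetical light-field variant: average the per-view Charbonnier
    error over all U x V angular views, so no single perspective
    dominates the optimization.

    pred_views, target_views: arrays of shape (U, V, H, W) holding the
    angular views of a light field.
    """
    per_view = [
        charbonnier_loss(pred_views[u, v], target_views[u, v], eps)
        for u in range(pred_views.shape[0])
        for v in range(pred_views.shape[1])
    ]
    return float(np.mean(per_view))
```

Averaging per-view errors (rather than pooling all pixels) is one plausible way to "balance differential distributions across various LF views", since views with few high-error pixels still contribute equally to the total.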

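The global-context benefit of Fast Fourier Convolution comes from mixing features in the frequency domain, where every spectral bin depends on the whole image. A simplified sketch of that core idea, not the paper's exact FFC module (the channel-mixing weights `w_real`/`w_imag` and the single-branch layout are assumptions for illustration):

```python
import numpy as np

def spectral_transform(x, w_real, w_imag):
    """Sketch of FFC's global branch: FFT the feature map, mix channels
    with a pointwise (1x1) complex weight in the spectral domain, and
    transform back. Because each frequency bin aggregates the entire
    spatial extent, one layer has a global receptive field.

    x: (C, H, W) real feature map.
    w_real, w_imag: (C_out, C) real/imaginary parts of the 1x1 weights.
    Returns a (C_out, H, W) real feature map.
    """
    spec = np.fft.rfft2(x, axes=(-2, -1))  # (C, H, W//2 + 1), complex
    # pointwise channel mixing in the frequency domain
    mixed = np.tensordot(w_real + 1j * w_imag, spec, axes=([1], [0]))
    return np.fft.irfft2(mixed, s=x.shape[-2:], axes=(-2, -1))
```

With identity real weights and zero imaginary weights the transform reduces to `irfft2(rfft2(x))` and returns the input unchanged, which is a convenient sanity check on the round trip.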
Source journal
Image and Vision Computing
Category: Engineering Technology - Engineering: Electronic & Electrical
CiteScore: 8.50
Self-citation rate: 8.50%
Articles per year: 143
Review time: 7.8 months
Aims and scope: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.