Chao Wang, Ana Serrano, Xingang Pan, Krzysztof Wolski, Bin Chen, K. Myszkowski, Hans-Peter Seidel, Christian Theobalt, Thomas Leimkühler
{"title":"An Implicit Neural Representation for the Image Stack: Depth, All in Focus, and High Dynamic Range","authors":"Chao Wang, Ana Serrano, Xingang Pan, Krzysztof Wolski, Bin Chen, K. Myszkowski, Hans-Peter Seidel, Christian Theobalt, Thomas Leimkühler","doi":"10.1145/3618367","DOIUrl":null,"url":null,"abstract":"In everyday photography, physical limitations of camera sensors and lenses frequently lead to a variety of degradations in captured images such as saturation or defocus blur. A common approach to overcome these limitations is to resort to image stack fusion, which involves capturing multiple images with different focal distances or exposures. For instance, to obtain an all-in-focus image, a set of multi-focus images is captured. Similarly, capturing multiple exposures allows for the reconstruction of high dynamic range. In this paper, we present a novel approach that combines neural fields with an expressive camera model to achieve a unified reconstruction of an all-in-focus high-dynamic-range image from an image stack. Our approach is composed of a set of specialized implicit neural representations tailored to address specific sub-problems along our pipeline: We use neural implicits to predict flow to overcome misalignments arising from lens breathing, depth, and all-in-focus images to account for depth of field, as well as tonemapping to deal with sensor responses and saturation - all trained using a physically inspired supervision structure with a differentiable thin lens model at its core. An important benefit of our approach is its ability to handle these tasks simultaneously or independently, providing flexible post-editing capabilities such as refocusing and exposure adjustment. By sampling the three primary factors in photography within our framework (focal distance, aperture, and exposure time), we conduct a thorough exploration to gain valuable insights into their significance and impact on overall reconstruction quality. 
Through extensive validation, we demonstrate that our method outperforms existing approaches in both depth-from-defocus and all-in-focus image reconstruction tasks. Moreover, our approach exhibits promising results in each of these three dimensions, showcasing its potential to enhance captured image quality and provide greater control in post-processing.","PeriodicalId":7077,"journal":{"name":"ACM Transactions on Graphics (TOG)","volume":"35 48","pages":"1 - 11"},"PeriodicalIF":0.0000,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Graphics (TOG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3618367","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
In everyday photography, physical limitations of camera sensors and lenses frequently lead to a variety of degradations in captured images, such as saturation or defocus blur. A common approach to overcoming these limitations is image stack fusion, which involves capturing multiple images with different focal distances or exposures. For instance, to obtain an all-in-focus image, a set of multi-focus images is captured; similarly, capturing multiple exposures allows for the reconstruction of high dynamic range. In this paper, we present a novel approach that combines neural fields with an expressive camera model to achieve a unified reconstruction of an all-in-focus high-dynamic-range image from an image stack. Our approach is composed of a set of specialized implicit neural representations, each tailored to a sub-problem along our pipeline: neural implicits predict flow to overcome misalignments arising from lens breathing, depth and all-in-focus images to account for depth of field, and a tonemapping to deal with sensor response and saturation, all trained using a physically inspired supervision structure with a differentiable thin lens model at its core. An important benefit of our approach is its ability to handle these tasks simultaneously or independently, providing flexible post-editing capabilities such as refocusing and exposure adjustment. By sampling the three primary factors in photography within our framework (focal distance, aperture, and exposure time), we conduct a thorough exploration to gain valuable insights into their significance and impact on overall reconstruction quality. Through extensive validation, we demonstrate that our method outperforms existing approaches in both depth-from-defocus and all-in-focus image reconstruction tasks. Moreover, our approach exhibits promising results in each of these three dimensions, showcasing its potential to enhance captured image quality and provide greater control in post-processing.
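The differentiable thin lens model referenced in the abstract can be illustrated with the standard thin-lens circle-of-confusion relation, which maps scene depth, focus distance, focal length, and f-number to a defocus blur diameter on the sensor. The sketch below is illustrative only, assuming the textbook formula; the function name, units, and parameters are our own and are not taken from the paper's implementation:

```python
import numpy as np

def coc_diameter(depth, focus_dist, focal_length, f_number):
    """Circle-of-confusion diameter under the thin lens model.

    All distances are in metres; the result is the blur diameter
    on the sensor plane, also in metres. Points at the focus
    distance map to a diameter of zero (perfectly in focus).
    """
    aperture = focal_length / f_number  # entrance-pupil diameter
    return (aperture
            * np.abs(depth - focus_dist) / depth
            * focal_length / (focus_dist - focal_length))

# A point 4 m away with the lens focused at 2 m (50 mm lens at f/2)
# produces a nonzero blur circle, while a point at 2 m is sharp.
blur_far = coc_diameter(4.0, 2.0, 0.05, 2.0)
blur_in_focus = coc_diameter(2.0, 2.0, 0.05, 2.0)
```

Because every operation here is differentiable (away from the focus plane), such a formula can serve as the core of a differentiable supervision structure: gradients flow from a rendered defocused image back to the predicted depth and all-in-focus values.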