{"title":"Computational Imaging for Machine Perception: Transferring Semantic Segmentation Beyond Aberrations","authors":"Qi Jiang;Hao Shi;Shaohua Gao;Jiaming Zhang;Kailun Yang;Lei Sun;Huajian Ni;Kaiwei Wang","doi":"10.1109/TCI.2024.3380363","DOIUrl":"https://doi.org/10.1109/TCI.2024.3380363","url":null,"abstract":"Semantic scene understanding with Minimalist Optical Systems (MOS) in mobile and wearable applications remains a challenge due to the corrupted imaging quality induced by optical aberrations. However, previous works only focus on improving the subjective imaging quality through the Computational Imaging (CI) technique, ignoring the feasibility of advancing semantic segmentation. In this paper, we pioneer the investigation of Semantic Segmentation under Optical Aberrations (SSOA) with MOS. To benchmark SSOA, we construct \u0000<italic>Virtual Prototype Lens (VPL)</i>\u0000 groups through optical simulation, generating \u0000<italic>Cityscapes-ab</i>\u0000 and \u0000<italic>KITTI-360-ab</i>\u0000 datasets under different behaviors and levels of aberrations. We look into SSOA via an unsupervised domain adaptation perspective to address the scarcity of labeled aberration data in real-world scenarios. Further, we propose \u0000<italic>Computational Imaging Assisted Domain Adaptation (CIADA)</i>\u0000 to leverage prior knowledge of CI for robust performance in SSOA. Based on our benchmark, we conduct experiments on the robustness of classical segmenters against aberrations. In addition, extensive evaluations of possible solutions to SSOA reveal that CIADA achieves superior performance under all aberration distributions, bridging the gap between computational imaging and downstream applications for MOS.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"535-548"},"PeriodicalIF":5.4,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140533399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differentiable Deflectometric Eye Tracking","authors":"Tianfu Wang;Jiazhang Wang;Nathan Matsuda;Oliver Cossairt;Florian Willomitzer","doi":"10.1109/TCI.2024.3382494","DOIUrl":"10.1109/TCI.2024.3382494","url":null,"abstract":"Eye tracking is an important tool in many scientific and commercial domains. State-of-the-art eye tracking methods are either reflection-based and track reflections of sparse point light sources, or image-based and exploit 2D features of the acquired eye image. In this work, we attempt to significantly improve reflection-based methods by utilizing pixel-dense deflectometric surface measurements in combination with optimization-based inverse rendering algorithms. Utilizing the known geometry of our deflectometric setup, we develop a differentiable rendering pipeline based on PyTorch3D that simulates a virtual eye under screen illumination. Eventually, we exploit the image-screen-correspondence information from the captured measurements to find the eye's \u0000<italic>rotation</i>\u0000, \u0000<italic>translation</i>\u0000, and \u0000<italic>shape</i>\u0000 parameters with our renderer via gradient descent. We demonstrate real-world experiments with evaluated mean relative gaze errors below \u0000<inline-formula><tex-math>$0.45 ^{circ }$</tex-math></inline-formula>\u0000 at a precision better than \u0000<inline-formula><tex-math>$0.11 ^{circ }$</tex-math></inline-formula>\u0000. Moreover, we show an improvement of 6X over a representative reflection-based state-of-the-art method in simulation. In addition, we demonstrate a special variant of our method that does not require a specific pattern and can work with arbitrary image or video content from every screen (e.g., in a VR headset).","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"888-898"},"PeriodicalIF":5.4,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140593185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HoloFormer: Contrastive Regularization Based Transformer for Holographic Image Reconstruction","authors":"Ziqi Bai;Xianming Liu;Cheng Guo;Kui Jiang;Junjun Jiang;Xiangyang Ji","doi":"10.1109/TCI.2024.3384809","DOIUrl":"https://doi.org/10.1109/TCI.2024.3384809","url":null,"abstract":"Deep learning has emerged as a prominent technique in the field of holographic imaging, owing to its rapidity and high performance. Prevailing deep neural networks employed for holographic image reconstruction predominantly rely on convolutional neural networks (CNNs). While CNNs have yielded impressive results, their intrinsic limitations, characterized by a constrained local receptive field and uniform representation, pose challenges in harnessing spatial texture similarities inherent in holographic images. To address this issue, we propose a novel hierarchical framework based on self-attention mechanism for digital holographic reconstruction, termed HoloFormer. Specifically, we adopt a window-based transformer block as the backbone, significantly reducing computational costs. In the encoder, a pyramid-like hierarchical structure enables the learning of feature map representations at different scales. In the decoder, a dual-branch design ensures that the real and imaginary parts of the complex amplitude do not exhibit cross-talk with each other. During the training phase, we incorporate contrastive regularization to maximize the utilization of mutual information. Overall, our experiments demonstrate that HoloFormer achieves superior reconstruction results compared to previous CNN-based architectures. This progress further propels the development of deep learning-based holographic imaging, particularly in lensless microscopy applications.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"560-573"},"PeriodicalIF":5.4,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140546566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DD-DCSR: Image Denoising for Low-Dose CT via Dual-Dictionary Deep Convolutional Sparse Representation","authors":"Shu Li;Yi Liu;Rongbiao Yan;Haowen Zhang;Shubin Wang;Ting Ding;Zhiguo Gui","doi":"10.1109/TCI.2024.3408091","DOIUrl":"10.1109/TCI.2024.3408091","url":null,"abstract":"Most of the existing low-dose computed tomography (LDCT) denoising algorithms, based on convolutional neural networks, are not interpretable enough due to a lack of mathematical basis. In the process of image denoising, the sparse representation based on a single dictionary cannot restore the texture details of the image perfectly. To solve these problems, we propose a Dual-Dictionary Convolutional Sparse Representation (DD-CSR) method and construct a Dual-Dictionary Deep Convolutional Sparse Representation network (DD-DCSR) to unfold the model iteratively. The modules in the network correspond to the model one by one. In the proposed DD-CSR, the high-frequency information is extracted by Local Total Variation (LTV), and then two different learnable convolutional dictionaries are used to sparsely represent the LDCT image and its high-frequency map. To improve the robustness of the model, the adaptive coefficient is introduced into the convolutional dictionary of LDCT images, which allows the image to be represented by fewer convolutional dictionary atoms and reduces the number of parameters of the model. Considering that the sparse degree of convolutional sparse feature maps is closely related to noise, the model introduces learnable weight coefficients into the penalty items of processing LDCT high-frequency maps. The experimental results show that the interpretable DD-DCSR network can well restore the texture details of the image when removing noise/artifacts.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"899-914"},"PeriodicalIF":5.4,"publicationDate":"2024-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141197655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MAPANet: A Multi-Scale Attention-Guided Progressive Aggregation Network for Multi-Contrast MRI Super-Resolution","authors":"Licheng Liu;Tao Liu;Wei Zhou;Yaonan Wang;Min Liu","doi":"10.1109/TCI.2024.3393723","DOIUrl":"10.1109/TCI.2024.3393723","url":null,"abstract":"Multi-contrast magnetic resonance imaging (MRI) super-resolution (SR), which utilizes complementary information from different contrast images to reconstruct the target images, can provide rich information for quantitative image analysis and accurate medical diagnosis. However, the current mainstream methods are failed in exploiting multi-scale features or global information for data representation, leading to poor outcomes. To address these limitations, we propose a multi-scale attention-guided progressive aggregation network (MAPANet) to progressively restore the target contrast MR images from the corresponding low resolution (LR) observations with the assistance of auxiliary contrast images. Specifically, the proposed MAPANet is composed of several stacked dual-branch aggregation (DBA) blocks, each of which consists of two parallel modules: the multi-scale attention module (MSAM) and the reference feature extraction module (RFEM). The former aims to utilize multi-scale and appropriate non-local information to facilitate the SR reconstruction, while the latter is designed to extract the complementary information from auxiliary contrast images to assist in restoring edge structures and details for target contrast images. Extensive experiments on the public datasets demonstrate that the proposed MAPANet outperforms several state-of-the-art multi-contrast SR methods.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"928-940"},"PeriodicalIF":5.4,"publicationDate":"2024-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141197656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Pixel-Wise Registration Learning for Robust Fusion-Based Hyperspectral Image Super-Resolution","authors":"Jiangtao Nie;Wei Wei;Lei Zhang;Chen Ding;Yanning Zhang","doi":"10.1109/TCI.2024.3408095","DOIUrl":"10.1109/TCI.2024.3408095","url":null,"abstract":"Hyperspectral image (HSI) super-resolution (SR) aims to generate a high resolution (HR) HSI in both spectral and spatial domains, in which the fusion-based SR methods have shown great potential in producing a pleasing HR HSI by taking both advantages of the observed low-resolution (LR) HSI and HR multispectral image (MSI). Most existing fusion-based methods implicitly assume that the observed LR HSI and HR MSI are exactly registered, which is, however, difficult to comply with in practice and thus impedes their generalization performance in real applications. To mitigate this problem, we propose a hybrid pixel-wise registration learning framework for fusion-based HSI SR, which shows two aspects of advantages. On the one hand, a pixel-wise registration module (PRM) is developed to directly estimate the transformed coordinate of each pixel, which enables coping with various complex (e.g., rigid or nonrigid) misalignment between two observed images and is pluggable to any other existing architectures. On the other hand, a hybrid learning scheme is conducted to jointly learn both the PRM and the deep image prior-based SR network. Through compositing supervised and unsupervised learning in a two-stage manner, the proposed method is able to exploit both the image-agnostic and image-specific characteristics to robustly cope with unknown misalignment and thus improve the generalization capacity. Experimental results on four benchmark datasets show the superior performance of the proposed method in handling fusion-based HSI SR with various unknown misalignments.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"915-927"},"PeriodicalIF":5.4,"publicationDate":"2024-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141197674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Interactive Enhanced for Defocus Blur Estimation","authors":"Huaguang Li;Wenhua Qian;Jinde Cao;Rencan Nie;Peng Liu;Dan Xu","doi":"10.1109/TCI.2024.3354427","DOIUrl":"10.1109/TCI.2024.3354427","url":null,"abstract":"Defocus blur estimation requires high-precision detection between the homogeneous region and transition edge. This paper develops a novel progressive design that effectively addresses this challenge. Our multi-interactive scheme could gradually learn the characteristics of degraded input and divide complex defocus blur estimation into more manageable subnetworks. Specifically, we equally degrade the source inputs and combine them with complementary information subnetworks. In the first two stages, feature interactive modules are introduced to achieve the purpose of information interaction between different features. One challenge in multi-stage networks is transmitting information features between stages, which led to the development of the supervision-guided attention module. Taking into consideration the intricacies associated with neural network design and the pronounced affinity of defocus and focus characteristics with global semantic information, in the final stage, we opt to directly input the original image, after significant affinity-based feature weighting, into the network. This strategic incorporation of global semantic information serves to mitigate the challenges posed by feature concatenation artifacts and noise encountered in the preceding two stages, thereby bolstering the accuracy of the model.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"640-652"},"PeriodicalIF":5.4,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140322235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adapting the Learning Models of Single Image Super-Resolution Into Light-Field Imaging","authors":"Aupendu Kar;Suresh Nehra;Jayanta Mukherjee;Prabir Kumar Biswas","doi":"10.1109/TCI.2024.3380348","DOIUrl":"10.1109/TCI.2024.3380348","url":null,"abstract":"The emergence of Light Field (LF) cameras has made LF imaging a popular technology in computational photography. However, the spatial resolution of these micro-lens-based LF cameras is limited due to the combination of spatial and angular information, which is the primary obstacle for other applications of LF cameras. To explore the potential of LF imaging, Light-Field Super-Resolution (LFSR) algorithms have been developed to exploit the spatial and angular information present in LF imaging. In this paper, we propose an alternative approach to achieve LFSR using pre-trained Single Image Super-Resolution (SISR) models. We introduce an LF domain-specific adaptation module that can be included in any SISR model to make it suitable for the LF domain. We experimentally demonstrate that three different kinds of SISR models, namely the bicubic degradation handling model, the blur kernel based model, and adversarially trained SISR models for perceptual super-resolution, can be converted to corresponding LFSR models. Our experimental results show that by using a recent state-of-the-art SISR model, we can outperform recently reported LFSR-specific models for bicubic degradation by a considerable margin in both the standard test dataset and the recent NTIRE 2023 LFSR challenge test dataset. In the case of models that handle blur kernel, we observe a significant performance improvement after adaptation. Adversarially trained SISR models also show promising results, with less distortion and better perceptual quality in LF images.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"496-509"},"PeriodicalIF":5.4,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140312630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep, Convergent, Unrolled Half-Quadratic Splitting for Image Deconvolution","authors":"Yanan Zhao;Yuelong Li;Haichuan Zhang;Vishal Monga;Yonina C. Eldar","doi":"10.1109/TCI.2024.3377132","DOIUrl":"10.1109/TCI.2024.3377132","url":null,"abstract":"In recent years, algorithm unrolling has emerged as a powerful technique for designing interpretable neural networks based on iterative algorithms. Imaging inverse problems have particularly benefited from unrolling-based deep network design since many traditional model-based approaches rely on iterative optimization. Despite exciting progress, typical unrolling approaches heuristically design layer-specific convolution weights to improve performance. Crucially, convergence properties of the underlying iterative algorithm are lost once layer-specific parameters are learned from training data. We propose an unrolling technique that breaks the trade-off between retaining algorithm properties while simultaneously enhancing performance. We focus on image deblurring and unrolling the widely-applied Half-Quadratic Splitting (HQS) algorithm. We develop a new parametrization scheme which enforces layer-specific parameters to asymptotically approach certain fixed points. Through extensive experimental studies, we verify that our approach achieves competitive performance with state-of-the-art unrolled layer-specific learning and significantly improves over the traditional HQS algorithm. We further establish convergence of the proposed unrolled network as the number of layers approaches infinity, and characterize its convergence rate. Our experimental verification involves simulations that validate the analytical results as well as comparison with state-of-the-art non-blind deblurring techniques on benchmark datasets. The merits of the proposed convergent unrolled network are established over competing alternatives, especially in the regime of limited training.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"574-588"},"PeriodicalIF":5.4,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140300083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spline Sketches: An Efficient Approach for Photon Counting Lidar","authors":"Michael P. Sheehan;Julián Tachella;Mike E. Davies","doi":"10.1109/TCI.2024.3404652","DOIUrl":"10.1109/TCI.2024.3404652","url":null,"abstract":"Photon counting lidar has become an invaluable tool for 3D depth imaging due to the fine depth precision it can achieve over long ranges, with emerging applications in robotics, autonomous vehicles and remote sensing. However, high frame rate, high resolution lidar devices produce an enormous amount of time-of-flight (ToF) data which can cause a severe data processing bottleneck hindering the deployment of real-time systems. In this paper, we show that this bottleneck can be avoided through the use of a hardware-friendly compressed statistic, or a so-called spline sketch, of the ToF data, massively reducing the data rate without sacrificing the quality of the recovered depth image. Specifically, as with the previously proposed Fourier sketches, piecewise linear or quadratic spline sketches are able to reconstruct real-world depth images with negligible loss of resolution whilst achieving 95% compression compared to the full ToF data, as well as offering multi-peak detection performance. However, unlike Fourier sketches, splines sketches require minimal on-chip arithmetic computation per photon detection. We also show that by building in appropriate range-walk correction, spline sketches can be made robust to photon pile-up effects associated with bright reflectors. We contrast this with previously proposed solutions such as coarse binning histograms that trade depth resolution for data compression, suffer from a highly nonuniform accuracy across depth and can fail catastrophically when imaging bright reflectors. By providing a practical means of overcoming the data processing bottleneck, spline sketches offer a promising route to low cost high rate, high resolution lidar imaging.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"863-875"},"PeriodicalIF":5.4,"publicationDate":"2024-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141153642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}