{"title":"Toward Real-World Super Resolution With Adaptive Self-Similarity Mining","authors":"Zejia Fan;Wenhan Yang;Zongming Guo;Jiaying Liu","doi":"10.1109/TIP.2024.3473320","DOIUrl":"10.1109/TIP.2024.3473320","url":null,"abstract":"Despite efforts to construct super-resolution (SR) training datasets with a wide range of degradation scenarios, existing supervised methods based on these datasets still struggle to consistently offer promising results due to the diversity of real-world degradation scenarios and the inherent complexity of model learning. Our work explores a new route: integrating the sample-adaptive property learned through image intrinsic self-similarity and the universal knowledge acquired from large-scale data. We achieve this by uniting internal learning and external learning by an unrolled optimization process. With the merits of both, the tuned fully-supervised SR models can be augmented to broadly handle the real-world degradation in a plug-and-play style. Furthermore, to promote the efficiency of combining internal/external learning, we apply an attention-based weight-updating method to guide the mining of self-similarity, and various data augmentations are adopted while applying the exponential moving average strategy. We conduct extensive experiments on real-world degraded images and our approach outperforms other methods in both qualitative and quantitative comparisons. Our project is available at: \u0000<uri>https://github.com/ZahraFan/AdaSSR/</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5959-5974"},"PeriodicalIF":0.0,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142396290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error Model and Concise Temporal Network for Indirect Illumination in 3D Reconstruction","authors":"Yuchong Chen;Pengcheng Yao;Rui Gao;Wei Zhang;Shaoyan Gai;Jian Yu;Feipeng Da","doi":"10.1109/TIP.2024.3472502","DOIUrl":"10.1109/TIP.2024.3472502","url":null,"abstract":"3D reconstruction is a fundamental task in robotics and AI, providing a prerequisite for many related applications. Fringe projection profilometry is an efficient and non-contact method for generating 3D point clouds out of 2D images. However, during the actual measurement, it is inevitable to experiment with translucent objects, such as skin, marble, and fruit. Indirect illumination from these objects has substantially compromised the precision of 3D reconstruction via the contamination of 2D images. This paper presents a fast and accurate approach to correct for indirect illumination. The essential idea is to design a highly suitable network architecture founded on a precise error model that facilitates accurate error rectification. Initially, our method transforms the error generated by indirect illumination into a sine series. Based on this error model, the multilayer perceptron is more effective in error correction than traditional methods and convolutional neural networks. Our network was trained solely on simulated data but was tested on authentic images. Three sets of experiments, including two sets of comparison experiments, indicate that the designed network can efficiently rectify the error induced by indirect illumination.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5849-5863"},"PeriodicalIF":0.0,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142385548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Scale Spatio-Temporal Memory Network for Lightweight Video Denoising","authors":"Lu Sun;Fangfang Wu;Wei Ding;Xin Li;Jie Lin;Weisheng Dong;Guangming Shi","doi":"10.1109/TIP.2024.3444315","DOIUrl":"10.1109/TIP.2024.3444315","url":null,"abstract":"Deep learning-based video denoising methods have achieved great performance improvements in recent years. However, the expensive computational cost arising from sophisticated network design has severely limited their applications in real-world scenarios. To address this practical weakness, we propose a multiscale spatio-temporal memory network for fast video denoising, named MSTMN, aiming at striking an improved trade-off between cost and performance. To develop an efficient and effective algorithm for video denoising, we exploit a multiscale representation based on the Gaussian-Laplacian pyramid decomposition so that the reference frame can be restored in a coarse-to-fine manner. Guided by a model-based optimization approach, we design an effective variance estimation module, an alignment error estimation module and an adaptive fusion module for each scale of the pyramid representation. For the fusion module, we employ a reconstruction recurrence strategy to incorporate local temporal information. Moreover, we propose a memory enhancement module to exploit the global spatio-temporal information. Meanwhile, the similarity computation of the spatio-temporal memory network enables the proposed network to adaptively search the valuable information at the patch level, which avoids computationally expensive motion estimation and compensation operations. Experimental results on real-world raw video datasets have demonstrated that the proposed lightweight network outperforms current state-of-the-art fast video denoising algorithms such as FastDVDnet, EMVD, and ReMoNet with fewer computational costs.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5810-5823"},"PeriodicalIF":0.0,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142385550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Virtual-Sensor Construction Network Based on Physical Imaging for Image Super-Resolution","authors":"Guozhi Tang;Hongwei Ge;Liang Sun;Yaqing Hou;Mingde Zhao","doi":"10.1109/TIP.2024.3472494","DOIUrl":"10.1109/TIP.2024.3472494","url":null,"abstract":"Image imaging in the real world is based on physical imaging mechanisms. Existing super-resolution methods mainly focus on designing complex network structures to extract and fuse image features more effectively, but ignore the guiding role of physical imaging mechanisms for model design, and cannot mine features from a physical perspective. Inspired by the mechanism of physical imaging, we propose a novel network architecture called Virtual-Sensor Construction network (VSCNet) to simulate the sensor array inside the camera. Specifically, VSCNet first generates different splitting directions to distribute photons to construct virtual sensors, and then performs a multi-stage adaptive fine-tuning operation to fine-tune the number of photons on the virtual sensors to increase the photosensitive area and eliminate photon cross-talk, and finally converts the obtained photon distributions into RGB images. These operations can naturally be regarded as the virtual expansion of the camera’s sensor array in the feature space, which makes our VSCNet bridge the physical space and feature space, and uses their complementarity to mine more effective features to improve performance. Extensive experiments on various datasets show that the proposed VSCNet achieves state-of-the-art performance with fewer parameters. Moreover, we perform experiments to validate the connection between the proposed VSCNet and the physical imaging mechanism. The implementation code is available at \u0000<uri>https://github.com/GZ-T/VSCNet</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5864-5877"},"PeriodicalIF":0.0,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142385628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Supervised Sub-Action Parsing Network for Semi-Supervised Action Quality Assessment","authors":"Kumie Gedamu;Yanli Ji;Yang Yang;Jie Shao;Heng Tao Shen","doi":"10.1109/TIP.2024.3468870","DOIUrl":"10.1109/TIP.2024.3468870","url":null,"abstract":"Semi-supervised Action Quality Assessment (AQA) using limited labeled and massive unlabeled samples to achieve high-quality assessment is an attractive but challenging task. The main challenge relies on how to exploit solid and consistent representations of action sequences for building a bridge between labeled and unlabeled samples in the semi-supervised AQA. To address the issue, we propose a Self-supervised sub-Action Parsing Network (SAP-Net) that employs a teacher-student network structure to learn consistent semantic representations between labeled and unlabeled samples for semi-supervised AQA. We perform actor-centric region detection and generate high-quality pseudo-labels in the teacher branch and assists the student branch in learning discriminative action features. We further design a self-supervised sub-action parsing solution to locate and parse fine-grained sub-action sequences. Then, we present the group contrastive learning with pseudo-labels to capture consistent motion-oriented action features in the two branches. We evaluate our proposed SAP-Net on four public datasets: the MTL-AQA, FineDiving, Rhythmic Gymnastics, and FineFS datasets. The experiment results show that our approach outperforms state-of-the-art semi-supervised methods by a significant margin.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6057-6070"},"PeriodicalIF":0.0,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142384344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perception-Guided Quality Metric of 3D Point Clouds Using Hybrid Strategy","authors":"Yujie Zhang;Qi Yang;Yiling Xu;Shan Liu","doi":"10.1109/TIP.2024.3468893","DOIUrl":"10.1109/TIP.2024.3468893","url":null,"abstract":"Full-reference point cloud quality assessment (FR-PCQA) aims to infer the quality of distorted point clouds with available references. Most of the existing FR-PCQA metrics ignore the fact that the human visual system (HVS) dynamically tackles visual information according to different distortion levels (i.e., distortion detection for high-quality samples and appearance perception for low-quality samples) and measure point cloud quality using unified features. To bridge the gap, in this paper, we propose a perception-guided hybrid metric (PHM) that adaptively leverages two visual strategies with respect to distortion degree to predict point cloud quality: to measure visible difference in high-quality samples, PHM takes into account the masking effect and employs texture complexity as an effective compensatory factor for absolute difference; on the other hand, PHM leverages spectral graph theory to evaluate appearance degradation in low-quality samples. Variations in geometric signals on graphs and changes in the spectral graph wavelet coefficients are utilized to characterize geometry and texture appearance degradation, respectively. Finally, the results obtained from the two components are combined in a non-linear method to produce an overall quality score of the tested point cloud. The results of the experiment on five independent databases show that PHM achieves state-of-the-art (SOTA) performance and offers significant performance improvement in multiple distortion environments. The code is publicly available at \u0000<uri>https://github.com/zhangyujie-1998/PHM</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5755-5770"},"PeriodicalIF":0.0,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142384345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SETA: Semantic-Aware Edge-Guided Token Augmentation for Domain Generalization","authors":"Jintao Guo;Lei Qi;Yinghuan Shi;Yang Gao","doi":"10.1109/TIP.2024.3470517","DOIUrl":"10.1109/TIP.2024.3470517","url":null,"abstract":"Domain generalization (DG) aims to enhance the model robustness against domain shifts without accessing target domains. A prevalent category of methods for DG is data augmentation, which focuses on generating virtual samples to simulate domain shifts. However, existing augmentation techniques in DG are mainly tailored for convolutional neural networks (CNNs), with limited exploration in token-based architectures, i.e., vision transformer (ViT) and multi-layer perceptrons (MLP) models. In this paper, we study the impact of prior CNN-based augmentation methods on token-based models, revealing their performance is suboptimal due to the lack of incentivizing the model to learn holistic shape information. To tackle the issue, we propose the Semantic-aware Edge-guided Token Augmentation (SETA) method. SETA transforms token features by perturbing local edge cues while preserving global shape features, thereby enhancing the model learning of shape information. To further enhance the generalization ability of the model, we introduce two stylized variants of our method combined with two state-of-the-art (SOTA) style augmentation methods in DG. We provide a theoretical insight into our method, demonstrating its effectiveness in reducing the generalization risk bound. Comprehensive experiments on five benchmarks prove that our method achieves SOTA performances across various ViT and MLP architectures. Our code is available at \u0000<uri>https://github.com/lingeringlight/SETA</uri>\u0000.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5622-5636"},"PeriodicalIF":0.0,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate and Robust Object Detection via Selective Adversarial Learning With Constraints","authors":"Jianpin Chen;Heng Li;Qi Gao;Junling Liang;Ruipeng Zhang;Liping Yin;Xinyu Chai","doi":"10.1109/TIP.2024.3470529","DOIUrl":"10.1109/TIP.2024.3470529","url":null,"abstract":"ConvNet-based object detection networks have achieved outstanding performance on clean images. However, many works have shown that these detectors perform poorly on corrupted images caused by noises, blurs, poor weather conditions and so on. With the development of security-sensitive applications, the detector’s practicability has raised increasing concerns. Existing approaches improve detector robustness via extra operations (image restoration or training on extra labeled data) or by applying adversarial training at the expense of performance degradation on clean images. In this paper, we present Selective Adversarial Learning with Constraints (SALC) as a universal detector training approach to simultaneously improve the detector’s precision and robustness. We first propose a unified formulation of adversarial samples for multitask adversarial learning, which significantly diversifies the obtained adversarial samples when integrated into the adversarial training of the detector. Next, we examine our findings on model bias against adversarial attacks of different strengths and differences in Batch Normalization (BN) statistics among clean images and different adversarial samples. On this basis, we propose a batch local comparison strategy with two BN branches to balance the detector’s accuracy and robustness. Furthermore, to avoid performance degradation caused by overwhelming subtask losses, we leverage task-aware ratio thresholds to control the influence of learning in each subtask. The proposed approach can be applied to various detectors without any extra labeled data, inference time costs, or model parameters. Extensive experiments show that our SALC achieves state-of-the-art results on both clean benchmarks (Pascal VOC and MS-COCO) and corruption benchmarks (Pascal VOC-C and MS-COCO-C).","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5593-5605"},"PeriodicalIF":0.0,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmentation-Free Velocity Field Super-Resolution on 4D Flow MRI","authors":"Sébastien Levilly;Saïd Moussaoui;Jean-Michel Serfaty","doi":"10.1109/TIP.2024.3470553","DOIUrl":"10.1109/TIP.2024.3470553","url":null,"abstract":"Blood flow observation is of high interest in cardiovascular disease diagnosis and assessment. For this purpose, 2D Phase-Contrast MRI is widely used in the clinical routine. 4D flow MRI sequences, which dynamically image the anatomic shape and velocity vectors within a region of interest, are promising but rarely used due to their low resolution and signal-to-noise ratio (SNR). Computational fluid dynamics (CFD) simulation is considered as a reference solution for resolution enhancement. However, its precision relies on image segmentation and a clinical expertise for the definition of the vessel borders. The main contribution of this paper is a Segmentation-Free Super-Resolution (SFSR) algorithm. Based on inverse problem methodology, SFSR relies on minimizing a compound criterion involving: a data fidelity term, a fluid mechanics term, and a spatial velocity smoothing term. The proposed algorithm is evaluated with respect to state-of-the-art solutions, in terms of quantification error and computation time, on a synthetic 3D dataset with several noise levels, resulting in a 59% RMSE improvement and factor 2 super-resolution with a noise standard deviation of 5% of the Venc. Finally, its performance is demonstrated, with a scale factor of 2 and 3, on a pulsed flow phantom dataset with more complex patterns. The application on in-vivo were achievable within the 10 min. computation time.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5637-5649"},"PeriodicalIF":0.0,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Cascaded and Crosstalk-Free Multi-Image Encryption Based on Optical Scanning Holography Using 2D Orthogonal Compressive Sensing","authors":"Luozhi Zhang;Qionghua Wang;Zhan Yu;Jinxi Li;Xing Bai;Xin Zhou;Yuanyuan Wu","doi":"10.1109/TIP.2024.3468916","DOIUrl":"10.1109/TIP.2024.3468916","url":null,"abstract":"We propose a non-cascaded and crosstalk-free multi-image encryption method based on optical scanning holography and 2D orthogonal compressive sensing. This approach enables the simultaneous recording and encryption of multiple plaintext images without mechanical scanning, while allows for independent retrieval of each image with exceptional quality and no crosstalk. Two features would bring about more substantial security and privacy. The one is that, by employing a sequence of pre-designed structural patterns as encryption keys at the pupil, multiple samplings can be achieved and ultimately the holographic cyphertext can be obtained. These patterns are generated using a measurement matrix processed with the generalized orthogonal one. As a result, one can accomplish the differentiation of images prior to the recording and thus neither need to pretreat the pending images nor to suppress the out-of-focus noise in the decrypted image. The other one is that, the non-cascaded architecture ensures that different plaintexts do not share sub-keys. Meanwhile, compared to 1D orthogonal compressive sensing, the 2D counterpart makes the proposed method to synchronously deal with multiple images of more complexity, while acquire significantly high-quality decrypted images and far greater encryption capacity. Further, the regularities of conversion between 1D and 2D orthogonal compressive sensing are identified, which may be instructive when to manufacture a practical multi-image cryptosystem or a single-pixel imaging equipment. A more general method or concept named synthesis pupil encoding is advanced. It may provide an effective way to combine multiple encryption methods together into a non-cascaded one. Our method possesses nonlinearity and it is also promising in multi-image asymmetric or public key cryptosystem as well as multi-user multiplexing.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5688-5702"},"PeriodicalIF":0.0,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142373920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}