{"title":"Reference-Based Iterative Interaction With P2-Matching for Stereo Image Super-Resolution","authors":"Runmin Cong;Rongxin Liao;Feng Li;Ronghui Sheng;Huihui Bai;Renjie Wan;Sam Kwong;Wei Zhang","doi":"10.1109/TIP.2025.3577538","DOIUrl":"10.1109/TIP.2025.3577538","url":null,"abstract":"Stereo Image Super-Resolution (SSR) holds great promise in improving the quality of stereo images by exploiting the complementary information between left and right views. Most SSR methods primarily focus on the inter-view correspondences in low-resolution (LR) space. The potential of referencing a high-quality SR image of one view benefits the SR for the other is often overlooked, while those with abundant textures contribute to accurate correspondences. Therefore, we propose Reference-based Iterative Interaction (RIISSR), which utilizes reference-based iterative pixel-wise and patch-wise matching, dubbed <inline-formula> <tex-math>$P^{2}$ </tex-math></inline-formula>-Matching, to establish cross-view and cross-resolution correspondences for SSR. Specifically, we first design the information perception block (IPB) cascaded in parallel to extract hierarchical contextualized features for different views. Pixel-wise matching is embedded between two parallel IPBs to exploit cross-view interaction in LR space. Iterative patch-wise matching is then executed by utilizing the SR stereo pair as another mutual reference, capitalizing on the cross-scale patch recurrence property to learn high-resolution (HR) correspondences for SSR performance. Moreover, we introduce the supervised side-out modulator (SSOM) to re-weight local intra-view features and produce intermediate SR images, which seamlessly bridge two matching mechanisms. Experimental results demonstrate the superiority of RIISSR against existing state-of-the-art methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3779-3789"},"PeriodicalIF":0.0,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144278280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transformer for Multitemporal Hyperspectral Image Unmixing","authors":"Hang Li;Qiankun Dong;Xueshuo Xie;Xia Xu;Tao Li;Zhenwei Shi","doi":"10.1109/TIP.2025.3577394","DOIUrl":"10.1109/TIP.2025.3577394","url":null,"abstract":"Multitemporal hyperspectral image unmixing (MTHU) holds significant importance in monitoring and analyzing the dynamic changes of surface. However, compared to single-temporal unmixing, the multitemporal approach demands comprehensive consideration of information across different phases, rendering it a greater challenge. To address this challenge, we propose the Multitemporal Hyperspectral Image Unmixing Transformer (MUFormer), an end-to-end unsupervised deep learning model. To effectively perform multitemporal hyperspectral image unmixing, we introduce two key modules: the Global Awareness Module (GAM) and the Change Enhancement Module (CEM). The GAM computes self-attention across all phases, facilitating global weight allocation. On the other hand, the CEM dynamically learns local temporal changes by capturing differences between adjacent feature maps. The integration of these modules enables the effective capture of multitemporal semantic information related to endmember and abundance changes, significantly improving the performance of multitemporal hyperspectral image unmixing. We conducted experiments on one real dataset and two synthetic datasets, demonstrating that our model significantly enhances the effect of multitemporal hyperspectral image unmixing.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3790-3804"},"PeriodicalIF":0.0,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144278281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing Generalizable Remote Physiological Measurement Through the Integration of Explicit and Implicit Prior Knowledge","authors":"Yuting Zhang;Hao Lu;Xin Liu;Yingcong Chen;Kaishun Wu","doi":"10.1109/TIP.2025.3576490","DOIUrl":"10.1109/TIP.2025.3576490","url":null,"abstract":"Remote photoplethysmography (rPPG) is a promising technology for capturing physiological signals from facial videos, with potential applications in medical health, affective computing, and biometric recognition. The demand for rPPG tasks has evolved from achieving high performance in intra-dataset testing to excelling in cross-dataset testing (i.e., domain generalization). However, most existing methods have overlooked the incorporation of prior knowledge specific to rPPG, leading to limited generalization capabilities. In this paper, we propose a novel framework that effectively integrates both explicit and implicit prior knowledge into the rPPG task. Specifically, we conduct a systematic analysis of noise sources (e.g., variations in cameras, lighting conditions, skin types, and motion) across different domains and embed this prior knowledge into the network design. Furthermore, we employ a two-branch network to disentangle physiological feature distributions from noise through implicit label correlation. Extensive experiments demonstrate that the proposed method not only surpasses state-of-the-art approaches in RGB cross-dataset evaluation but also exhibits strong generalization from RGB datasets to NIR datasets. The code is publicly available at <uri>https://github.com/keke-nice/Greip</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3764-3778"},"PeriodicalIF":0.0,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144268551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GCM-PDA: A Generative Compensation Model for Progressive Difference Attenuation in Spatiotemporal Fusion of Remote Sensing Images","authors":"Kai Ren;Weiwei Sun;Xiangchao Meng;Gang Yang","doi":"10.1109/TIP.2025.3576992","DOIUrl":"10.1109/TIP.2025.3576992","url":null,"abstract":"High-resolution satellite imagery with dense temporal series is crucial for long-term surface change monitoring. Spatiotemporal fusion seeks to reconstruct remote sensing image sequences with both high spatial and temporal resolutions by leveraging prior information from multiple satellite platforms. However, significant radiometric discrepancies and large spatial resolution variations between images acquired from different satellite sensors, coupled with the limited availability of prior data, present major challenges to accurately reconstructing missing data using existing methods. To address these challenges, this paper introduces GCM-PDA, a novel generative compensation model with progressive difference attenuation for spatiotemporal fusion of remote sensing images. The proposed model integrates multi-scale image decomposition within a progressive fusion framework, enabling the efficient extraction and integration of information across scales. Additionally, GCM-PDA employs domain adaptation techniques to mitigate radiometric inconsistencies between heterogeneous images. Notably, this study pioneers the use of style transformation in spatiotemporal fusion to achieve spatial-spectral compensation, effectively overcoming the constraints of limited prior image information. Experimental results demonstrate that GCM-PDA not only achieves competitive fusion performance but also exhibits strong robustness across diverse conditions.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3817-3832"},"PeriodicalIF":0.0,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144268553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Continual Semantic Segmentation via Uncertainty and Class Balance Re-Weighting","authors":"Zichen Liang;Yusong Hu;Fei Yang;Xialei Liu","doi":"10.1109/TIP.2025.3576477","DOIUrl":"10.1109/TIP.2025.3576477","url":null,"abstract":"Continual Semantic Segmentation (CSS) primarily aims to continually learn new semantic segmentation categories while avoiding catastrophic forgetting. In semantic segmentation tasks, images can comprise both familiar old categories and novel unseen categories and they are treated as background in the incremental stage. Therefore, it is necessary to utilize the old model to generate pseudo-labels. However, the quality of these pseudo-labels significantly influences the model’s forgetting of the old categories. Erroneous pseudo-labels can introduce harmful gradients, thus exacerbating model forgetting. In addition, the issue of class imbalance poses a significant challenge within the realm of CSS. Although traditional methods frequently diminish the emphasis placed on new classes to address this imbalance, we discover that the imbalance extends beyond the distinction between old and new classes. In this paper, we specifically address two previously overlooked problems in CSS: the impact of erroneous pseudo-labels on model forgetting and the confusion induced by class imbalance. We propose an Uncertainty and Class Balance Re-weighting approach (UCB) that assigns higher weights to pixels with pseudo-labels exhibiting lower uncertainty and to categories with smaller proportions during the training process. Our proposed approach enhances the impact of essential pixels during the continual learning process, thereby reducing model forgetting and dynamically balancing category weights based on the dataset. Our method is simple yet effective and can be applied to any method that uses pseudo-labels. Extensive experiments on the Pascal-VOC and ADE20K datasets demonstrate the efficacy of our approach in improving model performance across three state-of-the-art methods. The code will be available at <uri>https://github.com/JACK-Chen-2019/UCB</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3689-3702"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144260094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anisotropic Spherical Gaussians Lighting Priors for Indoor Environment Map Estimation","authors":"Junhong Zhao;Bing Xue;Mengjie Zhang","doi":"10.1109/TIP.2025.3575902","DOIUrl":"10.1109/TIP.2025.3575902","url":null,"abstract":"High Dynamic Range (HDR) environment lighting is essential for augmented reality and visual editing applications, enabling realistic object relighting and seamless scene composition. However, the acquisition of accurate HDR environment maps remains resource-intensive, often requiring specialized devices such as light probes or 360° capture systems, and necessitating stitching during postprocessing. Existing deep learning-based methods attempt to estimate global illumination from partial-view images but often struggle with complex lighting conditions, particularly in indoor environments with diverse lighting variations. To address this challenge, we propose a novel method for estimating indoor HDR environment maps from single standard images, leveraging Anisotropic Spherical Gaussians (ASG) to model intricate lighting distributions as priors. Unlike traditional Spherical Gaussian (SG) representations, ASG can better capture anisotropic lighting properties, including complex shape, rotation, and spatial extent. Our approach introduces a transformer-based network with a two-stage training scheme to predict ASG parameters effectively. To leverage these predicted lighting priors for environment map generation, we introduce a novel generative projector that synthesizes environment maps with high-frequency textures. To train the generative projector, we propose a parameter-efficient adaptation method that transfers knowledge from SG-based guidance to ASG, enabling the model to preserve the generalizability of SG (e.g., spatial distribution and dominance of light sources) while enhancing its capacity to capture fine-grained anisotropic lighting characteristics. Experimental results demonstrate that our method yields environment maps with more precise lighting conditions and environment textures, facilitating the realistic rendering of lighting effects. The implementation code for ASG extraction can be found at <uri>https://github.com/junhong-jennifer-zhao/ASG-lighting</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3635-3647"},"PeriodicalIF":0.0,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144252165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Light Field From Stereo Images for AR Display With Matched Angular Sampling Structure and Minimal Retinal Error","authors":"Yi-Chou Chen;Homer H. Chen","doi":"10.1109/TIP.2025.3575333","DOIUrl":"10.1109/TIP.2025.3575333","url":null,"abstract":"Near-eye light field displays offer natural 3D visual experiences for AR/VR users by projecting light rays onto retina as if the light rays were emanated from a real object. Such displays normally take four-dimensional light field data as input. Given that sizeable existing 3D contents are in the form of stereo images, we propose a practical approach that generates light field data from such contents at minimal computational cost while maintaining a reasonable image quality. The perceptual quality of light field is ensured by making the baseline of light field subviews consistent with that of the micro-projectors of the light field display and by compensating for the optical artifact of the light field display through digital rectification. The effectiveness and efficiency of the proposed approach is verified through both quantitative and qualitative experiments. The results demonstrate that our light field converter works for real-world light field displays.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3849-3860"},"PeriodicalIF":0.0,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144252164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching","authors":"Yepeng Liu;Zhichao Sun;Baosheng Yu;Yitian Zhao;Bo Du;Yongchao Xu;Jun Cheng","doi":"10.1109/TIP.2025.3574937","DOIUrl":"10.1109/TIP.2025.3574937","url":null,"abstract":"Many keypoint detection and description methods have been proposed for image matching or registration. While these methods demonstrate promising performance for single-modality image matching, they often struggle with multimodal data because the descriptors trained on single-modality data tend to lack robustness against the non-linear variations present in multimodal data. Extending such methods to multimodal image matching often requires well-aligned multimodal data to learn modality-invariant descriptors. However, acquiring such data is often costly and impractical in many real-world scenarios. To address this challenge, we propose a modality-invariant feature learning network (MIFNet) to compute modality-invariant features for keypoint descriptions in multimodal image matching using only single-modality training data. Specifically, we propose a novel latent feature aggregation module and a cumulative hybrid aggregation module to enhance the base keypoint descriptors trained on single-modality data by leveraging pre-trained features from Stable Diffusion models. We validate our method with recent keypoint detection and description methods in three multimodal retinal image datasets (CF-FA, CF-OCT, EMA-OCTA) and two remote sensing datasets (Optical-SAR and Optical-NIR). Extensive experiments demonstrate that the proposed MIFNet is able to learn modality-invariant feature for multimodal image matching without accessing the targeted modality and has good zero-shot generalization ability. The code will be released at <uri>https://github.com/lyp-deeplearning/MIFNet</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3593-3608"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144236803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Class Incremental Learning via Contrastive Complementary Augmentation","authors":"Xi Wang;Xu Yang;Kun Wei;Yanan Gu;Cheng Deng","doi":"10.1109/TIP.2025.3574930","DOIUrl":"10.1109/TIP.2025.3574930","url":null,"abstract":"Class incremental learning (CIL) endeavors to acquire new knowledge continuously from an unending data stream while retaining previously acquired knowledge. Since the amount of new data is significantly smaller than that of old data, existing methods struggle to strike a balance between acquiring new knowledge and retaining previously learned knowledge, leading to substantial performance degradation. To tackle such a dilemma, in this paper, we propose the <bold>Co</b>ntrastive <bold>Co</b>mplementary <bold>A</b>ugmentation <bold>L</b>earning (<bold>CoLA</b>) method, which mitigates the aliasing of distributions in incremental tasks. Specifically, we introduce a novel yet effective supervised contrastive learning module with instance- and class-level augmentation during base training. For the instance-level augmentation method, we spatially segment the image at different scales, creating spatial pyramid contrastive pairs to obtain more robust feature representations. Meanwhile, the class-level augmentation method randomly mixes images within the mini-batch, facilitating the learning of compact and more easily adaptable decision boundaries. In this way, we only need to train the classifier to maintain competitive performance during the incremental phases. Furthermore, we also propose CoLA+ to further enhance the proposed method with relaxed limitations on data storage. Extensive experiments demonstrate that our method achieves state-of-the-art performance on different benchmarks.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3663-3673"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144237132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Granularity Distribution Alignment for Cross-Domain Crowd Counting","authors":"Xian Zhong;Lingyue Qiu;Huilin Zhu;Jingling Yuan;Shengfeng He;Zheng Wang","doi":"10.1109/TIP.2025.3571312","DOIUrl":"10.1109/TIP.2025.3571312","url":null,"abstract":"Unsupervised domain adaptation enables the transfer of knowledge from a labeled source domain to an unlabeled target domain, and its application in crowd counting is gaining momentum. Current methods typically align distributions across domains to address inter-domain disparities at a global level. However, these methods often struggle with significant intra-domain gaps caused by domain-agnostic factors such as density, surveillance angles, and scale, leading to inaccurate alignment and unnecessary computational burdens, especially in large-scale training scenarios. To address these challenges, we propose the Multi-Granularity Optimal Transport (MGOT) distribution alignment framework, which aligns domain-agnostic factors across domains at different granularities. The motivation behind multi-granularity is to capture fine-grained domain-agnostic variations within domains. Our method proceeds in three phases: first, clustering coarse-grained features based on intra-domain similarity; second, aligning the granular clusters using an optimal transport framework and constructing a mapping from cluster centers to finer patch levels between domains; and third, re-weighting the aligned distribution for model refinement in domain adaptation. Extensive experiments across twelve cross-domain benchmarks show that our method outperforms existing state-of-the-art methods in adaptive crowd counting. The code will be available at <uri>https://github.com/HopooLinZ/MGOT</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3648-3662"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144236806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}