{"title":"Advancing Generalizable Remote Physiological Measurement Through the Integration of Explicit and Implicit Prior Knowledge","authors":"Yuting Zhang;Hao Lu;Xin Liu;Yingcong Chen;Kaishun Wu","doi":"10.1109/TIP.2025.3576490","DOIUrl":"10.1109/TIP.2025.3576490","url":null,"abstract":"Remote photoplethysmography (rPPG) is a promising technology for capturing physiological signals from facial videos, with potential applications in medical health, affective computing, and biometric recognition. The demand for rPPG tasks has evolved from achieving high performance in intra-dataset testing to excelling in cross-dataset testing (i.e., domain generalization). However, most existing methods have overlooked the incorporation of prior knowledge specific to rPPG, leading to limited generalization capabilities. In this paper, we propose a novel framework that effectively integrates both explicit and implicit prior knowledge into the rPPG task. Specifically, we conduct a systematic analysis of noise sources (e.g., variations in cameras, lighting conditions, skin types, and motion) across different domains and embed this prior knowledge into the network design. Furthermore, we employ a two-branch network to disentangle physiological feature distributions from noise through implicit label correlation. Extensive experiments demonstrate that the proposed method not only surpasses state-of-the-art approaches in RGB cross-dataset evaluation but also exhibits strong generalization from RGB datasets to NIR datasets. The code is publicly available at <uri>https://github.com/keke-nice/Greip</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3764-3778"},"PeriodicalIF":0.0,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144268551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GCM-PDA: A Generative Compensation Model for Progressive Difference Attenuation in Spatiotemporal Fusion of Remote Sensing Images","authors":"Kai Ren;Weiwei Sun;Xiangchao Meng;Gang Yang","doi":"10.1109/TIP.2025.3576992","DOIUrl":"10.1109/TIP.2025.3576992","url":null,"abstract":"High-resolution satellite imagery with dense temporal series is crucial for long-term surface change monitoring. Spatiotemporal fusion seeks to reconstruct remote sensing image sequences with both high spatial and temporal resolutions by leveraging prior information from multiple satellite platforms. However, significant radiometric discrepancies and large spatial resolution variations between images acquired from different satellite sensors, coupled with the limited availability of prior data, present major challenges to accurately reconstructing missing data using existing methods. To address these challenges, this paper introduces GCM-PDA, a novel generative compensation model with progressive difference attenuation for spatiotemporal fusion of remote sensing images. The proposed model integrates multi-scale image decomposition within a progressive fusion framework, enabling the efficient extraction and integration of information across scales. Additionally, GCM-PDA employs domain adaptation techniques to mitigate radiometric inconsistencies between heterogeneous images. Notably, this study pioneers the use of style transformation in spatiotemporal fusion to achieve spatial-spectral compensation, effectively overcoming the constraints of limited prior image information. Experimental results demonstrate that GCM-PDA not only achieves competitive fusion performance but also exhibits strong robustness across diverse conditions.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3817-3832"},"PeriodicalIF":0.0,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144268553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Continual Semantic Segmentation via Uncertainty and Class Balance Re-Weighting","authors":"Zichen Liang;Yusong Hu;Fei Yang;Xialei Liu","doi":"10.1109/TIP.2025.3576477","DOIUrl":"10.1109/TIP.2025.3576477","url":null,"abstract":"Continual Semantic Segmentation (CSS) primarily aims to continually learn new semantic segmentation categories while avoiding catastrophic forgetting. In semantic segmentation tasks, images can comprise both familiar old categories and novel unseen categories and they are treated as background in the incremental stage. Therefore, it is necessary to utilize the old model to generate pseudo-labels. However, the quality of these pseudo-labels significantly influences the model’s forgetting of the old categories. Erroneous pseudo-labels can introduce harmful gradients, thus exacerbating model forgetting. In addition, the issue of class imbalance poses a significant challenge within the realm of CSS. Although traditional methods frequently diminish the emphasis placed on new classes to address this imbalance, we discover that the imbalance extends beyond the distinction between old and new classes. In this paper, we specifically address two previously overlooked problems in CSS: the impact of erroneous pseudo-labels on model forgetting and the confusion induced by class imbalance. We propose an Uncertainty and Class Balance Re-weighting approach (UCB) that assigns higher weights to pixels with pseudo-labels exhibiting lower uncertainty and to categories with smaller proportions during the training process. Our proposed approach enhances the impact of essential pixels during the continual learning process, thereby reducing model forgetting and dynamically balancing category weights based on the dataset. Our method is simple yet effective and can be applied to any method that uses pseudo-labels. Extensive experiments on the Pascal-VOC and ADE20K datasets demonstrate the efficacy of our approach in improving model performance across three state-of-the-art methods. The code will be available at <uri>https://github.com/JACK-Chen-2019/UCB</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3689-3702"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144260094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anisotropic Spherical Gaussians Lighting Priors for Indoor Environment Map Estimation","authors":"Junhong Zhao;Bing Xue;Mengjie Zhang","doi":"10.1109/TIP.2025.3575902","DOIUrl":"10.1109/TIP.2025.3575902","url":null,"abstract":"High Dynamic Range (HDR) environment lighting is essential for augmented reality and visual editing applications, enabling realistic object relighting and seamless scene composition. However, the acquisition of accurate HDR environment maps remains resource-intensive, often requiring specialized devices such as light probes or 360° capture systems, and necessitating stitching during postprocessing. Existing deep learning-based methods attempt to estimate global illumination from partial-view images but often struggle with complex lighting conditions, particularly in indoor environments with diverse lighting variations. To address this challenge, we propose a novel method for estimating indoor HDR environment maps from single standard images, leveraging Anisotropic Spherical Gaussians (ASG) to model intricate lighting distributions as priors. Unlike traditional Spherical Gaussian (SG) representations, ASG can better capture anisotropic lighting properties, including complex shape, rotation, and spatial extent. Our approach introduces a transformer-based network with a two-stage training scheme to predict ASG parameters effectively. To leverage these predicted lighting priors for environment map generation, we introduce a novel generative projector that synthesizes environment maps with high-frequency textures. To train the generative projector, we propose a parameter-efficient adaptation method that transfers knowledge from SG-based guidance to ASG, enabling the model to preserve the generalizability of SG (e.g., spatial distribution and dominance of light sources) while enhancing its capacity to capture fine-grained anisotropic lighting characteristics. Experimental results demonstrate that our method yields environment maps with more precise lighting conditions and environment textures, facilitating the realistic rendering of lighting effects. The implementation code for ASG extraction can be found at <uri>https://github.com/junhong-jennifer-zhao/ASG-lighting</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3635-3647"},"PeriodicalIF":0.0,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144252165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Light Field From Stereo Images for AR Display With Matched Angular Sampling Structure and Minimal Retinal Error","authors":"Yi-Chou Chen;Homer H. Chen","doi":"10.1109/TIP.2025.3575333","DOIUrl":"10.1109/TIP.2025.3575333","url":null,"abstract":"Near-eye light field displays offer natural 3D visual experiences for AR/VR users by projecting light rays onto retina as if the light rays were emanated from a real object. Such displays normally take four-dimensional light field data as input. Given that sizeable existing 3D contents are in the form of stereo images, we propose a practical approach that generates light field data from such contents at minimal computational cost while maintaining a reasonable image quality. The perceptual quality of light field is ensured by making the baseline of light field subviews consistent with that of the micro-projectors of the light field display and by compensating for the optical artifact of the light field display through digital rectification. The effectiveness and efficiency of the proposed approach is verified through both quantitative and qualitative experiments. The results demonstrate that our light field converter works for real-world light field displays.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3849-3860"},"PeriodicalIF":0.0,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144252164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching","authors":"Yepeng Liu;Zhichao Sun;Baosheng Yu;Yitian Zhao;Bo Du;Yongchao Xu;Jun Cheng","doi":"10.1109/TIP.2025.3574937","DOIUrl":"10.1109/TIP.2025.3574937","url":null,"abstract":"Many keypoint detection and description methods have been proposed for image matching or registration. While these methods demonstrate promising performance for single-modality image matching, they often struggle with multimodal data because the descriptors trained on single-modality data tend to lack robustness against the non-linear variations present in multimodal data. Extending such methods to multimodal image matching often requires well-aligned multimodal data to learn modality-invariant descriptors. However, acquiring such data is often costly and impractical in many real-world scenarios. To address this challenge, we propose a modality-invariant feature learning network (MIFNet) to compute modality-invariant features for keypoint descriptions in multimodal image matching using only single-modality training data. Specifically, we propose a novel latent feature aggregation module and a cumulative hybrid aggregation module to enhance the base keypoint descriptors trained on single-modality data by leveraging pre-trained features from Stable Diffusion models. We validate our method with recent keypoint detection and description methods in three multimodal retinal image datasets (CF-FA, CF-OCT, EMA-OCTA) and two remote sensing datasets (Optical-SAR and Optical-NIR). Extensive experiments demonstrate that the proposed MIFNet is able to learn modality-invariant feature for multimodal image matching without accessing the targeted modality and has good zero-shot generalization ability. The code will be released at <uri>https://github.com/lyp-deeplearning/MIFNet</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3593-3608"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144236803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Class Incremental Learning via Contrastive Complementary Augmentation","authors":"Xi Wang;Xu Yang;Kun Wei;Yanan Gu;Cheng Deng","doi":"10.1109/TIP.2025.3574930","DOIUrl":"10.1109/TIP.2025.3574930","url":null,"abstract":"Class incremental learning (CIL) endeavors to acquire new knowledge continuously from an unending data stream while retaining previously acquired knowledge. Since the amount of new data is significantly smaller than that of old data, existing methods struggle to strike a balance between acquiring new knowledge and retaining previously learned knowledge, leading to substantial performance degradation. To tackle such a dilemma, in this paper, we propose the <bold>Co</b>ntrastive <bold>Co</b>mplementary <bold>A</b>ugmentation <bold>L</b>earning (<bold>CoLA</b>) method, which mitigates the aliasing of distributions in incremental tasks. Specifically, we introduce a novel yet effective supervised contrastive learning module with instance- and class-level augmentation during base training. For the instance-level augmentation method, we spatially segment the image at different scales, creating spatial pyramid contrastive pairs to obtain more robust feature representations. Meanwhile, the class-level augmentation method randomly mixes images within the mini-batch, facilitating the learning of compact and more easily adaptable decision boundaries. In this way, we only need to train the classifier to maintain competitive performance during the incremental phases. Furthermore, we also propose CoLA+ to further enhance the proposed method with relaxed limitations on data storage. Extensive experiments demonstrate that our method achieves state-of-the-art performance on different benchmarks.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3663-3673"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144237132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Granularity Distribution Alignment for Cross-Domain Crowd Counting","authors":"Xian Zhong;Lingyue Qiu;Huilin Zhu;Jingling Yuan;Shengfeng He;Zheng Wang","doi":"10.1109/TIP.2025.3571312","DOIUrl":"10.1109/TIP.2025.3571312","url":null,"abstract":"Unsupervised domain adaptation enables the transfer of knowledge from a labeled source domain to an unlabeled target domain, and its application in crowd counting is gaining momentum. Current methods typically align distributions across domains to address inter-domain disparities at a global level. However, these methods often struggle with significant intra-domain gaps caused by domain-agnostic factors such as density, surveillance angles, and scale, leading to inaccurate alignment and unnecessary computational burdens, especially in large-scale training scenarios. To address these challenges, we propose the Multi-Granularity Optimal Transport (MGOT) distribution alignment framework, which aligns domain-agnostic factors across domains at different granularities. The motivation behind multi-granularity is to capture fine-grained domain-agnostic variations within domains. Our method proceeds in three phases: first, clustering coarse-grained features based on intra-domain similarity; second, aligning the granular clusters using an optimal transport framework and constructing a mapping from cluster centers to finer patch levels between domains; and third, re-weighting the aligned distribution for model refinement in domain adaptation. Extensive experiments across twelve cross-domain benchmarks show that our method outperforms existing state-of-the-art methods in adaptive crowd counting. The code will be available at <uri>https://github.com/HopooLinZ/MGOT</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3648-3662"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144236806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"‘Disengage AND Integrate’: Personalized Causal Network for Gaze Estimation","authors":"Yi Tian;Xiyun Wang;Sihui Zhang;Wanru Xu;Yi Jin;Yaping Huang","doi":"10.1109/TIP.2025.3575238","DOIUrl":"10.1109/TIP.2025.3575238","url":null,"abstract":"Gaze estimation task aims to predict a 3D gaze direction or a 2D gaze point given a face or eye image. To improve generalization of gaze estimation models to unseen new users, existing methods either disentangle personalized information of all subjects from their gaze features, or integrate unrefined personalized information into blended embeddings. Their methodologies are not rigorous whose performance is still unsatisfactory. In this paper, we put forward a comprehensive perspective named ‘Disengage AND Integrate’ to deal with personalized information, which elaborates that for specified users, their irrelevant personalized information should be discarded while relevant one should be considered. Accordingly, a novel Personalized Causal Network (PCNet) for generalizable gaze estimation has been proposed. The PCNet adopts a two-branch framework, which consists of a subject-deconfounded appearance sub-network (SdeANet) and a prototypical personalization sub-network (ProPNet). The SdeANet aims to explore causalities among facial images, gazes, and personalized information and extract a subject-invariant appearance-aware feature of each image by means of causal intervention. The ProPNet aims to characterize customized personalization-aware features of arbitrary users with the help of a prototype-based subject identification task. Furthermore, our whole PCNet is optimized in a hybrid episodic training paradigm, which further improve its adaptability to new users. Experiments on three challenging datasets over within-domain and cross-domain gaze estimation tasks demonstrate the effectiveness of our method.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3733-3747"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144237133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LoopSparseGS: Loop-Based Sparse-View Friendly Gaussian Splatting","authors":"Zhenyu Bao;Guibiao Liao;Kaichen Zhou;Kanglin Liu;Qing Li;Guoping Qiu","doi":"10.1109/TIP.2025.3574929","DOIUrl":"10.1109/TIP.2025.3574929","url":null,"abstract":"Despite the photorealistic novel view synthesis (NVS) performance achieved by the original 3D Gaussian splatting (3DGS), its rendering quality significantly degrades with sparse input views. This performance drop is mainly caused by the limited number of initial points generated from the sparse input, lacking reliable geometric supervision during the training process, and inadequate regularization of the oversized Gaussian ellipsoids. To handle these issues, we propose the LoopSparseGS, a loop-based 3DGS framework for the sparse novel view synthesis task. In specific, we propose a loop-based Progressive Gaussian Initialization (PGI) strategy that could iteratively densify the initialized point cloud using the rendered pseudo images during the training process. Then, the sparse and reliable depth from the Structure from Motion, and the window-based dense monocular depth are leveraged to provide precise geometric supervision via the proposed Depth-alignment Regularization (DAR). Additionally, we introduce a novel Sparse-friendly Sampling (SFS) strategy to handle oversized Gaussian ellipsoids leading to large pixel errors. Comprehensive experiments on four datasets demonstrate that LoopSparseGS outperforms existing state-of-the-art methods for sparse-input novel view synthesis, across indoor, outdoor, and object-level scenes with various image resolutions. Code is available at: <uri>https://github.com/pcl3dv/LoopSparseGS</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3889-3902"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144236805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}