{"title":"Robust Palmprint Recognition via Multi-Stage Noisy Label Selection and Correction","authors":"Huikai Shao;Siyu Shi;Xuefeng Du;Dan Zeng;Dexing Zhong","doi":"10.1109/TIP.2025.3588040","DOIUrl":"10.1109/TIP.2025.3588040","url":null,"abstract":"Deep learning-based palmprint recognition methods take performance to the next level. However, most current methods rely on samples with clean labels. Noisy labels are difficult to avoid in practical applications and may affect the reliability of models, which poses a big challenge. In this paper, we propose a novel Multi-stage Noisy Label Selection and Correction (MNLSC) framework to address this issue. Three stages are proposed to improve the robustness of palmprint recognition. Clean simple samples are firstly selected based on self-supervised learning. A Fourier-based module is constructed to select clean hard samples. A pototype-based module is further introduced for selecting noisy labels from the remaining samples and correcting them. Finally, the model is trained by using clean and corrected labels to improve the performance. Experiments are conducted on several constrained and unconstrained palmprint databases. The results demonstrate the superiority of our method over other methods in dealing with different noise rates. Compared with the baseline method, the accuracy can be improved by up to 33.45% when there are 60% noisy labels.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4591-4601"},"PeriodicalIF":0.0,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144646028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight Class Incremental Semantic Segmentation Without Catastrophic Forgetting","authors":"Wei Cong;Yang Cong;Yu Ren","doi":"10.1109/TIP.2025.3588065","DOIUrl":"10.1109/TIP.2025.3588065","url":null,"abstract":"Class incremental semantic segmentation (CISS) aims to progressively segment newly introduced classes while preserving the memory of previously learned ones. Traditional CISS methods directly employ advanced semantic segmentation models (e.g., Deeplab-v3) as continual learners. However, these methods require substantial computational and memory resources, limiting their deployment on edge devices. In this paper, we propose a Lightweight Class Incremental Semantic Segmentation (LISS) model tailored for resource-constrained scenarios. Specifically, we design an automatic knowledge-preservation pruning strategy based on the Hilbert-Schmidt Independence Criterion (HSIC) Lasso, which automatically compresses the CISS model by searching for global penalty coefficients. Nonetheless, reducing model parameters exacerbates catastrophic forgetting during incremental learning. To mitigate this challenge, we develop a clustering-based pseudo labels generator to obtain high-quality pseudo labels by considering the feature space structure of old classes. It adjusts predicted probabilities from the old model according to the feature proximity to nearest sub-cluster centers for each class. Additionally, we introduce a customized soft labels module that distills the semantic relationships between classes separately. It decomposes soft labels into target probabilities, background probabilities, and other probabilities, thereby maintaining knowledge of previously learned classes in a fine-grained manner. Extensive experiments on two benchmark datasets demonstrate that our LISS model outperforms state-of-the-art approaches in both effectiveness and efficiency.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4566-4579"},"PeriodicalIF":0.0,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144646026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning From Vision Foundation Models for Cross-Domain Remote Sensing Image Segmentation","authors":"Wang Liu;Puhong Duan;Zhuojun Xie;Xudong Kang;Shutao Li","doi":"10.1109/TIP.2025.3588041","DOIUrl":"10.1109/TIP.2025.3588041","url":null,"abstract":"Cross-domain image segmentation plays a crucial role in the field of remote sensing. Current approaches often rely on a mean-teacher model that is integrated from student models to guide the training of the student model itself. However, the feature space of the mean-teacher model exhibits significant domain discrepancy and considerable class overlap, which results in suboptimal performance. Motivated by the idea of learning from stronger teachers, we introduce a robust domain adaptation method called LFMDA. This novel approach is the first to explicitly enhance cross-domain semantic segmentation performance by leveraging vision foundation models (VFMs) within remote sensing applications. Specifically, we propose a prototypical contrastive knowledge distillation loss (PCD) that enables the student model to produce domain-invariant yet category-discriminative features by distilling knowledge from a domain-generalized VFM teacher. Additionally, we introduce a local region homogenization strategy (LRH) to generate high-quality and high-quantity pseudo-labels by incorporating a Segment Anything Model (SAM). Extensive empirical evaluations demonstrate that our method outperforms existing approaches, setting a new state-of-the-art (SOTA) method in domain-adaptive remote sensing image segmentation. The code is available at <uri>https://github.com/StuLiu/LFMDA</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4553-4565"},"PeriodicalIF":0.0,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144646027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Inference Hashing for Long-Tailed Image Retrieval","authors":"Lu Jin;Zhengyun Lu;Zechao Li;Yonghua Pan;Longquan Dai;Jinhui Tang;Ramesh Jain","doi":"10.1109/TIP.2025.3588054","DOIUrl":"10.1109/TIP.2025.3588054","url":null,"abstract":"In hashing-based long-tailed image retrieval, the dominance of data-rich head classes often hinders the learning of effective hash codes for data-poor tail classes due to inherent long-tailed bias. Interestingly, this bias also contains valuable prior knowledge by revealing inter-class dependencies, which can be beneficial for hash learning. However, previous methods have not thoroughly analyzed this tangled negative and positive effects of long-tailed bias from a causal inference perspective. In this paper, we propose a novel hash framework that employs causal inference to disentangle detrimental bias effects from beneficial ones. To capture good bias in long-tailed datasets, we construct hash mediators that conserve valuable prior knowledge from class centers. Furthermore, we propose a de-biased hash loss To enhance the beneficial bias effects while mitigating adverse ones, leading to more discriminative hash codes. Specifically, this loss function leverages the beneficial bias captured by hash mediators to support accurate class label prediction, while mitigating harmful bias by blocking its causal path to the hash codes and refining predictions through backdoor adjustment. Extensive experimental results on four widely used datasets demonstrate that the proposed method improves retrieval performance against the state-of-the-art methods by large margins. The source code is available at <uri>https://github.com/IMAG-LuJin/CIH</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5099-5114"},"PeriodicalIF":13.7,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144645999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perceptual Quality Assessment of 360° Images Based on Generative Scanpath Representation","authors":"Xiangjie Sui;Hanwei Zhu;Xuelin Liu;Yuming Fang;Shiqi Wang;Zhou Wang","doi":"10.1109/TIP.2025.3583181","DOIUrl":"10.1109/TIP.2025.3583181","url":null,"abstract":"Despite substantial efforts dedicated to the design of heuristic models for omnidirectional (i.e., 360°) image quality assessment (OIQA), a conspicuous gap remains due to the lack of consideration for the diversity of viewing behaviors that leads to the varying perceptual quality of 360° images. Two critical aspects underline this oversight: the neglect of viewing conditions that significantly sway user gaze patterns and the overreliance on a single viewport sequence from the 360° image for quality inference. To address these issues, we introduce a unique generative scanpath representation (GSR) for effective quality inference of 360° images, which aggregates varied perceptual experiences of multi-hypothesis users under a predefined viewing condition. More specifically, given a viewing condition characterized by the starting point of viewing and exploration time, a set of scanpaths consisting of dynamic visual fixations can be produced using an apt scanpath generator. Following this vein, we use the scanpaths to convert the 360° image into the unique GSR, which provides a global overview of gazed-focused contents derived from scanpaths. As such, the quality inference of the 360° image is swiftly transformed to that of GSR. We then propose an efficient OIQA computational framework by learning the quality maps of GSR. Comprehensive experimental results validate that the predictions of the proposed framework are highly consistent with human perception in the spatiotemporal domain, especially in the challenging context of locally distorted 360° images under varied viewing conditions. The code will be released at <uri>https://github.com/xiangjieSui/GSR</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4485-4499"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trustworthy Visual-Textual Retrieval","authors":"Yang Qin;Lifu Huang;Dezhong Peng;Bohan Jiang;Joey Tianyi Zhou;Xi Peng;Peng Hu","doi":"10.1109/TIP.2025.3587575","DOIUrl":"10.1109/TIP.2025.3587575","url":null,"abstract":"Visual-textual retrieval, as a link between computer vision and natural language processing, aims at jointly learning visual-semantic relevance to bridge the heterogeneity gap across visual and textual spaces. Existing methods conduct retrieval only relying on the ranking of pairwise similarities, but they cannot self-evaluate the uncertainty of retrieved results, resulting in unreliable retrieval and hindering interpretability. To address this problem, we propose a novel Trust-Consistent Learning framework (TCL) to endow visual-textual retrieval with uncertainty evaluation for trustworthy retrieval. More specifically, TCL first models the matching evidence according to cross-modal similarity to estimate the uncertainty for cross-modal uncertainty-aware learning. Second, a simple yet effective consistency module is presented to enforce the subjective opinions of bidirectional learning to be consistent for high reliability and accuracy. Finally, extensive experiments are conducted to demonstrate the superiority and generalizability of TCL on six widely-used benchmark datasets, i.e., Flickr30K, MS-COCO, MSVD, MSR-VTT, ActivityNet, and DiDeMo. Furthermore, some qualitative experiments are carried out to provide comprehensive and insightful analyses for trustworthy visual-textual retrieval, verifying the reliability and interoperability of TCL. The code is available in <uri>https://github.com/QinYang79/TCL</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4515-4526"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UniEmoX: Cross-Modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception","authors":"Chuang Chen;Xiao Sun;Zhi Liu","doi":"10.1109/TIP.2025.3587577","DOIUrl":"10.1109/TIP.2025.3587577","url":null,"abstract":"Visual emotion analysis holds significant research value in both computer vision and psychology. However, existing methods for visual emotion analysis suffer from limited generalizability due to the ambiguity of emotion perception and the diversity of data scenarios. To tackle this issue, we introduce UniEmoX, a cross-modal semantic-guided large-scale pretraining framework. Inspired by psychological research emphasizing the inseparability of the emotional exploration process from the interaction between individuals and their environment, UniEmoX integrates scene-centric and person-centric low-level image spatial structural information, aiming to derive more nuanced and discriminative emotional representations. By exploiting the similarity between paired and unpaired image-text samples, UniEmoX distills rich semantic knowledge from the CLIP model to enhance emotional embedding representations more effectively. To the best of our knowledge, this is the first large-scale pretraining framework that integrates psychological theories with contemporary contrastive learning and masked image modeling techniques for emotion analysis across diverse scenarios. Additionally, we develop a visual emotional dataset titled Emo8. Emo8 samples cover a range of domains, including cartoon, natural, realistic, science fiction and advertising cover styles, covering nearly all common emotional scenes. Comprehensive experiments conducted on seven benchmark datasets across two downstream tasks validate the effectiveness of UniEmoX. The source code is available at <uri>https://github.com/chincharles/u-emo</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4691-4705"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AFTER: Attention-Based Fusion Router for RGBT Tracking","authors":"Andong Lu;Wanyu Wang;Chenglong Li;Jin Tang;Bin Luo","doi":"10.1109/TIP.2025.3586467","DOIUrl":"10.1109/TIP.2025.3586467","url":null,"abstract":"Multi-modal feature fusion as a core investigative component of RGBT tracking emerges numerous fusion studies in recent years. However, existing RGBT tracking methods widely adopt fixed fusion structures to integrate multi-modal feature, which are hard to handle various challenges in dynamic scenarios. To address this problem, this work presents a novel Attention-based Fusion router called AFTER, which optimizes the fusion structure to adapt to the dynamic challenging scenarios, for robust RGBT tracking. In particular, we design a fusion structure space based on the hierarchical attention network, each attention-based fusion unit corresponding to a fusion operation and a combination of these attention units corresponding to a fusion structure. Through optimizing the combination of attention-based fusion units, we can dynamically select the fusion structure to adapt to various challenging scenarios. Unlike complex search of different structures in neural architecture search algorithms, we develop a dynamic routing algorithm, which equips each attention-based fusion unit with a router, to predict the combination weights for efficient optimization of the fusion structure. Extensive experiments on five mainstream RGBT tracking datasets demonstrate the superior performance of the proposed AFTER against state-of-the-art RGBT trackers. We release the code in <uri>https://github.com/Alexadlu/AFter</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4386-4401"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"StreakNet-Arch: An Anti-Scattering Network-Based Architecture for Underwater Carrier LiDAR-Radar Imaging","authors":"Xuelong Li;Hongjun An;Haofei Zhao;Guangying Li;Bo Liu;Xing Wang;Guanghua Cheng;Guojun Wu;Zhe Sun","doi":"10.1109/TIP.2025.3586431","DOIUrl":"10.1109/TIP.2025.3586431","url":null,"abstract":"In this paper, we introduce StreakNet-Arch, a real-time, end-to-end binary-classification framework based on our self-developed Underwater Carrier LiDAR-Radar (UCLR) that embeds Self-Attention and our novel Double Branch Cross Attention (DBC-Attention) to enhance scatter suppression. Under controlled water tank validation conditions, StreakNet-Arch with Self-Attention or DBC-Attention outperforms traditional bandpass filtering and achieves higher <inline-formula> <tex-math>$F_{1}$ </tex-math></inline-formula> scores than learning-based MP networks and CNNs at comparable model size and complexity. Real-time benchmarks on an NVIDIA RTX 3060 show a constant Average Imaging Time (54 to 84 ms) regardless of frame count, versus a linear increase (58 to 1,257 ms) for conventional methods. To facilitate further research, we contribute a publicly available streak-tube camera image dataset contains 2,695,168 real-world underwater 3D point cloud data. More importantly, we validate our UCLR system in a South China Sea trial, reaching an error of 46mm for 3D target at 1,000 m depth and 20 m range. Source code and data are available at <uri>https://github.com/BestAnHongjun/StreakNet</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4357-4370"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SRENet: Saliency-Based Lighting Enhancement Network","authors":"Yuming Fang;Chen Peng;Chenlei Lv;Weisi Lin","doi":"10.1109/TIP.2025.3587588","DOIUrl":"10.1109/TIP.2025.3587588","url":null,"abstract":"Lighting enhancement is a classical topic in low-level image processing. Existing studies mainly focus on global illumination optimization while overlooking local semantic objects, and this limits the performance of exposure compensation. In this paper, we introduce SRENet, a novel lighting enhancement network guided by saliency information. It adopts a two-step strategy of foreground-background separation optimization to achieve a balance between global and local illumination. In the first step, we extract salient regions and implement the local illumination enhancement that ensures the exposure quality of salient objects. Next, we utilize a fusion module to process global lighting optimization based on local enhanced results. With the two-step strategy, the proposed SRENet yield better lighting enhancement for local illumination while preserving the globally optimal results. Experimental results demonstrate that our method obtains more effective enhancement results for various tasks of exposure correction and lighting quality improvement. The source code and pre-trained models are available at <uri>https://github.com/PlanktonQAQ/SRENet</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4541-4552"},"PeriodicalIF":0.0,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144639872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}