{"title":"A Joint Visual Compression and Perception Framework for Neuromorphic Spiking Camera.","authors":"Kexiang Feng, Chuanmin Jia, Siwei Ma, Wen Gao","doi":"10.1109/TIP.2025.3581372","DOIUrl":"https://doi.org/10.1109/TIP.2025.3581372","url":null,"abstract":"<p><p>The advent of neuromorphic spike cameras has garnered significant attention for their ability to capture continuous motion with unparalleled temporal resolution. However, this imaging attribute necessitates considerable resources for binary spike data storage and transmission. In light of compression and spike-driven intelligent applications, we present the notion of Spike Coding for Intelligence (SCI), wherein spike sequences are compressed and optimized for both bit-rate and task performance. Drawing inspiration from the mammalian vision system, we propose a dual-pathway architecture for separate processing of spatial semantics and motion information, which is then merged to produce features for compression. A refinement scheme is also introduced to ensure consistency between decoded features and motion vectors. We further propose a temporal regression approach that integrates various motion dynamics, capitalizing on the advancements in warping and deformation simultaneously. Comprehensive experiments demonstrate our scheme achieves state-of-the-art (SOTA) performance for spike compression and analysis. 
We achieve an average 17.25% BD-rate reduction compared to SOTA codecs and a 4.3% accuracy improvement over SpiReco for spike-based classification, with 88.26% complexity reduction and 42.41% inference time saving on the encoding side.</p>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144602617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Dispersal and Collaborative Clustering for Few-Shot Unsupervised Domain Adaptation","authors":"Yuwu Lu;Haoyu Huang;Wai Keung Wong;Xue Hu;Zhihui Lai;Xuelong Li","doi":"10.1109/TIP.2025.3581007","DOIUrl":"10.1109/TIP.2025.3581007","url":null,"abstract":"Unsupervised domain adaptation mainly focuses on transferring knowledge from a fully-labeled source domain to an unlabeled target domain. However, in some scenarios labeled data are expensive to collect, which causes an insufficient-label issue in the source domain. To tackle this issue, some works have focused on few-shot unsupervised domain adaptation (FUDA), which transfers predictive models to an unlabeled target domain through a source domain that contains only a few labeled samples. Yet the relationship between the labeled and unlabeled source data is not well exploited when generating pseudo-labels. Additionally, the few-shot setting further hinders the transfer tasks, as an excessive domain gap is introduced between the source and target domains. To address these issues, we propose an adaptive dispersal and collaborative clustering (ADCC) method for FUDA. Specifically, to remedy the shortage of labeled source data, a collaborative clustering algorithm is constructed that expands the labeled source data to obtain more distribution information. Furthermore, to alleviate the negative impact of domain-irrelevant information, we construct an adaptive dispersal strategy that introduces an intermediate domain and pushes both the source and target domains toward this intermediate domain. 
Extensive experiments on the Office31, Office-Home, miniDomainNet, and VisDA-2017 datasets showcase the superior performance of ADCC compared to the state-of-the-art FUDA methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4273-4285"},"PeriodicalIF":0.0,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144578244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HyperTaFOR: Task-Adaptive Few-Shot Open-Set Recognition With Spatial-Spectral Selective Transformer for Hyperspectral Imagery","authors":"Bobo Xi;Wenjie Zhang;Jiaojiao Li;Rui Song;Yunsong Li","doi":"10.1109/TIP.2025.3555069","DOIUrl":"https://doi.org/10.1109/TIP.2025.3555069","url":null,"abstract":"Open-set recognition (OSR) aims to accurately classify known categories while effectively rejecting unknown negative samples. Existing methods for OSR in hyperspectral images (HSI) can be generally divided into two categories: reconstruction-based and distance-based methods. Reconstruction-based approaches focus on analyzing reconstruction errors during inference, whereas distance-based methods determine the rejection of unknown samples by measuring their distance to each prototype. However, these techniques often require a substantial amount of training data, which can be both time-consuming and expensive to gather, and they require manual threshold setting, which can be difficult for different tasks. Furthermore, effectively utilizing spectral-spatial information in HSI remains a significant challenge, particularly in open-set scenarios. To tackle these challenges, we introduce a few-shot OSR framework for HSI named HyperTaFOR, which incorporates a novel spatial-spectral selective transformer (S3Former). This framework employs a meta-learning strategy to implement a negative prototype generation module (NPGM) that generates task-adaptive rejection scores, allowing flexible categorization of samples into various known classes and anomalies for each task. Additionally, the S3Former is designed to extract spectral-spatial features, optimizing the use of central pixel information while reducing the impact of irrelevant spatial data. 
Comprehensive experiments conducted on three benchmark hyperspectral datasets show that our proposed method delivers competitive classification and detection performance in open-set environments when compared to state-of-the-art methods. The code is available online at <uri>https://github.com/B-Xi/TIP_2025_HyperTaFOR</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4148-4160"},"PeriodicalIF":0.0,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144557812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LCTC: Lightweight Convolutional Thresholding Sparse Coding Network Prior for Compressive Hyperspectral Imaging","authors":"Yurong Chen;Yaonan Wang;Xiaodong Wang;Xin Yuan;Hui Zhang","doi":"10.1109/TIP.2025.3583951","DOIUrl":"10.1109/TIP.2025.3583951","url":null,"abstract":"Compressive spectral imaging has garnered significant attention for its ability to effectively enhance the captured spatial and spectral information. Predominant methods, based on compressive sensing, typically formulate the imaging task as a constrained optimization problem and rely on hand-crafted priors to model the sparsity of spectral images. However, these approaches often suffer from suboptimal performance due to the inherent difficulty of identifying an appropriate transform space where spectral images exhibit sparsity. To overcome this limitation, we propose a novel convolutional sparse coding-inspired untrained network prior for fast and adaptive identification of the sparse transform domain and compressible signal. Specifically, a Lightweight Convolutional Thresholding sparse Coding (LCTC) network is designed as the sparse transform domain, with its inputs interpreted as sparse coefficients. Crucially, both the transform domain and its coefficients are solved in a self-supervised learning manner. Furthermore, we demonstrate that LCTC prior can be seamlessly incorporated into the iterative optimization algorithm as a Plug-and-Play (PnP) regularization. Both the LCTC and PnP-LCTC exhibit superior performance compared to previous methods. 
Experiments under various scenarios validate the effectiveness and efficiency of our approach.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4286-4301"},"PeriodicalIF":0.0,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144565571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft Neighbors Supported Contrastive Clustering","authors":"Yu Duan;Huimin Chen;Runxin Zhang;Rong Wang;Feiping Nie;Xuelong Li","doi":"10.1109/TIP.2025.3583194","DOIUrl":"10.1109/TIP.2025.3583194","url":null,"abstract":"Existing deep clustering methods leverage contrastive or non-contrastive learning to facilitate downstream tasks. Most contrastive-based methods learn representations by comparing positive pairs (two views of the same sample) against negative pairs (views of different samples). However, we observe that this hard treatment of samples ignores inter-sample relationships, leading to class collisions and degraded clustering performance. In this paper, we propose a soft-neighbor-supported contrastive clustering method to address this issue. Specifically, we first introduce a concept called the perception radius to quantify the similarity confidence between a sample and its neighbors. Based on this insight, we design a two-level soft neighbor loss that captures both local and global neighborhood relationships. Additionally, a cluster-level loss enforces compact and well-separated cluster distributions. Finally, we apply a pseudo-label refinement strategy to mitigate false negative samples. Extensive experiments on benchmark datasets demonstrate the superiority of our method. 
The code is available at <uri>https://github.com/DuannYu/soft-neighbors-supported-clustering</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4315-4327"},"PeriodicalIF":0.0,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NSB-H2GAN: “Negative Sample”-Boosted Hierarchical Heterogeneous Graph Attention Network for Interpretable Classification of Whole-Slide Images","authors":"Meiyan Liang;Shupeng Zhang;Xikai Wang;Bo Li;Muhammad Hamza Javed;Xiaojun Jia;Lin Wang","doi":"10.1109/TIP.2025.3583127","DOIUrl":"10.1109/TIP.2025.3583127","url":null,"abstract":"Gigapixel whole-slide image (WSI) prediction and region-of-interest localization present considerable challenges due to the diverse range of features both across different slides and within individual slides. Most current methods rely on weakly supervised learning using homogeneous graphs to establish context-aware relevance within slides, often neglecting the rich diversity of heterogeneous information inherent in pathology images. Inspired by the negative sampling strategy of the Determinantal Point Process (DPP) and the hierarchical structure of pathology slides, we introduce the Negative Sample Boosted Hierarchical Heterogeneous Graph Attention Network (NSB-H2GAN). This model addresses the over-smoothing issue typically encountered in classical Graph Convolutional Networks (GCNs) when applied to pathology slides. By incorporating “negative samples” at multiple scales and utilizing hierarchical, heterogeneous feature discrimination, NSB-H2GAN more effectively captures the unique features of each patch, leading to an improved representation of gigapixel WSIs. We evaluated the performance of NSB-H2GAN on three publicly available datasets: CAMELYON16, TCGA-NSCLC and TCGA-COAD. The results show that NSB-H2GAN significantly outperforms existing state-of-the-art methods in both qualitative and quantitative evaluations. Moreover, NSB-H2GAN generates more detailed and interpretable heatmaps, allowing for precise localization of tiny lesions as small as <inline-formula> <tex-math>$200\\mu m\\times 200\\mu m$ </tex-math></inline-formula> that are often missed by the human eye. 
The robust performance of NSB-H2GAN offers a new paradigm for computer-aided pathology diagnosis and holds great potential for advancing the clinical applications of computational pathology.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4215-4229"},"PeriodicalIF":0.0,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144547026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Curriculum Dataset Distillation","authors":"Zhiheng Ma;Anjia Cao;Funing Yang;Yihong Gong;Xing Wei","doi":"10.1109/TIP.2025.3579228","DOIUrl":"10.1109/TIP.2025.3579228","url":null,"abstract":"Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. Recent research has begun to explore scalable disentanglement methods. However, there are still performance bottlenecks and room for optimization in this direction. In this paper, we present a curriculum-based dataset distillation framework aiming to harmonize performance and scalability. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incorporating curriculum evaluation, we address the issue of previous methods generating images that tend to be homogeneous and simplistic, doing so at a manageable computational cost. Furthermore, we introduce adversarial optimization towards synthetic images to further improve their representativeness and safeguard against their overfitting to the neural network involved in distilling. This enhances the generalization capability of the distilled images across various neural network architectures and also increases their robustness to noise. Extensive experiments demonstrate that our framework sets new benchmarks in large-scale dataset distillation, achieving substantial improvements of 11.1% on Tiny-ImageNet, 9.0% on ImageNet-1K, and 7.3% on ImageNet-21K. 
Our distilled datasets and code are available at <uri>https://github.com/MIV-XJTU/CUDD</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4176-4187"},"PeriodicalIF":0.0,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144547049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-View Clustering With Incremental Instances and Views","authors":"Chao Zhang;Zhi Wang;Xiuyi Jia;Zechao Li;Chunlin Chen;Huaxiong Li","doi":"10.1109/TIP.2025.3583122","DOIUrl":"10.1109/TIP.2025.3583122","url":null,"abstract":"Multi-view clustering (MVC) has attracted increasing attention with the emergence of various data collected from multiple sources. In real-world dynamic environments, instances are continually gathered, and the number of views expands as new data sources become available. Learning under such simultaneous increments of instances and views, particularly in unsupervised scenarios, is crucial yet underexplored. In this paper, we address this problem by proposing a novel MVC method with Incremental Instances and Views, MVC-IIV for short. MVC-IIV contains two stages: an initial stage and an incremental stage. In the initial stage, a basic latent multi-view subspace clustering model is constructed to handle existing data, which can be viewed as traditional static MVC. In the incremental stage, the previously trained model is reused to guide learning for newly arriving instances with new views, transferring historical knowledge while avoiding redundant computations. Specifically, we design and reuse two modules, i.e., a multi-view embedding module for low-dimensional representation learning and a consensus centroids module for cluster probability learning. By adding consistency regularization to the two modules, the knowledge acquired from previous data is reused, which not only enhances exploration within the current data batch but also captures between-batch data correlations. The proposed model can be efficiently solved with linear space and time complexity. 
Extensive experiments demonstrate the effectiveness and efficiency of our method compared with the state-of-the-art approaches.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4203-4214"},"PeriodicalIF":0.0,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144547027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"S4Fusion: Saliency-Aware Selective State Space Model for Infrared and Visible Image Fusion","authors":"Haolong Ma;Hui Li;Chunyang Cheng;Gaoang Wang;Xiaoning Song;Xiao-Jun Wu","doi":"10.1109/TIP.2025.3583132","DOIUrl":"10.1109/TIP.2025.3583132","url":null,"abstract":"The preservation and the enhancement of complementary features between modalities are crucial for multi-modal image fusion and downstream vision tasks. However, existing methods are limited to local receptive fields (CNNs) or lack comprehensive utilization of spatial information from both modalities during interaction (transformers), which results in the inability to effectively retain useful information from both modalities in a comparative manner. Consequently, the fused images may exhibit a bias towards one modality, failing to adaptively preserve salient targets from all sources. Thus, a novel fusion framework (S4Fusion) based on the Saliency-aware Selective State Space is proposed. S4Fusion introduces the Cross-Modal Spatial Awareness Module (CMSA), which is designed to simultaneously capture global spatial information from all input modalities and promote effective cross-modal interaction. This enables a more comprehensive representation of complementary features. Furthermore, to guide the model in adaptively preserving salient objects, we propose a novel perception-enhanced loss function. This loss aims to enhance the retention of salient features by minimizing ambiguity or uncertainty, as measured at a pre-trained model’s decision layer, within the fused images. 
The code is available at <uri>https://github.com/zipper112/S4Fusion</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4161-4175"},"PeriodicalIF":0.0,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144547028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WS-SAM: Generalizing SAM to Weakly Supervised Object Detection With Category Label","authors":"Hao Wang;Tong Jia;Qilong Wang;Wangmeng Zuo","doi":"10.1109/TIP.2025.3581729","DOIUrl":"10.1109/TIP.2025.3581729","url":null,"abstract":"Building an effective object detector usually depends on a large set of well-annotated training samples. Annotating such a dataset is extremely laborious and costly, as box-level supervision, which contains both an accurate classification category and localization coordinates, is required. Compared with box-level supervised annotation, weakly supervised learning manners (e.g., category, point, and scribble) need far less annotation effort and provide a feasible way to mitigate the reliance on the dataset. Because of the lack of sufficient supervised information, however, current weakly supervised methods cannot achieve satisfactory detection performance. Recently, the Segment Anything Model (SAM) has emerged as a task-agnostic foundation model and has shown promising performance improvements in many related works due to its powerful generalization and data processing abilities. These properties of SAM inspire us to adopt this foundation model for weakly supervised object detection to compensate for the deficiency of supervised information. However, directly deploying SAM on the weakly supervised object detection task meets two issues. First, SAM needs meticulously designed prompts, and such expert-level prompts restrict its applicability and practicality. Second, SAM is a category-unaware model and cannot assign category labels to the generated predictions. To solve these issues, we propose WS-SAM, which generalizes SAM to weakly supervised object detection with category labels. Specifically, we design an adaptive prompt generator to take full advantage of the spatial and semantic information from the prompt. 
It operates in a self-prompting manner by taking the output of SAM from the previous iteration as the prompt input to guide the next iteration, where prompts can be adaptively generated based on the classification activation map. We also develop a segmentation mask refinement module and formulate the label assignment process as a shortest-path optimization problem by considering the similarity between each location and the prompts. Furthermore, a bidirectional adapter is implemented to resolve the domain discrepancy by incorporating domain-specific information. We evaluate the effectiveness of our method on several detection datasets (e.g., PASCAL VOC and MS COCO), and the experimental results show that our proposed method achieves clear improvements over state-of-the-art methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4052-4066"},"PeriodicalIF":0.0,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144500691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}