{"title":"DD-RobustBench: An Adversarial Robustness Benchmark for Dataset Distillation","authors":"Yifan Wu;Jiawei Du;Ping Liu;Yuewei Lin;Wei Xu;Wenqing Cheng","doi":"10.1109/TIP.2025.3553786","DOIUrl":"10.1109/TIP.2025.3553786","url":null,"abstract":"Dataset distillation techniques have revolutionized the way of utilizing large datasets by compressing them into smaller, yet highly effective subsets that preserve the original datasets’ accuracy. However, while these methods have proven effective in reducing data size and training times, the robustness of these distilled datasets against adversarial attacks remains underexplored. This vulnerability poses significant risks, particularly in security-sensitive applications. To address this critical gap, we introduce DD-RobustBench, a novel and comprehensive benchmark specifically designed to evaluate the adversarial robustness of distilled datasets. Our benchmark is the most extensive of its kind and integrates a variety of dataset distillation techniques, including recent advancements such as TESLA, DREAM, SRe2L, and D4M, which have shown promise in enhancing model performance. DD-RobustBench also rigorously tests these datasets against a diverse array of adversarial attack methods to ensure broad applicability. Our evaluations cover a wide spectrum of datasets, including but not limited to, the widely used ImageNet-1K. This allows us to assess the robustness of distilled datasets in scenarios mirroring real-world applications. Furthermore, our detailed quantitative analysis investigates how different components involved in the distillation process, such as data augmentation, downsampling, and clustering, affect dataset robustness. Our findings provide critical insights into which techniques enhance or weaken the resilience of distilled datasets against adversarial threats, offering valuable guidelines for developing more robust distillation methods in the future. Through DD-RobustBench, we aim not only to benchmark but also to push the boundaries of dataset distillation research by highlighting areas for improvement and suggesting pathways for future innovations in creating datasets that are not only compact and efficient but also secure and resilient to adversarial challenges. The implementation details and essential instructions are available on DD-RobustBench.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2052-2066"},"PeriodicalIF":0.0,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10944256","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143723296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"See Degraded Objects: A Physics-Guided Approach for Object Detection in Adverse Environments","authors":"Weifeng Liu;Jian Pang;Bingfeng Zhang;Jin Wang;Baodi Liu;Dapeng Tao","doi":"10.1109/TIP.2025.3551533","DOIUrl":"10.1109/TIP.2025.3551533","url":null,"abstract":"In adverse environments, the detector often fails to detect degraded objects because they are almost invisible and their features are weakened by the environment. Common approaches involve image enhancement to support detection, but they inevitably introduce human-invisible noise that negatively impacts the detector. In this work, we propose a physics-guided approach for object detection in adverse environments, which gives a straightforward solution that injects the physical priors into the detector, enabling it to detect poorly visible objects. The physical priors, derived from the imaging mechanism and image property, include environment prior and frequency prior. The environment prior is generated from the physical model, e.g., the atmospheric model, which reflects the density of environmental noise. The frequency prior is explored based on an observation that the amplitude spectrum could highlight object regions from the background. The proposed two priors are complementary in principle. Furthermore, we present a physics-guided loss that incorporates a novel weight item, which is estimated by applying the membership function on physical priors and could capture the extent of degradation. By backpropagating the physics-guided loss, physics knowledge is injected into the detector to aid in locating degraded objects. We conduct experiments in synthetic foggy environment, real foggy environment, and real underwater scenario. The results demonstrate that our method is effective and achieves state-of-the-art performance. The code is available at <uri>https://github.com/PangJian123/See-Degraded-Objects</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2198-2212"},"PeriodicalIF":0.0,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143723299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STPNet: Scale-Aware Text Prompt Network for Medical Image Segmentation","authors":"Dandan Shan;Zihan Li;Yunxiang Li;Qingde Li;Jie Tian;Qingqi Hong","doi":"10.1109/TIP.2025.3571672","DOIUrl":"10.1109/TIP.2025.3571672","url":null,"abstract":"Accurate segmentation of lesions plays a critical role in medical image analysis and diagnosis. Traditional segmentation approaches that rely solely on visual features often struggle with the inherent uncertainty in lesion distribution and size. To address these issues, we propose STPNet, a Scale-aware Text Prompt Network that leverages vision-language modeling to enhance medical image segmentation. Our approach utilizes multi-scale textual descriptions to guide lesion localization and employs retrieval-segmentation joint learning to bridge the semantic gap between visual and linguistic modalities. Crucially, STPNet retrieves relevant textual information from a specialized medical text repository during training, eliminating the need for text input during inference while retaining the benefits of cross-modal learning. We evaluate STPNet on three datasets: COVID-Xray, COVID-CT, and Kvasir-SEG. Experimental results show that our vision-language approach outperforms state-of-the-art segmentation methods, demonstrating the effectiveness of incorporating textual semantic knowledge into medical image analysis. The code has been made publicly on <uri>https://github.com/HUANGLIZI/STPNet</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3169-3180"},"PeriodicalIF":0.0,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144145507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Irregular Tensor Low-Rank Representation for Hyperspectral Image Representation","authors":"Bo Han;Yuheng Jia;Hui Liu;Junhui Hou","doi":"10.1109/TIP.2025.3571669","DOIUrl":"10.1109/TIP.2025.3571669","url":null,"abstract":"Spectral variations pose a common challenge in analyzing hyperspectral images (HSI). To address this, low-rank tensor representation has emerged as a robust strategy, leveraging inherent correlations within HSI data. However, the spatial distribution of ground objects in HSIs is inherently irregular, existing naturally in tensor format, with numerous class-specific regions manifesting as irregular tensors. Current low-rank representation techniques are designed for regular tensor structures and overlook this fundamental irregularity in real-world HSIs, leading to performance limitations. To tackle this issue, we propose a novel model for irregular tensor low-rank representation tailored to efficiently model irregular 3D cubes. By incorporating a non-convex nuclear norm to promote low-rankness and integrating a global negative low-rank term to enhance the discriminative ability, our proposed model is formulated as a constrained optimization problem and solved using an alternating augmented Lagrangian method. Experimental validation conducted on four public datasets demonstrates the superior performance of our method compared to existing state-of-the-art approaches. The code is publicly available at <uri>https://github.com/hb-studying/ITLRR</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3239-3252"},"PeriodicalIF":0.0,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144145506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Gaussian Model for Learned Image Compression","authors":"Haotian Zhang;Li Li;Dong Liu","doi":"10.1109/TIP.2025.3550013","DOIUrl":"10.1109/TIP.2025.3550013","url":null,"abstract":"In learned image compression, probabilistic models play an essential role in characterizing the distribution of latent variables. The Gaussian model with mean and scale parameters has been widely used for its simplicity and effectiveness. Probabilistic models with more parameters, such as the Gaussian mixture models, can fit the distribution of latent variables more precisely, but the corresponding complexity is higher. To balance the compression performance and complexity, we extend the Gaussian model to the generalized Gaussian family for more flexible latent distribution modeling, introducing only one additional shape parameter <inline-formula> <tex-math>$beta $ </tex-math></inline-formula> than the Gaussian model. To enhance the performance of the generalized Gaussian model by alleviating the train-test mismatch, we propose improved training methods, including <inline-formula> <tex-math>$beta $ </tex-math></inline-formula>-dependent lower bounds for scale parameters and gradient rectification. Our proposed generalized Gaussian model, coupled with the improved training methods, is demonstrated to outperform the Gaussian and Gaussian mixture models on a variety of learned image compression networks.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1950-1965"},"PeriodicalIF":0.0,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143712655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diffusion Model is Secretly a Training-Free Open Vocabulary Semantic Segmenter","authors":"Jinglong Wang;Xiawei Li;Jing Zhang;Qingyuan Xu;Qin Zhou;Qian Yu;Lu Sheng;Dong Xu","doi":"10.1109/TIP.2025.3551648","DOIUrl":"10.1109/TIP.2025.3551648","url":null,"abstract":"The pre-trained text-image discriminative models, such as CLIP, has been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes. Recently, there has been a growing interest in expanding the application of generative models from generation tasks to semantic segmentation. These approaches utilize generative models either for generating annotated data or extracting features to facilitate semantic segmentation. This typically involves generating a considerable amount of synthetic data or requiring additional mask annotations. To this end, we uncover the potential of generative text-to-image diffusion models (e.g., Stable Diffusion) as highly efficient open-vocabulary semantic segmenters, and introduce a novel training-free approach named DiffSegmenter. The insight is that to generate realistic objects that are semantically faithful to the input text, both the complete object shapes and the corresponding semantics are implicitly learned by diffusion models. We discover that the object shapes are characterized by the self-attention maps while the semantics are indicated through the cross-attention maps produced by the denoising U-Net, forming the basis of our segmentation results. Additionally, we carefully design effective textual prompts and a category filtering mechanism to further enhance the segmentation results. Extensive experiments on three benchmark datasets show that the proposed DiffSegmenter achieves impressive results for open-vocabulary semantic segmentation.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1895-1907"},"PeriodicalIF":0.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143702569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-Domain Division Multiplexer for General Continual Learning: A Pseudo Causal Intervention Strategy","authors":"Jialu Wu;Shaofan Wang;Yanfeng Sun;Baocai Yin;Qingming Huang","doi":"10.1109/TIP.2025.3551918","DOIUrl":"10.1109/TIP.2025.3551918","url":null,"abstract":"As a continual learning paradigm where non-stationary data arrive in the form of streams and training occurs whenever a small batch of samples is accumulated, general continual learning (GCL) suffers from both inter-task bias and intra-task bias. Existing GCL methods can hardly simultaneously handle two issues since it requires models to avoid from lying into the spurious correlation trap of GCL. From a causal perspective, we formalize a structural causality model of GCL and conclude that spurious correlation exists not only between confounders and input, but also within multiple causal variables. Inspired by frequency transformation techniques which harbor intricate patterns of image comprehension, we propose a plug-and-play module: the Dual-Domain Division Multiplex (D3M) unit, which intervenes confounders and multiple causal factors over frequency and spatial domains with a two-stage pseudo causal intervention strategy. Typically, D3M consists of a frequency division multiplexer (FDM) module and a spatial division multiplexer (SDM) module, each of which prioritizes target-relevant causal features by dividing and multiplexing features over frequency domain and spatial domain, respectively. As a lightweight and model-agonistic unit, D3M can be seamlessly integrated into most current GCL methods. Extensive experiments on four popular datasets demonstrate that D3M significantly enhances accuracy and diminishes catastrophic forgetting compared to current methods. The code is available at <uri>https://github.com/wangsfan/D3M</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1966-1979"},"PeriodicalIF":0.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143701414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IR&ArF: Toward Deep Interpretable Arbitrary Resolution Fusion of Unregistered Hyperspectral and Multispectral Images","authors":"Jiahui Qu;Xiaoyang Wu;Wenqian Dong;Jizhou Cui;Yunsong Li","doi":"10.1109/TIP.2025.3551531","DOIUrl":"10.1109/TIP.2025.3551531","url":null,"abstract":"The fusion of hyperspectral image (HSI) and multispectral image (MSI) is an effective mean to improve the inherent defect of low spatial resolution of HSI. However, existing fusion methods usually rigidly upgrade the spatial resolution of HSI to that of matching MSI under the ideal assumption that multi-source images are accurately registered. In real scenes where multi-source images are difficult to be perfectly registered and the spatial resolution requirements are dynamically different, these fusion algorithms is difficult to be effectively deployed. To this end, we construct the spatial-spectral consistent arbitrary scale observation model (S2cAsOM) to model the dependence between the unregistered HSI and MSI and the ideal arbitrary resolution HSI. Furthermore, an optimization algorithm is designed to solve S2cAsOM, and a deep interpretable arbitrary resolution fusion network (IR&ArF) is proposed to simulate the optimization process, which achieves the model-data dual-driven arbitrary resolution fusion of unregistered HSI and MSI. IR&ArF breaks the dependence of traditional fusion methods on the accuracy of image registration in a robust way, and can flexibly cope with the dynamic requirements of diverse applications for the spatial resolution of HSI, which improves the application ability of HSI fusion in real scenes. Extensive systematic experiments demonstrate the superiority and generalization of the proposed method. Source code of the proposed method is available on <uri>https://github.com/Jiahuiqu/IR-ArF</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1934-1949"},"PeriodicalIF":0.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143701527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hyperbolic Insights With Knowledge Distillation for Cross-Domain Few-Shot Learning","authors":"Xi Yang;Dechen Kong;Nannan Wang;Xinbo Gao","doi":"10.1109/TIP.2025.3551647","DOIUrl":"10.1109/TIP.2025.3551647","url":null,"abstract":"Cross-domain few-shot learning aims to achieve swift generalization between a source domain and a target domain using a limited number of images. Current research predominantly relies on generalized feature embeddings, employing metric classifiers in Euclidean space for classification. However, due to existing disparities among different data domains, attaining generalized features in the embedding becomes challenging. Additionally, the rise in data domains leads to high-dimensional Euclidean spaces. To address the above problems, we introduce a cross-domain few-shot learning method named Hyperbolic Insights with Knowledge Distillation (HIKD). By integrating knowledge distillation, it enhances the model’s generalization performance, thereby significantly improving task performance. Hyperbolic space, in comparison to Euclidean space, offers a larger capacity and supports the learning of hierarchical structures among images, which can aid generalized learning across different data domains. So we map the Euclidean space features to the hyperbolic space via hyperbolic embedding and utilize hyperbolic fitting distillation method in the meta-training phase to obtain multi-domain unified generalization representation. In the meta-testing phase, accounting for biases between the source and target domains, we present a hyperbolic adaptive module to adjust embedded features and eliminate inter-domain gap. Experiments on the Meta-Dataset demonstrate that HIKD outperforms state-of-the-arts methods with the average accuracy of 80.6%.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1921-1933"},"PeriodicalIF":0.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143701505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pruning Sparse Tensor Neural Networks Enables Deep Learning for 3D Ultrasound Localization Microscopy","authors":"Brice Rauby;Paul Xing;Jonathan Porée;Maxime Gasse;Jean Provost","doi":"10.1109/TIP.2025.3552198","DOIUrl":"10.1109/TIP.2025.3552198","url":null,"abstract":"Ultrasound Localization Microscopy (ULM) is a non-invasive technique that allows for the imaging of micro-vessels in vivo, at depth and with a resolution on the order of ten microns. ULM is based on the sub-resolution localization of individual microbubbles injected in the bloodstream. Mapping the whole angioarchitecture requires the accumulation of microbubbles trajectories from thousands of frames, typically acquired over a few minutes. ULM acquisition times can be reduced by increasing the microbubble concentration, but requires more advanced algorithms to detect them individually. Several deep learning approaches have been proposed for this task, but they remain limited to 2D imaging, in part due to the associated large memory requirements. Herein, we propose the use of sparse tensor neural networks to enable deep learning-based 3D ULM by improving memory scalability with increased dimensionality. We study several approaches to efficiently convert ultrasound data into a sparse format and study the impact of the associated loss of information. When applied in 2D, the sparse formulation reduces the memory requirements by a factor 2 at the cost of a small reduction of performance when compared against dense networks. In 3D, the proposed approach reduces memory requirements by two order of magnitude while largely outperforming conventional ULM in high concentration settings. We show that Sparse Tensor Neural Networks in 3D ULM allow for the same benefits as dense deep learning based method in 2D ULM i.e. the use of higher concentration in silico and reduced acquisition time.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2367-2378"},"PeriodicalIF":0.0,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143702208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}