{"title":"Deep Face Leakage: Inverting High-Quality Faces From Gradients Using Residual Optimization","authors":"Xu Zhang;Tao Xiang;Shangwei Guo;Fei Yang;Tianwei Zhang","doi":"10.1109/TIP.2025.3533210","DOIUrl":"10.1109/TIP.2025.3533210","url":null,"abstract":"Collaborative learning has gained significant traction for training deep learning models without sharing the original data of participants, particularly when dealing with sensitive data such as facial images. However, gradient inversion attacks can progressively reconstruct private data from gradients and have proven successful in extracting private training data. Nonetheless, our observations reveal that these methods exhibit suboptimal performance in face reconstruction and lose numerous facial details. In this paper, we propose DFLeak, an effective approach to boost face leakage from gradients using residual optimization and thwart the privacy of facial applications in collaborative learning. In particular, we first introduce a superior initialization method to stabilize the inversion process. Second, we propose to integrate prior-free face restoration (PFFR) results into the gradient inversion optimization process in a residual manner, which enriches facial details. We further design a pixel update schedule to mitigate the adverse effects of image regularization terms and preserve fine facial details. 
Comprehensive experimentation demonstrates the effectiveness of our approach in achieving more realistic and higher-quality facial image reconstructions, surpassing the performance of state-of-the-art gradient inversion attacks.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1560-1572"},"PeriodicalIF":0.0,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143443345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grid-Guided Sparse Laplacian Consensus for Robust Feature Matching","authors":"Yifan Xia;Jiayi Ma","doi":"10.1109/TIP.2025.3539469","DOIUrl":"10.1109/TIP.2025.3539469","url":null,"abstract":"Feature matching is a fundamental problem widely encountered in computer vision applications. This paper introduces a novel and efficacious method named Grid-guided Sparse Laplacian Consensus, rooted in the concept of smooth constraints. To address challenging scenes such as severe deformation and independent motions, we devise grid-based adaptive matching guidance to construct multiple transformations based on motion coherence. Specifically, we obtain a set of precise yet sparse seed correspondences through motion statistics, facilitating the generation of an adaptive number of candidate correspondence sets. In addition, we propose an innovative formulation grounded in the graph Laplacian for correspondence pruning, wherein mapping function estimation is formulated as a Bayesian model. We solve this using the EM algorithm, with seed correspondences as initialization for optimal convergence. Sparse approximation is leveraged to reduce the time-space burden. A comprehensive set of experiments is conducted to demonstrate the superiority of our method over other state-of-the-art methods in robustness to severe deformations, generalizability across various descriptors, and generalizability to multiple motions. 
Additionally, experiments in geometric estimation, image registration, loop closure detection, and visual localization highlight the significance of our method across diverse scenes for high-level tasks.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1367-1381"},"PeriodicalIF":0.0,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143443342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensor Nuclear Norm-Based Multi-Channel Atomic Representation for Robust Face Recognition","authors":"Yutao Hu;Yulong Wang;Libin Wang;Han Li;Hong Chen;Yuan Yan Tang","doi":"10.1109/TIP.2025.3539472","DOIUrl":"10.1109/TIP.2025.3539472","url":null,"abstract":"Numerous representation-based classification (RC) methods have been developed for face recognition due to their decent model interpretability and robustness against noise. Most existing RC methods primarily characterize the gray-scale reconstruction error image (single-channel data) in two ways: the one-dimensional (1D) pixel-based error model and the two-dimensional (2D) gray-scale image-matrix-based error model. The former measures the reconstruction error pixel by pixel, while the latter leverages 2D structural information of the gray-scale error image, such as the low-rank property. However, when applying these methods to different color channels of a test color face image (multi-channel data) separately and independently, they neglect the three-dimensional (3D) structural correlations among distinct color channels. In real-world scenarios, face images are often contaminated with complex noise, including contiguous occlusion and random pixel corruption, which pose significant challenges to these approaches and can lead to a decline in performance. In this paper, we propose a Tensor Nuclear Norm based Robust Multi-channel Atomic Representation (TNN-RMAR) framework with application to color face recognition. The proposed method has the following three critical ingredients: 1) We propose a 3D color image-tensor-based error model, which can take full advantage of the 3D structural information of the color error image. 2) To leverage the 3D structural information of the color error image, we model it as a third-order tensor <inline-formula> <tex-math>${\\mathcal {E}}$ </tex-math></inline-formula> and exploit its low-rank property with the tensor nuclear norm. 
Given that multiple color channels in a color image are generally corrupted at the same positions, we design a tailored tube-wise loss function to further leverage this structure. 3) We devise the multi-channel atomic norm (MAN) regularization for the representation coefficient matrix, which allows us to jointly harness the correlation information of coefficients in different color channels. In addition, we devise an efficient algorithm, based on the alternating direction method of multipliers (ADMM), to solve the TNN-RMAR framework. By leveraging TNN-RMAR as a general platform, we also develop several novel robust multi-channel RC methods. Experimental results on benchmark real-world databases validate the effectiveness and robustness of the proposed framework for robust color face recognition.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1311-1325"},"PeriodicalIF":0.0,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143443344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent Space Learning-Based Ensemble Clustering","authors":"Yalan Qin;Nan Pu;Nicu Sebe;Guorui Feng","doi":"10.1109/TIP.2025.3540297","DOIUrl":"10.1109/TIP.2025.3540297","url":null,"abstract":"Ensemble clustering fuses a set of base clusterings and shows promising capability in achieving more robust and better clustering results. Existing methods usually realize ensemble clustering by adopting a co-association matrix to measure how many times two data points are categorized into the same cluster across the base clusterings. Though great progress has been achieved, the obtained co-association matrix is constructed based on the combination of different connective matrices or their variants. These methods neither explore the inherent latent space shared by multiple connective matrices nor learn the corresponding co-association matrices according to this latent space. Moreover, these methods neglect to learn discriminative connective matrices, to explore the high-order relations among these connective matrices, and to consider the latent space within a unified framework. In this paper, we propose a Latent spacE leArning baseD Ensemble Clustering (LEADEC), which introduces the latent space shared by different connective matrices and learns the corresponding connective matrices according to this latent space. Specifically, we factorize the original multiple connective matrices into a consensus latent space representation and the specific connective matrices. Meanwhile, an orthogonal constraint is imposed to make the latent space representation more discriminative. In addition, we collect the obtained connective matrices based on the latent space into a third-order tensor to investigate the high-order relations among these connective matrices. Connective matrix learning, high-order relation investigation among connective matrices, and latent space representation learning are integrated into a unified framework. 
Experiments on seven benchmark datasets confirm the superiority of LEADEC compared with existing representative methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1259-1270"},"PeriodicalIF":0.0,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143417828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Line-of-Sight Depth Attention for Panoptic Parsing of Distant Small-Faint Instances","authors":"Zhongqi Lin;Xudong Jiang;Zengwei Zheng","doi":"10.1109/TIP.2025.3540265","DOIUrl":"10.1109/TIP.2025.3540265","url":null,"abstract":"Current scene parsers have effectively distilled abstract relationships among refined instances, while overlooking the discrepancies arising from variations in scene depth. Hence, their potential to imitate the intrinsic 3D perception ability of humans is constrained. In accordance with the principle of perspective, we advocate first grading the depth of the scene into several slices, and then mining semantic correlations within a slice or between multiple slices. Two attention-based components, namely the Scene Depth Grading Module (SDGM) and the Edge-oriented Correlation Refining Module (EoCRM), comprise our framework, the Line-of-Sight Depth Network (LoSDN). SDGM grades the scene into several slices by calculating depth attention tendencies based on parameters with explicit physical meanings, e.g., albedo, occlusion, specular embeddings. This process allocates numerous multi-scale instances to each scene slice based on their line-of-sight extension distance, establishing a solid groundwork for ordered association mining in EoCRM. Since the primary step in distinguishing distant faint targets is boundary delineation, EoCRM implements edge-wise saliency quantification and association mining. Quantitative and diagnostic experiments on the Cityscapes, ADE20K, and PASCAL Context datasets reveal the competitiveness of LoSDN and the individual contribution of each component. 
Visualizations display that our strategy offers clear benefits in detecting distant, faint targets.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1354-1366"},"PeriodicalIF":0.0,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143417814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ADStereo: Efficient Stereo Matching With Adaptive Downsampling and Disparity Alignment","authors":"Yun Wang;Kunhong Li;Longguang Wang;Junjie Hu;Dapeng Oliver Wu;Yulan Guo","doi":"10.1109/TIP.2025.3540282","DOIUrl":"10.1109/TIP.2025.3540282","url":null,"abstract":"The balance between accuracy and computational efficiency is crucial for the applications of deep learning-based stereo matching algorithms in real-world scenarios. Since matching cost aggregation is usually the most computationally expensive component, a common practice is to construct cost volumes at a low resolution for aggregation and then directly regress a high-resolution disparity map. However, current solutions often suffer from limitations such as the loss of discriminative features caused by downsampling operations that treat all pixels equally, and spatial misalignment resulting from repeated downsampling and upsampling. To overcome these challenges, this paper presents two sampling strategies: the Adaptive Downsampling Module (ADM) and the Disparity Alignment Module (DAM), to prioritize real-time inference while ensuring accuracy. The ADM leverages local features to learn adaptive weights, enabling more effective downsampling while preserving crucial structure information. On the other hand, the DAM employs a learnable interpolation strategy to predict transformation offsets of pixels, thereby mitigating the spatial misalignment issue. Building upon these modules, we introduce ADStereo, a real-time yet accurate network that achieves highly competitive performance on multiple public benchmarks. Specifically, our ADStereo runs over <inline-formula> <tex-math>$5\\times $ </tex-math></inline-formula> faster than the current state-of-the-art CREStereo (0.054s vs. <inline-formula> <tex-math>$0.29s$ </tex-math></inline-formula>) under the same hardware while achieving comparable accuracy (1.82% vs. 1.69%) on the KITTI stereo 2015 benchmark. 
The codes are available at: <uri>https://github.com/cocowy1/ADStereo</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1204-1218"},"PeriodicalIF":0.0,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143417815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contrastive Neuron Pruning for Backdoor Defense","authors":"Yu Feng;Benteng Ma;Dongnan Liu;Yanning Zhang;Weidong Cai;Yong Xia","doi":"10.1109/TIP.2025.3539466","DOIUrl":"10.1109/TIP.2025.3539466","url":null,"abstract":"Recent studies have revealed that deep neural networks (DNNs) are susceptible to backdoor attacks, in which attackers insert a pre-defined backdoor into a DNN model by poisoning a few training samples. A small subset of neurons in DNN is responsible for activating this backdoor and pruning these backdoor-associated neurons has been shown to mitigate the impact of such attacks. Current neuron pruning techniques often face challenges in accurately identifying these critical neurons, and they typically depend on the availability of labeled clean data, which is not always feasible. To address these challenges, we propose a novel defense strategy called Contrastive Neuron Pruning (CNP). This approach is based on the observation that poisoned samples tend to cluster together and are distinguishable from benign samples in the feature space of a backdoored model. Given a backdoored model, we initially apply a reversed trigger to benign samples, generating multiple positive (benign-benign) and negative (benign-poisoned) feature pairs from the backdoored model. We then employ contrastive learning on these pairs to improve the separation between benign and poisoned features. Subsequently, we identify and prune neurons in the Batch Normalization layers that show significant response differences to the generated pairs. By removing these backdoor-associated neurons, CNP effectively defends against backdoor attacks while requiring the pruning of only about 1% of the total neurons. 
Comprehensive experiments conducted on various benchmarks validate the efficacy of CNP, demonstrating its robustness and effectiveness in mitigating backdoor attacks compared to existing methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1234-1245"},"PeriodicalIF":0.0,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143417817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shell-Guided Compression of Voxel Radiance Fields","authors":"Peiqi Yang;Zhangkai Ni;Hanli Wang;Wenhan Yang;Shiqi Wang;Sam Kwong","doi":"10.1109/TIP.2025.3538163","DOIUrl":"10.1109/TIP.2025.3538163","url":null,"abstract":"In this paper, we address the challenge of significant memory consumption and redundant components in large-scale voxel-based models, which are commonly encountered in real-world 3D reconstruction scenarios. We propose a novel method called Shell-guided compression of Voxel Radiance Fields (SVRF), aimed at optimizing a voxel-based model into a shell-like structure to reduce storage costs while maintaining rendering accuracy. Specifically, we first introduce a Shell-like Constraint, operating in two main aspects: 1) enhancing the influence of voxels neighboring the surface in determining the rendering outcomes, and 2) expediting the elimination of redundant voxels both inside and outside the surface. Additionally, we introduce Adaptive Thresholds to ensure appropriate pruning criteria for different scenes. To prevent the erroneous removal of essential object parts, we further employ a Dynamic Pruning Strategy to conduct smooth and precise model pruning during training. The proposed compression method does not necessitate additional labels; it merely requires the guidance of self-supervised learning based on predicted depth. Furthermore, it can be seamlessly integrated into any voxel-grid-based method. Extensive experimental results demonstrate that our method achieves comparable rendering quality while compressing the original number of voxel grids by more than 70%. 
Our code will be available at: <uri>https://github.com/eezkni/SVRF</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1179-1191"},"PeriodicalIF":0.0,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143385356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Subdivision Based Dual-Weighted Robust Principal Component Analysis","authors":"Sisi Wang;Feiping Nie;Zheng Wang;Rong Wang;Xuelong Li","doi":"10.1109/TIP.2025.3536197","DOIUrl":"10.1109/TIP.2025.3536197","url":null,"abstract":"Principal Component Analysis (PCA) is one of the most important unsupervised dimensionality reduction algorithms, but its use of the squared <inline-formula> <tex-math>$\\ell _{2}$ </tex-math></inline-formula>-norm makes it very sensitive to outliers. Improved versions based on the <inline-formula> <tex-math>$\\ell _{1}$ </tex-math></inline-formula>-norm alleviate this problem, but they have other shortcomings, such as optimization difficulties or lack of rotational invariance. Besides, existing methods only vaguely divide normal samples and outliers to improve robustness, ignoring the fact that normal samples can be more specifically divided into positive samples and hard samples, which should contribute differently to the model because positive samples are more conducive to learning the projection matrix. In this paper, we propose a novel Data Subdivision Based Dual-Weighted Robust Principal Component Analysis, namely DRPCA, which first designs a mark vector to distinguish normal samples from outliers and directly removes outliers according to mark weights. Moreover, we further divide normal samples into positive samples and hard samples via self-constrained weights, and place them in relative positions so that the weight of positive samples is larger than that of hard samples, which makes the projection matrix more accurate. Additionally, the optimal mean is employed to obtain a more accurate data center. To solve this problem, we carefully design an effective iterative algorithm and analyze its convergence. 
Experiments on real-world and RGB large-scale datasets demonstrate the superiority of our method in dimensionality reduction and anomaly detection.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1271-1284"},"PeriodicalIF":0.0,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143385355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decouple and Couple: Exploiting Prior Knowledge for Visible Video Watermark Removal","authors":"Junye Chen;Chaowei Fang;Jichang Li;Yicheng Leng;Guanbin Li","doi":"10.1109/TIP.2025.3534033","DOIUrl":"10.1109/TIP.2025.3534033","url":null,"abstract":"This paper aims to restore original background images in watermarked videos, overcoming challenges posed by traditional approaches that fail to handle the temporal dynamics and diverse watermark characteristics effectively. Our method introduces a unique framework that first “decouples” the extraction of prior knowledge—such as common-sense knowledge and residual background details—from the temporal modeling process, allowing for independent handling of background restoration and temporal consistency. Subsequently, it “couples” these extracted features by integrating them into the temporal modeling backbone of a video inpainting (VI) framework. This integration is facilitated by a specialized module, which includes an intrinsic background image prediction sub-module and a dual-branch frame embedding module, designed to reduce watermark interference and enhance the application of prior knowledge. Moreover, a frame-adaptive feature selection module dynamically adjusts the extraction of prior features based on the corruption level of each frame, ensuring their effective incorporation into the temporal processing. 
Extensive experiments on YouTube-VOS and DAVIS datasets validate our method’s efficiency in watermark removal and background restoration, showing significant improvement over state-of-the-art techniques in visible image watermark removal, video restoration, and video inpainting.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1192-1203"},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143125272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}