IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Articles, Page 7

Revisiting Deformable Convolution on Graphs: Large-range Modeling and Robustness

IF 23.6, Zone 1 (Computer Science)

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-17 DOI: 10.1109/tpami.2025.3611386

Ziyan Zhang, Bo Jiang, Jin Tang, Bin Luo

Citations: 0

Defenses in Adversarial Machine Learning: a Systematic Survey from the Lifecycle Perspective

IF 23.6, Zone 1 (Computer Science)

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-17 DOI: 10.1109/tpami.2025.3611340

Baoyuan Wu, Mingli Zhu, Meixi Zheng, Zihao Zhu, Shaokui Wei, Mingda Zhang, Hongrui Chen, Danni Yuan, Li Liu, Qingshan Liu

Citations: 0

ACLI: A CNN Pruning Framework Leveraging Adjacent Convolutional Layer Interdependence and $\gamma$-Weakly Submodularity

IF 23.6, Zone 1 (Computer Science)

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-16 DOI: 10.1109/tpami.2025.3610113

S Tofigh, M Askarizadeh, M Omair Ahmad, M N S Swamy, K K Nguyen

Abstract: Today, convolutional neural network (CNN) pruning techniques often rely on manually crafted importance criteria and pruning structures. Due to their heuristic nature, these methods may lack generality, and their performance is not guaranteed. In this paper, we propose a theoretical framework to address this challenge by leveraging the concept of $\gamma$-weak submodularity, based on a new efficient importance function. By deriving an upper bound on the absolute error in the layer subsequent to the pruned layer, we formulate the importance function as a $\gamma$-weakly submodular function. This formulation enables the development of an easy-to-implement, low-complexity, and data-free oblivious algorithm for selecting filters to be removed from a convolutional layer. Extensive experiments show that our method outperforms state-of-the-art benchmark networks across various datasets, with a computational cost comparable to the simplest pruning techniques, such as $\ell_2$-norm pruning. Notably, the proposed method achieves an accuracy of 76.52%, compared to 75.15% for the overall best baseline, with a 25.5% reduction in network parameters. According to our proposed resource-efficiency metric for pruning methods, the ACLI approach demonstrates orders-of-magnitude higher efficiency than the other baselines, while maintaining competitive accuracy.
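For orientation, the $\ell_2$-norm pruning baseline that the abstract uses as its cost reference can be sketched as follows. This is not the ACLI algorithm, whose importance function is not given in the abstract. The function name `l2_norm_prune`, the `keep_ratio` parameter, and the toy weights are all hypothetical, chosen only for illustration.

```python
import math


def l2_norm_prune(filters, keep_ratio):
    """Return the sorted indices of the filters to keep.

    Each filter is a flat list of weights. Filters are ranked by their
    l2 norm and the largest ones are kept (assumed baseline criterion).
    """
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    # Always keep at least one filter so the layer stays usable.
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])


# Four toy filters, each flattened to a weight vector (made-up values).
toy_filters = [[0.1, -0.1], [2.0, 1.0], [0.0, 0.05], [-1.5, 0.5]]
kept = l2_norm_prune(toy_filters, keep_ratio=0.5)
print(kept)  # indices of the two filters with the largest norms
```

The criterion needs only the weights of the layer being pruned, which is why it is data-free and cheap. The abstract claims a comparable cost for ACLI while additionally accounting for the subsequent layer.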

Citations: 0

Revisiting Transferable Adversarial Images: Systemization, Evaluation, and New Insights

IF 23.6, Zone 1 (Computer Science)

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-16 DOI: 10.1109/tpami.2025.3610085

Zhengyu Zhao, Hanwei Zhang, Renjue Li, Ronan Sicre, Laurent Amsaleg, Michael Backes, Qi Li, Qian Wang, Chao Shen

Abstract: Transferable adversarial images raise critical security concerns for computer vision systems in real-world, black-box attack scenarios. Although many transfer attacks have been proposed, existing research lacks a systematic and comprehensive evaluation. In this paper, we systemize transfer attacks into five categories around the general machine learning pipeline and provide the first comprehensive evaluation, with 23 representative attacks against 11 representative defenses, including the recent, transfer-oriented defense and the real-world Google Cloud Vision. In particular, we identify two main problems of existing evaluations: (1) for attack transferability, lack of intra-category analyses with fair hyperparameter settings, and (2) for attack stealthiness, lack of diverse measures. Our evaluation results validate that these problems have indeed caused misleading conclusions and missing points, and addressing them leads to new, consensus-challenging insights, such as (1) an early attack, DI, even outperforms all similar follow-up ones, (2) the state-of-the-art (white-box) defense, DiffPure, is even vulnerable to (black-box) transfer attacks, and (3) even under the same $L_p$ constraint, different attacks yield dramatically different stealthiness results regarding diverse imperceptibility metrics, finer-grained measures, and a user study. We hope that our analyses will serve as guidance on properly evaluating transferable adversarial images and advance the design of attacks and defenses.

Citations: 0

Task-Distributionally Robust Data-Free Meta-Learning

IF 23.6, Zone 1 (Computer Science)

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-16 DOI: 10.1109/tpami.2025.3609625

Zixuan Hu, Yongxian Wei, Li Shen, Zhenyi Wang, Baoyuan Wu, Chun Yuan, Dacheng Tao

Citations: 0

Step-wise Distribution-aligned Style Prompt Tuning for Source-Free Cross-domain Few-shot Learning

IF 23.6, Zone 1 (Computer Science)

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-16 DOI: 10.1109/tpami.2025.3610039

Huali Xu, Li Liu, Tianpeng Liu, Shuaifeng Zhi, Shuzhou Sun, Ming-Ming Cheng

Abstract: Existing cross-domain few-shot learning (CDFSL) methods, which develop training strategies in the source domain to enhance model transferability, face challenges when applied to large-scale pre-trained models (LMs), as their source domains and training strategies are not accessible. Besides, fine-tuning LMs specifically for CDFSL requires substantial computational resources, which limits their practicality. Therefore, this paper investigates the source-free CDFSL (SF-CDFSL) problem to solve the few-shot learning (FSL) task in the target domain using only a pre-trained model and a few target samples, without requiring source data or training strategies. However, the inaccessibility of source data prevents explicitly reducing the domain gaps between the source and target. To tackle this challenge, this paper proposes a novel approach, Step-wise Distribution-aligned Style Prompt Tuning (StepSPT), to implicitly narrow the domain gaps from the perspective of prediction distribution optimization. StepSPT first proposes a style prompt that adjusts the target samples to mirror the expected distribution. Furthermore, StepSPT tunes the style prompt and classifier by exploring a dual-phase optimization process (external and internal processes). In the external process, a step-wise distribution alignment strategy is introduced to tune the proposed style prompt by factorizing the prediction distribution optimization problem into a multi-step distribution alignment problem. In the internal process, the classifier is updated via standard cross-entropy loss. Evaluation on 5 datasets illustrates the superiority of StepSPT over existing prompt tuning-based methods and state-of-the-art methods (SOTAs). Furthermore, ablation studies and performance analyses highlight the efficacy of StepSPT. The code will be made public at https://github.com/xuhuali-mxj/StepSPT.

Citations: 0

I&S-ViT: An Inclusive & Stable Method for Post-Training ViTs Quantization

IF 23.6, Zone 1 (Computer Science)

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-16 DOI: 10.1109/tpami.2025.3610466

Yunshan Zhong, Jiawei Hu, Mingbao Lin, Mengzhao Chen, Rongrong Ji

Abstract: Despite the scalable performance of vision transformers (ViTs), their dense computational costs undermine their position in industrial applications. Post-training quantization (PTQ), which tunes ViTs with a tiny dataset and runs them in a low-bit format, addresses the cost issue well but unfortunately suffers larger performance drops in lower-bit cases. In this paper, we introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of ViTs: (1) Quantization inefficiency in the prevalent log2 quantizer for post-Softmax activations; (2) Rugged and magnified loss landscape in coarse-grained quantization granularity for post-LayerNorm activations. Then, I&S-ViT addresses these issues by introducing: (1) A novel shift-uniform-log2 quantizer (SULQ) that incorporates a shift mechanism followed by uniform quantization to achieve both an inclusive domain representation and accurate distribution approximation; (2) A three-stage smooth optimization strategy (SOS) that amalgamates the strengths of channel-wise and layer-wise quantization to enable stable learning. Comprehensive evaluations across diverse vision tasks validate I&S-ViT's superiority over existing PTQ methods for ViTs, particularly in low-bit scenarios. For instance, I&S-ViT elevates the performance of W3A3 ViT-B by an impressive 50.68%. Our code is available at https://github.com/zysxmu/IaS-ViT.
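As background for the first issue, a plain log2 quantizer for values in (0, 1], such as post-Softmax probabilities, can be sketched as below. This follows the commonly used definition (round the negative base-2 logarithm, then clamp) and is an assumption about what the abstract calls the "prevalent log2 quantizer". It does not reproduce SULQ, whose shift mechanism is not specified in the abstract. The function names are hypothetical.

```python
import math


def log2_quantize(x, bits):
    """Map x in (0, 1] to an integer code q so that x is approximated by 2**-q."""
    q_max = 2 ** bits - 1
    if x <= 0.0:
        return q_max  # non-positive inputs fall back to the smallest level
    q = round(-math.log2(x))
    return min(max(q, 0), q_max)


def log2_dequantize(q):
    """Recover the power-of-two value represented by code q."""
    return 2.0 ** -q


# Made-up attention probabilities that sum to 1.
probs = [0.5, 0.26, 0.2, 0.04]
codes = [log2_quantize(p, bits=3) for p in probs]
recon = [log2_dequantize(q) for q in codes]
print(codes)  # [1, 2, 2, 5]
print(recon)  # [0.5, 0.25, 0.25, 0.03125]
```

In this toy run 0.26 and 0.2 collapse onto the same code, because the representable levels are restricted to powers of two.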

Citations: 0

3D Hand Pose Estimation via Articulated Anchor-to-Joint 3D Local Regressors

IF 23.6, Zone 1 (Computer Science)

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-16 DOI: 10.1109/tpami.2025.3609907

Changlong Jiang, Yang Xiao, Jinghong Zheng, Haohong Kuang, Cunlin Wu, Mingyang Zhang, Zhiguo Cao, Min Du, Joey Tianyi Zhou, Junsong Yuan

Abstract: In this paper, we propose to address monocular 3D hand pose estimation from a single RGB or depth image via articulated anchor-to-joint 3D local regressors, in the form of A2J-Transformer+. The key idea is to make the local regressors (i.e., anchor points) in 3D space jointly aware of the hand's local fine details and global articulated context, to facilitate predicting their 3D offsets toward hand joints, with linear weighted aggregation for joint localization. Our intuition is that local fine details help to estimate accurate offsets but may suffer from issues including serious occlusion, confusing similar patterns, and overfitting risk. On the other hand, the hand's global articulated context can provide additional descriptive clues and constraints to alleviate these issues. To set anchor points adaptively in 3D space, A2J-Transformer+ runs in a 2-stage manner. In the first stage, owing to the property of the input modality, anchor points are distributed more densely on the X-Y plane, which leads to lower prediction accuracy along the Z direction than along the X and Y directions. To alleviate this, in the second stage anchor points are set evenly along the X, Y, and Z directions near the joints yielded by the first stage. This treatment brings two main advantages: (1) balancing the prediction accuracy along the X, Y, and Z directions, and (2) ensuring that the anchor-joint offsets are small values that are relatively easy to estimate. Extensive experiments on three RGB hand datasets (InterHand2.6M, HO-3D V2 and RHP) and three depth hand datasets (NYU, ICVL and HANDS 2017) verify A2J-Transformer+'s superiority and generalization ability for different modalities (i.e., RGB and depth) and hand cases (i.e., single hand, interacting hands, and hand-object interaction), even outperforming model-based methods. The test on the ITOP dataset reveals that A2J-Transformer+ can also be applied to the 3D human pose estimation task. The source code and supporting material will be released upon acceptance.
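The "linear weighted aggregation" step mentioned in the abstract can be illustrated with a minimal sketch: every anchor votes for a joint position (its own location plus a predicted offset), and the votes are combined with normalized weights. Normalizing the weights with a softmax is an assumption made here, and `aggregate_joint` and the toy numbers are hypothetical. The network that predicts offsets and weights is not shown.

```python
import math


def aggregate_joint(anchors, offsets, logits):
    """Estimate one 3D joint as a weighted sum of (anchor + offset) votes.

    anchors, offsets: lists of (x, y, z) tuples, one per anchor point.
    logits: one unnormalized weight per anchor (softmax is assumed).
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    joint = [0.0, 0.0, 0.0]
    for anchor, offset, w in zip(anchors, offsets, weights):
        for d in range(3):
            joint[d] += w * (anchor[d] + offset[d])
    return joint


# Two toy anchors whose votes agree on the point (1, 1, 1).
anchors = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
offsets = [(1.0, 1.0, 1.0), (-1.0, 1.0, 1.0)]
print(aggregate_joint(anchors, offsets, logits=[0.0, 0.0]))
```

Under this reading, anchors placed close to a joint need only small offsets, which matches the second advantage listed in the abstract.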

Citations: 0

Towards Size-invariant Salient Object Detection: A Generic Evaluation and Optimization Approach

IF 23.6, Zone 1 (Computer Science)

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-16 DOI: 10.1109/tpami.2025.3609882

Shilong Bao, Qianqian Xu, Feiran Li, Boyu Han, Zhiyong Yang, Xiaochun Cao, Qingming Huang

Abstract: This paper investigates a fundamental yet underexplored issue in Salient Object Detection (SOD): the size-invariant property for evaluation protocols, particularly in scenarios where multiple salient objects of significantly different sizes appear within a single image. We first present a novel perspective to expose the inherent size sensitivity of existing widely used SOD metrics. Through careful theoretical derivations, we show that the evaluation outcome of an image under current SOD metrics can be essentially decomposed into a sum of several separable terms, with the contribution of each term being directly proportional to its corresponding region size. Consequently, the prediction errors would be dominated by the larger regions, while smaller yet potentially more semantically important objects are often overlooked, leading to biased performance assessments and practical degradation. To address this challenge, a generic Size-Invariant Evaluation (SIEva) framework is proposed. The core idea is to evaluate each separable component individually and then aggregate the results, thereby effectively mitigating the impact of size imbalance across objects. Building upon this, we further develop a dedicated optimization framework (SIOpt), which adheres to the size-invariant principle and significantly enhances the detection of salient objects across a broad range of sizes. Notably, SIOpt is model-agnostic and can be seamlessly integrated with a wide range of SOD backbones. Theoretically, we also present a generalization analysis of SOD methods and provide evidence supporting the validity of our new evaluation protocols. Finally, comprehensive experiments speak to the efficacy of our proposed approach.
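The size-sensitivity argument can be made concrete with mean absolute error (MAE) as the metric. The sketch below contrasts the usual image-level MAE, which weights each region by its pixel count, with a per-region average. Aggregating by a plain mean over regions is an assumption made for illustration and may differ from the actual SIEva definition. The function names and the toy data are hypothetical.

```python
def mae(errors):
    """Mean absolute error over a list of per-pixel absolute errors."""
    return sum(errors) / len(errors)


def standard_vs_size_invariant(errors, regions):
    """Compare image-level MAE with the mean of per-region MAEs.

    errors: per-pixel absolute errors; regions: region id of each pixel.
    """
    by_region = {}
    for err, rid in zip(errors, regions):
        by_region.setdefault(rid, []).append(err)
    # Image-level MAE equals sum over regions of (size / total) * region MAE,
    # so large regions dominate the score.
    standard = mae(errors)
    # Evaluate each region separately, then average with equal weight.
    invariant = sum(mae(v) for v in by_region.values()) / len(by_region)
    return standard, invariant


# A large object (8 pixels) predicted perfectly, a small one (2 pixels) missed.
errors = [0.0] * 8 + [1.0] * 2
regions = ["large"] * 8 + ["small"] * 2
print(standard_vs_size_invariant(errors, regions))  # (0.2, 0.5)
```

The image-level score of 0.2 hides the fact that one of the two objects is missed entirely, while the per-region average of 0.5 reflects it.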

Citations: 0

Generative Causality-driven Network for Graph Multi-task Learning

IF 23.6, Zone 1 (Computer Science)

IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date : 2025-09-16 DOI: 10.1109/tpami.2025.3610096

Xixun Lin, Qing Yu, Yanan Cao, Lixin Zou, Chuan Zhou, Jia Wu, Chenliang Li, Peng Zhang, Shirui Pan

Abstract: Multi-task learning (MTL) is a standard learning paradigm in machine learning. The central idea of MTL is to capture the shared knowledge among multiple tasks to mitigate the problem of data sparsity, where the annotated samples for each task are quite limited. Recent studies indicate that graph multi-task learning (GMTL) yields promising improvements over previous MTL methods. GMTL represents tasks on a task relation graph, and further leverages graph neural networks (GNNs) to learn complex task relationships. Although GMTL achieves better performance, the construction of the task relation graph heavily depends on simple heuristic tricks, which results in the existence of spurious task correlations and the absence of true edges between tasks with strong connections. This problem largely limits the effectiveness of GMTL. To this end, we propose the Generative Causality-driven Network (GCNet), a novel framework that progressively learns the causal structure between tasks to discover which tasks are beneficial to train jointly for improving generalization ability and model robustness. To be specific, in the feature space, GCNet first introduces a feature-level generator to generate the structure prior for reducing learning difficulty. Afterwards, GCNet develops an output-level generator, parameterized as a new causal energy-based model (EBM), to refine the learned structure prior in the output space, driven by causality. Benefiting from our proposed causal framework, we theoretically derive an intervention contrastive estimation for training this causal EBM efficiently. Experiments are conducted on multiple synthetic and real-world datasets. Extensive empirical results and model analyses demonstrate the superior performance of GCNet over several competitive MTL baselines.

Citations: 0