IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Publications

Exploiting Ground Depth Estimation for Mobile Monocular 3D Object Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date: 2025-01-15 DOI: 10.1109/TPAMI.2025.3529084
Yunsong Zhou;Quan Liu;Hongzi Zhu;Yunzhe Li;Shan Chang;Minyi Guo
{"title":"Exploiting Ground Depth Estimation for Mobile Monocular 3D Object Detection","authors":"Yunsong Zhou;Quan Liu;Hongzi Zhu;Yunzhe Li;Shan Chang;Minyi Guo","doi":"10.1109/TPAMI.2025.3529084","DOIUrl":"10.1109/TPAMI.2025.3529084","url":null,"abstract":"Detecting 3D objects from a monocular camera in mobile applications, such as on a vehicle, drone, or robot, is a crucial but challenging task. The monocular vision’s <italic>near-far disparity</i> and the camera’s constantly changing position make it difficult to achieve high accuracy, especially for distant objects. In this paper, we propose a new Mono3D framework named <italic>MoGDE</i>, which takes inspiration from the observation that an object’s depth can be inferred from the ground’s depth underneath it. MoGDE estimates the corresponding ground depth of an image and utilizes this information to guide Mono3D. We use a pose detection network to estimate the camera’s orientation and construct a feature map that represents pixel-level ground depth based on the 3D-to-2D perspective geometry. To further improve Mono3D with the estimated ground depth, we design an RGB-D feature fusion network based on transformer architecture. The long-range self-attention mechanism is utilized to identify ground-contacting points and pin the corresponding ground depth to the image feature map. We evaluate MoGDE on the KITTI dataset, and the results show that it significantly improves the accuracy and robustness of Mono3D for both near and far objects. MoGDE outperforms state-of-the-art methods and ranks first among the pure image-based methods on the KITTI 3D benchmark.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"3079-3093"},"PeriodicalIF":0.0,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142986192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
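The ground-depth prior MoGDE builds on follows from flat-ground perspective geometry: a pixel ray pointing below the horizon intersects the ground plane at a distance fixed by the camera height and orientation. A minimal sketch of that construction (my own illustration, not the paper's code; `fy`, `cy`, `cam_height`, and `pitch` are assumed inputs, with the pitch in practice supplied by the pose network the abstract describes):

```python
import numpy as np

def ground_depth_map(height, width, fy, cy, cam_height, pitch=0.0):
    """Horizontal distance to a flat ground plane for each pixel row.

    Pinhole model with the y-axis pointing down; pitch > 0 tilts the
    camera toward the ground. Rows at or above the horizon never hit
    the ground and are assigned infinity.
    """
    v = np.arange(height, dtype=np.float64)
    # Angle of each pixel ray below the horizontal, in radians.
    ray_angle = np.arctan2(v - cy, fy) + pitch
    dist = np.full(height, np.inf)
    below = ray_angle > 1e-6          # only rays aimed below the horizon
    dist[below] = cam_height / np.tan(ray_angle[below])
    # Under this model every column in a row sees the same ground distance.
    return np.tile(dist[:, None], (1, width))

# Example: a KITTI-like camera mounted 1.65 m above the road, level pitch.
depth = ground_depth_map(375, 1242, fy=721.5, cy=187.0, cam_height=1.65)
```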
VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-Domain Generalization
IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date: 2025-01-14 DOI: 10.1109/TPAMI.2025.3528950
Yuliang Liu;Mingxin Huang;Hao Yan;Linger Deng;Weijia Wu;Hao Lu;Chunhua Shen;Lianwen Jin;Xiang Bai
{"title":"VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-Domain Generalization","authors":"Yuliang Liu;Mingxin Huang;Hao Yan;Linger Deng;Weijia Wu;Hao Lu;Chunhua Shen;Lianwen Jin;Xiang Bai","doi":"10.1109/TPAMI.2025.3528950","DOIUrl":"10.1109/TPAMI.2025.3528950","url":null,"abstract":"Text spotting, a task involving the extraction of textual information from image or video sequences, faces challenges in cross-domain adaption, such as image-to-image and image-to-video generalization. In this paper, we introduce a new method, termed VimTS, which enhances the generalization ability of the model by achieving better synergy among different tasks. Typically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters. The Prompt Queries Generation Module facilitates explicit interaction between different tasks, while the Tasks-aware Adapter helps the model dynamically learn suitable features for each task. Additionally, to further enable the model to learn temporal information at a lower cost, we propose a synthetic video text dataset (VTD-368 k) by leveraging the Content Deformation Fields (CoDeF) algorithm. Notably, our method outperforms the state-of-the-art method by an average of 2.6% in six cross-domain benchmarks such as TT-to-IC15, CTW1500-to-TT, and TT-to-CTW1500. For video-level cross-domain adaption, our method even surpasses the previous end-to-end video spotting method in ICDAR2015 video and DSText v2 by an average of 5.5% on the MOTA metric, using only image-level data. We further demonstrate that existing Large Multimodal Models exhibit limitations in generating cross-domain scene text spotting, in contrast to our VimTS model which requires significantly fewer parameters and data.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2957-2972"},"PeriodicalIF":0.0,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142981455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
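The abstract does not spell out the Tasks-aware Adapter's architecture; modules of this kind are typically small residual bottlenecks attached per task to a shared backbone. A sketch of that generic pattern (the task names, widths, and module layout here are assumptions, not the paper's design):

```python
import torch
import torch.nn as nn

class TasksAwareAdapter(nn.Module):
    """One lightweight bottleneck adapter per task on top of shared features."""

    def __init__(self, dim, tasks, bottleneck=64):
        super().__init__()
        self.adapters = nn.ModuleDict({
            task: nn.Sequential(
                nn.Linear(dim, bottleneck),
                nn.GELU(),
                nn.Linear(bottleneck, dim),
            )
            for task in tasks
        })

    def forward(self, x, task):
        # Residual connection: the shared representation passes through
        # unchanged, plus a small task-specific correction.
        return x + self.adapters[task](x)

adapter = TasksAwareAdapter(dim=256, tasks=["detection", "recognition", "tracking"])
feats = torch.randn(8, 100, 256)          # a batch of query features
out = adapter(feats, task="recognition")
```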
Few-Shot Class-Incremental Learning for Classification and Object Detection: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date: 2025-01-14 DOI: 10.1109/TPAMI.2025.3529038
Jinghua Zhang;Li Liu;Olli Silvén;Matti Pietikäinen;Dewen Hu
{"title":"Few-Shot Class-Incremental Learning for Classification and Object Detection: A Survey","authors":"Jinghua Zhang;Li Liu;Olli Silvén;Matti Pietikäinen;Dewen Hu","doi":"10.1109/TPAMI.2025.3529038","DOIUrl":"10.1109/TPAMI.2025.3529038","url":null,"abstract":"Few-shot Class-Incremental Learning (FSCIL) presents a unique challenge in Machine Learning (ML), as it necessitates the Incremental Learning (IL) of new classes from sparsely labeled training samples without forgetting previous knowledge. While this field has seen recent progress, it remains an active exploration area. This paper aims to provide a comprehensive and systematic review of FSCIL. In our in-depth examination, we delve into various facets of FSCIL, encompassing the problem definition, the discussion of the primary challenges of unreliable empirical risk minimization and the stability-plasticity dilemma, general schemes, and relevant problems of IL and Few-shot Learning (FSL). Besides, we offer an overview of benchmark datasets and evaluation metrics. Furthermore, we introduce the Few-shot Class-incremental Classification (FSCIC) methods from data-based, structure-based, and optimization-based approaches and the Few-shot Class-incremental Object Detection (FSCIOD) methods from anchor-free and anchor-based approaches. Beyond these, we present several promising research directions within FSCIL that merit further investigation.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2924-2945"},"PeriodicalIF":0.0,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142981457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
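As a concrete anchor for the problem definition in the survey: FSCIL assumes one large, fully labeled base session followed by a stream of N-way K-shot sessions, with evaluation after each session over all classes seen so far. A minimal data-regime sketch (the 5-way 5-shot setting and base-class count are common benchmark conventions, not requirements of the survey):

```python
import random

def make_fscil_sessions(samples_by_class, num_base=60, n_way=5, k_shot=5):
    """Split classes into a fully labeled base session plus few-shot sessions."""
    classes = list(samples_by_class)
    base, novel = classes[:num_base], classes[num_base:]
    sessions = [{c: samples_by_class[c] for c in base}]  # session 0: all data
    for i in range(0, len(novel), n_way):
        sessions.append({
            c: random.sample(samples_by_class[c], k_shot)  # only K shots per class
            for c in novel[i:i + n_way]
        })
    return sessions  # after session t, evaluate on every class seen so far
```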
Learning the Optimal Discriminant SVM With Feature Extraction
IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date: 2025-01-14 DOI: 10.1109/TPAMI.2025.3529711
Junhong Zhang;Zhihui Lai;Heng Kong;Jian Yang
{"title":"Learning the Optimal Discriminant SVM With Feature Extraction","authors":"Junhong Zhang;Zhihui Lai;Heng Kong;Jian Yang","doi":"10.1109/TPAMI.2025.3529711","DOIUrl":"10.1109/TPAMI.2025.3529711","url":null,"abstract":"Subspace learning and Support Vector Machine (SVM) are two critical techniques in pattern recognition, playing pivotal roles in feature extraction and classification. However, how to learn the optimal subspace such that the SVM classifier can perform the best is still a challenging problem due to the difficulty in optimization, computation, and algorithm convergence. To address these problems, this paper develops a novel method named Optimal Discriminant Support Vector Machine (ODSVM), which integrates support vector classification with discriminative subspace learning in a seamless framework. As a result, the most discriminative subspace and the corresponding optimal SVM are obtained simultaneously to pursue the best classification performance. The efficient optimization framework is designed for binary and multi-class ODSVM. Moreover, a fast sequential minimization optimization (SMO) algorithm with pruning is proposed to accelerate the computation in multi-class ODSVM. Unlike other related methods, ODSVM has a strong theoretical guarantee of global convergence, highlighting its superiority and stability. Numerical experiments are conducted on thirteen datasets and the results demonstrate that ODSVM outperforms existing methods with statistical significance.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2897-2911"},"PeriodicalIF":0.0,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142981498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
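For contrast with ODSVM's joint formulation, the decoupled baseline it improves on is easy to state: learn a discriminative subspace first, then fit an SVM inside it. A runnable two-stage sketch (using scikit-learn's LDA as the subspace step; ODSVM itself is not shown, since its contribution is to optimize the subspace and the SVM together under one objective with a global-convergence guarantee):

```python
from sklearn.datasets import load_digits
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: discriminative subspace. Stage 2: linear SVM in that subspace.
# ODSVM's argument is that fixing the subspace before seeing the SVM
# objective is suboptimal; it learns both simultaneously instead.
two_stage = make_pipeline(
    LinearDiscriminantAnalysis(n_components=9),  # 10 classes -> at most 9 dims
    LinearSVC(C=1.0, dual=False),
)
two_stage.fit(X_train, y_train)
print(f"two-stage subspace+SVM accuracy: {two_stage.score(X_test, y_test):.3f}")
```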
Condition-Invariant Semantic Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date: 2025-01-14 DOI: 10.1109/TPAMI.2025.3529350
Christos Sakaridis;David Bruggemann;Fisher Yu;Luc Van Gool
{"title":"Condition-Invariant Semantic Segmentation","authors":"Christos Sakaridis;David Bruggemann;Fisher Yu;Luc Van Gool","doi":"10.1109/TPAMI.2025.3529350","DOIUrl":"10.1109/TPAMI.2025.3529350","url":null,"abstract":"Adaptation of semantic segmentation networks to different visual conditions is vital for robust perception in autonomous cars and robots. However, previous work has shown that most feature-level adaptation methods, which employ adversarial training and are validated on synthetic-to-real adaptation, provide marginal gains in condition-level adaptation, being outperformed by simple pixel-level adaptation via stylization. Motivated by these findings, we propose to leverage stylization in performing feature-level adaptation by aligning the internal network features extracted by the encoder of the network from the original and the stylized view of each input image with a novel feature invariance loss. In this way, we encourage the encoder to extract features that are already invariant to the style of the input, allowing the decoder to focus on parsing these features and not on further abstracting from the specific style of the input. We implement our method, named Condition-Invariant Semantic Segmentation (CISS), on the current state-of-the-art domain adaptation architecture and achieve outstanding results on condition-level adaptation. In particular, CISS sets the new state of the art in the popular daytime-to-nighttime Cityscapes <inline-formula><tex-math>$to$</tex-math></inline-formula> Dark Zurich benchmark. Furthermore, our method achieves the second-best performance on the normal-to-adverse Cityscapes <inline-formula><tex-math>$to$</tex-math></inline-formula> ACDC benchmark. CISS is shown to generalize well to domains unseen during training, such as BDD100K-night and ACDC-night.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"3111-3125"},"PeriodicalIF":0.0,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142981456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
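The core of CISS is the feature invariance loss between the features of an image and those of its stylized view. The abstract does not give the exact form; a plain L2 alignment with a stop-gradient on the original branch is one natural reading (both the loss choice and the stop-gradient are my assumptions):

```python
import torch
import torch.nn.functional as F

def feature_invariance_loss(encoder, x, x_stylized):
    """Penalize encoder features that change when only the style changes."""
    f_orig = encoder(x)
    f_styl = encoder(x_stylized)
    # Pull the stylized-view features toward the (detached) original-view
    # features so the encoder learns style-invariant representations.
    return F.mse_loss(f_styl, f_orig.detach())

# Hypothetical usage inside a training step, with `stylize` the pixel-level
# stylization op and `seg_loss` the usual segmentation objective:
# loss = seg_loss + lambda_inv * feature_invariance_loss(encoder, img, stylize(img))
```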
Clarify Confused Nodes via Separated Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date: 2025-01-14 DOI: 10.1109/TPAMI.2025.3528738
Jiajun Zhou;Shengbo Gong;Xuanze Chen;Chenxuan Xie;Shanqing Yu;Qi Xuan;Xiaoniu Yang
{"title":"Clarify Confused Nodes via Separated Learning","authors":"Jiajun Zhou;Shengbo Gong;Xuanze Chen;Chenxuan Xie;Shanqing Yu;Qi Xuan;Xiaoniu Yang","doi":"10.1109/TPAMI.2025.3528738","DOIUrl":"10.1109/TPAMI.2025.3528738","url":null,"abstract":"Graph neural networks (GNNs) have achieved remarkable advances in graph-oriented tasks. However, real-world graphs invariably contain a certain proportion of heterophilous nodes, challenging the homophily assumption of traditional GNNs and hindering their performance. Most existing studies continue to design generic models with shared weights between heterophilous and homophilous nodes. Despite the incorporation of high-order messages or multi-channel architectures, these efforts often fall short. A minority of studies attempt to train different node groups separately but suffer from inappropriate separation metrics and low efficiency. In this paper, we first propose a new metric, termed Neighborhood Confusion (<italic>NC</i>), to facilitate a more reliable separation of nodes. We observe that node groups with different levels of <italic>NC</i> values exhibit certain differences in intra-group accuracy and visualized embeddings. These pave the way for <bold>N</b>eighborhood <bold>C</b>onfusion-guided <bold>G</b>raph <bold>C</b>onvolutional <bold>N</b>etwork (<bold>NCGCN</b>), in which nodes are grouped by their <italic>NC</i> values and accept intra-group weight sharing and message passing. Extensive experiments on both homophilous and heterophilous benchmarks demonstrate that our framework can effectively separate nodes and yield significant performance improvement compared to the latest methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2882-2896"},"PeriodicalIF":0.0,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142981454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
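The abstract does not reproduce the formula for Neighborhood Confusion, but the idea of scoring how mixed a node's surroundings are can be illustrated with a simple proxy: the label entropy of each node's neighborhood. A sketch under that assumption (the paper's actual NC metric may be defined differently):

```python
import numpy as np

def neighborhood_confusion(adj, labels, num_classes):
    """Label entropy per neighborhood: a plausible proxy for the NC metric.

    adj    : dense adjacency matrix, shape [n, n]
    labels : integer node labels, shape [n]
    High entropy = a 'confused' (heterophilous) neighborhood.
    """
    nc = np.zeros(adj.shape[0])
    for i in range(adj.shape[0]):
        neigh = np.flatnonzero(adj[i])
        if neigh.size == 0:
            continue
        p = np.bincount(labels[neigh], minlength=num_classes) / neigh.size
        p = p[p > 0]
        nc[i] = -(p * np.log(p)).sum()
    return nc

# Grouping step in the spirit of NCGCN: nodes above/below an NC threshold
# are routed to separate weight groups for intra-group message passing.
# group_mask = neighborhood_confusion(adj, labels, num_classes) > threshold
```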
Privacy-Preserving Biometric Verification With Handwritten Random Digit String
IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date: 2025-01-14 DOI: 10.1109/TPAMI.2025.3529022
Peirong Zhang;Yuliang Liu;Songxuan Lai;Hongliang Li;Lianwen Jin
{"title":"Privacy-Preserving Biometric Verification With Handwritten Random Digit String","authors":"Peirong Zhang;Yuliang Liu;Songxuan Lai;Hongliang Li;Lianwen Jin","doi":"10.1109/TPAMI.2025.3529022","DOIUrl":"10.1109/TPAMI.2025.3529022","url":null,"abstract":"Handwriting verification has stood as a steadfast identity authentication method for decades. However, this technique risks potential privacy breaches due to the inclusion of personal information in handwritten biometrics such as signatures. To address this concern, we propose using the Random Digit String (RDS) for privacy-preserving handwriting verification. This approach allows users to authenticate themselves by writing an arbitrary digit sequence, effectively ensuring privacy protection. To evaluate the effectiveness of RDS, we construct a new HRDS4BV dataset composed of online naturally handwritten RDS. Unlike conventional handwriting, RDS encompasses unconstrained and variable content, posing significant challenges for modeling consistent personal writing style. To surmount this, we propose the Pattern Attentive VErification Network (PAVENet), along with a Discriminative Pattern Mining (DPM) module. DPM adaptively enhances the recognition of consistent and discriminative writing patterns, thus refining handwriting style representation. Through comprehensive evaluations, we scrutinize the applicability of online RDS verification and showcase a pronounced outperformance of our model over existing methods. Furthermore, we discover a noteworthy forgery phenomenon that deviates from prior findings and discuss its positive impact in countering malicious impostor attacks. Substantially, our work underscores the feasibility of privacy-preserving biometric verification and propels the prospects of its broader acceptance and application.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"3049-3066"},"PeriodicalIF":0.0,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142981453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Torsion Graph Neural Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date: 2025-01-13 DOI: 10.1109/TPAMI.2025.3528449
Cong Shen;Xiang Liu;Jiawei Luo;Kelin Xia
{"title":"Torsion Graph Neural Networks","authors":"Cong Shen;Xiang Liu;Jiawei Luo;Kelin Xia","doi":"10.1109/TPAMI.2025.3528449","DOIUrl":"10.1109/TPAMI.2025.3528449","url":null,"abstract":"Geometric deep learning (GDL) models have demonstrated a great potential for the analysis of non-Euclidian data. They are developed to incorporate the geometric and topological information of non-Euclidian data into the end-to-end deep learning architectures. Motivated by the recent success of discrete Ricci curvature in graph neural network (GNNs), we propose TorGNN, an analytic Torsion enhanced Graph Neural Network model. The essential idea is to characterize graph local structures with an analytic torsion based weight formula. Mathematically, analytic torsion is a topological invariant that can distinguish spaces which are homotopy equivalent but not homeomorphic. In our TorGNN, for each edge, a corresponding local simplicial complex is identified, then the analytic torsion (for this local simplicial complex) is calculated, and further used as a weight (for this edge) in message-passing process. Our TorGNN model is validated on link prediction tasks from sixteen different types of networks and node classification tasks from four types of networks. It has been found that our TorGNN can achieve superior performance on both tasks, and outperform various state-of-the-art models. This demonstrates that analytic torsion is a highly efficient topological invariant in the characterization of graph structures and can significantly boost the performance of GNNs.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2946-2956"},"PeriodicalIF":0.0,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142974696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
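Structurally, TorGNN's aggregation step is ordinary edge-weighted message passing; the novelty lies in where the weights come from. A minimal sketch of that step (computing the analytic torsion of each edge's local simplicial complex is the paper's contribution and is not shown; here `edge_weight` is simply taken as a given tensor):

```python
import torch

def torsion_weighted_aggregate(x, edge_index, edge_weight):
    """One message-passing step with per-edge scalar weights.

    x           : node features, shape [n, d]
    edge_index  : LongTensor of (source, destination) pairs, shape [2, E]
    edge_weight : per-edge scalars, shape [E]; in TorGNN these would be
                  the analytic torsions of the edges' local complexes.
    """
    src, dst = edge_index
    out = torch.zeros_like(x)
    # Sum weighted neighbor features into each destination node.
    out.index_add_(0, dst, edge_weight.unsqueeze(-1) * x[src])
    return out

x = torch.randn(5, 16)                       # 5 nodes, 16-dim features
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
edge_weight = torch.rand(4)                  # stand-in for analytic torsion
h = torsion_weighted_aggregate(x, edge_index, edge_weight)
```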
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date: 2025-01-13 DOI: 10.1109/TPAMI.2025.3528394
Xiao Wang;Jianlong Wu;Zijia Lin;Fuzheng Zhang;Di Zhang;Liqiang Nie
{"title":"Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding","authors":"Xiao Wang;Jianlong Wu;Zijia Lin;Fuzheng Zhang;Di Zhang;Liqiang Nie","doi":"10.1109/TPAMI.2025.3528394","DOIUrl":"10.1109/TPAMI.2025.3528394","url":null,"abstract":"Recently, video-language understanding has achieved great success through large-scale pre-training. However, data scarcity remains a prevailing challenge. This study quantitatively reveals an “impossible trinity” among data quantity, diversity, and quality in pre-training datasets. Recent efforts seek to refine large-scale, diverse ASR datasets compromised by low quality through synthetic annotations. These methods successfully refine the original annotations by leveraging useful information in multimodal video content (frames, tags, ASR transcripts, etc.). Nevertheless, they struggle to mitigate noise within synthetic annotations and lack scalability as the dataset size expands. To address these issues, we introduce the Video DataFlywheel framework, which iteratively refines video annotations with improved noise control methods. For iterative refinement, we first leverage a video-language model to generate synthetic annotations, resulting in a refined dataset. Then, we pre-train on it and fine-tune on human refinement examples for a stronger model. These processes are repeated for continuous improvement. For noise control, we present AdaTaiLr, a novel method that requires weaker assumptions on noise distribution. This method proves more effective in large datasets and offers theoretical guarantees. The combination of iterative refinement and AdaTaiLr can achieve better scalability in video-language understanding. Extensive experiments show that our framework outperforms existing data refinement baselines, delivering a 3% performance boost and improving dataset quality with minimal diversity loss. Furthermore, our refined dataset facilitates significant improvements in various video-language understanding tasks, including video question answering and text-video retrieval.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2912-2923"},"PeriodicalIF":0.0,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142974702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
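The flywheel itself is a short loop, even though each stage hides substantial machinery (including the AdaTaiLr noise control, which is not sketched here). A skeleton with hypothetical stand-in methods, only to fix the control flow the abstract describes:

```python
def data_flywheel(model, videos, human_refined, rounds=3):
    """Iteratively refine annotations, then refine the model on them.

    `annotate`, `pretrain`, and `finetune` are hypothetical stand-ins
    for the paper's components, not a real API.
    """
    for _ in range(rounds):
        # 1. Current model produces synthetic annotations -> refined dataset.
        refined = [(v, model.annotate(v)) for v in videos]
        # 2. Pre-train on the refined dataset (noise control omitted here).
        model.pretrain(refined)
        # 3. Fine-tune on human refinement examples for a stronger annotator.
        model.finetune(human_refined)
    return model
```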
Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction
IEEE Transactions on Pattern Analysis and Machine Intelligence Pub Date: 2025-01-13 DOI: 10.1109/TPAMI.2025.3528042
Yunshan Zhong;You Huang;Jiawei Hu;Yuxin Zhang;Rongrong Ji
{"title":"Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction","authors":"Yunshan Zhong;You Huang;Jiawei Hu;Yuxin Zhang;Rongrong Ji","doi":"10.1109/TPAMI.2025.3528042","DOIUrl":"10.1109/TPAMI.2025.3528042","url":null,"abstract":"Post-training quantization (PTQ) for vision transformers (ViTs) has received increasing attention from both academic and industrial communities due to its minimal data needs and high time efficiency. However, many current methods fail to account for the complex interactions between quantized weights and activations, resulting in significant quantization errors and suboptimal performance. This paper presents ERQ, an innovative two-step PTQ method specifically crafted to reduce quantization errors arising from activation and weight quantization sequentially. The first step, Activation quantization error reduction (Aqer), first applies Reparameterization Initialization aimed at mitigating initial quantization errors in high-variance activations. Then, it further mitigates the errors by formulating a Ridge Regression problem, which updates the weights maintained at full-precision using a closed-form solution. The second step, Weight quantization error reduction (Wqer), first applies Dual Uniform Quantization to handle weights with numerous outliers, which arise from adjustments made during Reparameterization Initialization, thereby reducing initial weight quantization errors. Then, it employs an iterative approach to further tackle the errors. In each iteration, it adopts Rounding Refinement that uses an empirically derived, efficient proxy to refine the rounding directions of quantized weights, complemented by a Ridge Regression solver to reduce the errors. Comprehensive experimental results demonstrate ERQ’s superior performance across various ViTs variants and tasks. For example, ERQ surpasses the state-of-the-art GPTQ by a notable 36.81% in accuracy for W3A4 ViT-S.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2676-2692"},"PeriodicalIF":0.0,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142974602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
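The closed-form weight update in ERQ's Aqer step is a ridge regression: choose new full-precision weights so that the layer's output on quantized activations matches its original output. A sketch of that closed form (my reading of the abstract; the paper's exact objective and regularization strength may differ):

```python
import numpy as np

def ridge_compensate(W, X, X_q, lam=1e-2):
    """Solve min_Wn ||X_q @ Wn - X @ W||_F^2 + lam * ||Wn||_F^2 in closed form.

    X   : calibration activations at full precision, shape [n, d]
    X_q : the same activations after quantization,   shape [n, d]
    W   : original layer weights,                    shape [d, k]
    """
    d = X_q.shape[1]
    A = X_q.T @ X_q + lam * np.eye(d)
    return np.linalg.solve(A, X_q.T @ (X @ W))

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 64))
X_q = np.round(X * 4) / 4                    # toy uniform quantizer, step 0.25
W = rng.normal(size=(64, 32))
W_new = ridge_compensate(W, X, X_q)
print(np.linalg.norm(X_q @ W - X @ W),       # output error before compensation
      np.linalg.norm(X_q @ W_new - X @ W))   # output error after compensation
```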