{"title":"基于层间维数关系建模的视觉变形头内剪枝","authors":"Peng Zhang, Cong Tian, Liang Zhao, Zhenhua Duan","doi":"10.1016/j.neunet.2025.107656","DOIUrl":null,"url":null,"abstract":"<div><div>Transformer models have demonstrated good performance across a range of natural language processing and computer vision tasks. However, the huge computational cost imposed by transformer models poses a significant obstacle to their practical implementation on platforms with limited hardware. To address this challenge, recent academic studies have been focused on head pruning, a strategy that effectively eliminates unimportant components in transformer models. Although these pruning methods have shown significant improvements, they suffer from severe accuracy loss due to coarse pruning granularity and fail to consider the interdependence between layers when discarding zero-valued components. This is crucial for achieving a network architecture with efficient compression. Therefore, we propose a novel <strong>i</strong>ntra-<strong>h</strong>ead <strong>p</strong>runing (<strong>IHP</strong>) technique to sparsely train pruned vision transformers. Specifically, our method utilizes a trainable row parameter delicately designed to participate in the sparse training of the model. Furthermore, we introduce a relationship matrix which serves as the key to the grouping pruning process. The grouping policies ensures consistent and coherent elimination of redundant components, thereby maintaining the structural integrity and functional consistency of the pruned network. Experimental results on benchmark datasets (CIFAR-10/100, ImageNet-1K) show that this method can significantly reduce the computational cost of the mainstream vision transformers such as DeiT, Swin Transformer, and CCT, with a small decrease in accuracy. Especially on ILSVRC-12, under the same FLOPs reduction ratio of 46.20%, the Top-1 accuracy improves by 0.47% compared to advanced methods for DeiT-tiny.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"190 ","pages":"Article 107656"},"PeriodicalIF":6.0000,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Intra-head pruning for vision transformers via inter-layer dimension relationship modeling\",\"authors\":\"Peng Zhang, Cong Tian, Liang Zhao, Zhenhua Duan\",\"doi\":\"10.1016/j.neunet.2025.107656\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Transformer models have demonstrated good performance across a range of natural language processing and computer vision tasks. However, the huge computational cost imposed by transformer models poses a significant obstacle to their practical implementation on platforms with limited hardware. To address this challenge, recent academic studies have been focused on head pruning, a strategy that effectively eliminates unimportant components in transformer models. Although these pruning methods have shown significant improvements, they suffer from severe accuracy loss due to coarse pruning granularity and fail to consider the interdependence between layers when discarding zero-valued components. This is crucial for achieving a network architecture with efficient compression. Therefore, we propose a novel <strong>i</strong>ntra-<strong>h</strong>ead <strong>p</strong>runing (<strong>IHP</strong>) technique to sparsely train pruned vision transformers. 
Specifically, our method utilizes a trainable row parameter delicately designed to participate in the sparse training of the model. Furthermore, we introduce a relationship matrix which serves as the key to the grouping pruning process. The grouping policies ensures consistent and coherent elimination of redundant components, thereby maintaining the structural integrity and functional consistency of the pruned network. Experimental results on benchmark datasets (CIFAR-10/100, ImageNet-1K) show that this method can significantly reduce the computational cost of the mainstream vision transformers such as DeiT, Swin Transformer, and CCT, with a small decrease in accuracy. Especially on ILSVRC-12, under the same FLOPs reduction ratio of 46.20%, the Top-1 accuracy improves by 0.47% compared to advanced methods for DeiT-tiny.</div></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"190 \",\"pages\":\"Article 107656\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2025-06-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608025005362\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025005362","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Intra-head pruning for vision transformers via inter-layer dimension relationship modeling
Transformer models have demonstrated good performance across a range of natural language processing and computer vision tasks. However, the huge computational cost imposed by transformer models poses a significant obstacle to their practical implementation on platforms with limited hardware. To address this challenge, recent academic studies have been focused on head pruning, a strategy that effectively eliminates unimportant components in transformer models. Although these pruning methods have shown significant improvements, they suffer from severe accuracy loss due to coarse pruning granularity and fail to consider the interdependence between layers when discarding zero-valued components. This is crucial for achieving a network architecture with efficient compression. Therefore, we propose a novel intra-head pruning (IHP) technique to sparsely train pruned vision transformers. Specifically, our method utilizes a trainable row parameter delicately designed to participate in the sparse training of the model. Furthermore, we introduce a relationship matrix which serves as the key to the grouping pruning process. The grouping policies ensures consistent and coherent elimination of redundant components, thereby maintaining the structural integrity and functional consistency of the pruned network. Experimental results on benchmark datasets (CIFAR-10/100, ImageNet-1K) show that this method can significantly reduce the computational cost of the mainstream vision transformers such as DeiT, Swin Transformer, and CCT, with a small decrease in accuracy. Especially on ILSVRC-12, under the same FLOPs reduction ratio of 46.20%, the Top-1 accuracy improves by 0.47% compared to advanced methods for DeiT-tiny.
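To make the two ideas in the abstract concrete, the sketch below illustrates (1) a trainable per-row gate inside each attention head that participates in sparse training, and (2) a binary relationship matrix that ties rows of related layers together so they are pruned as a group. This is a minimal illustration under assumed design choices, not the authors' released implementation; all names (RowGatedAttention, l1_penalty, grouped_prune_mask, relationship) are hypothetical.

```python
# Minimal PyTorch sketch of intra-head row gating plus relationship-matrix grouping.
import torch
import torch.nn as nn


class RowGatedAttention(nn.Module):
    """Multi-head self-attention whose rows inside each head are scaled by trainable gates."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        # One trainable gate per row (channel) inside every head; rows whose gate is
        # driven toward zero during sparse training become pruning candidates.
        self.row_gate = nn.Parameter(torch.ones(num_heads, self.head_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each: (B, heads, N, head_dim)
        gate = self.row_gate.view(1, self.num_heads, 1, self.head_dim)
        q, k, v = q * gate, k * gate, v * gate           # gated rows contribute little when near zero
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        out = attn.softmax(dim=-1) @ v                   # (B, heads, N, head_dim)
        return self.proj(out.transpose(1, 2).reshape(B, N, C))


def l1_penalty(blocks) -> torch.Tensor:
    """Sparsity regularizer on the row gates, added to the task loss during sparse training."""
    return sum(b.row_gate.abs().sum() for b in blocks)


def grouped_prune_mask(gates: torch.Tensor, relationship: torch.Tensor, thr: float) -> torch.Tensor:
    """Keep a row only if every row it is related to across layers also survives.

    gates:        (L, H, D) gate magnitudes for L layers, H heads, D rows per head.
    relationship: (L, L) binary matrix; relationship[i, j] = 1 means layers i and j
                  share dimension structure and must be pruned consistently.
    """
    base = gates.abs() > thr                             # per-layer, per-row decision
    keep = base.clone()
    for i in range(gates.shape[0]):
        related = relationship[i].bool()
        if related.any():
            # A row survives in layer i only if it survives in all related layers too.
            keep[i] = base[related].all(dim=0)
    return keep
```

In such a scheme, the L1 penalty pushes unimportant row gates toward zero during training, and the relationship matrix makes the final removal decision consistent across layers whose dimensions depend on one another, which is what allows whole groups of rows to be dropped without breaking tensor shapes between layers.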
Journal introduction:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.