{"title":"基于层间维数关系建模的视觉变形头内剪枝","authors":"Peng Zhang, Cong Tian, Liang Zhao, Zhenhua Duan","doi":"10.1016/j.neunet.2025.107656","DOIUrl":null,"url":null,"abstract":"<div><div>Transformer models have demonstrated good performance across a range of natural language processing and computer vision tasks. However, the huge computational cost imposed by transformer models poses a significant obstacle to their practical implementation on platforms with limited hardware. To address this challenge, recent academic studies have been focused on head pruning, a strategy that effectively eliminates unimportant components in transformer models. Although these pruning methods have shown significant improvements, they suffer from severe accuracy loss due to coarse pruning granularity and fail to consider the interdependence between layers when discarding zero-valued components. This is crucial for achieving a network architecture with efficient compression. Therefore, we propose a novel <strong>i</strong>ntra-<strong>h</strong>ead <strong>p</strong>runing (<strong>IHP</strong>) technique to sparsely train pruned vision transformers. Specifically, our method utilizes a trainable row parameter delicately designed to participate in the sparse training of the model. Furthermore, we introduce a relationship matrix which serves as the key to the grouping pruning process. The grouping policies ensures consistent and coherent elimination of redundant components, thereby maintaining the structural integrity and functional consistency of the pruned network. Experimental results on benchmark datasets (CIFAR-10/100, ImageNet-1K) show that this method can significantly reduce the computational cost of the mainstream vision transformers such as DeiT, Swin Transformer, and CCT, with a small decrease in accuracy. Especially on ILSVRC-12, under the same FLOPs reduction ratio of 46.20%, the Top-1 accuracy improves by 0.47% compared to advanced methods for DeiT-tiny.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"190 ","pages":"Article 107656"},"PeriodicalIF":6.0000,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Intra-head pruning for vision transformers via inter-layer dimension relationship modeling\",\"authors\":\"Peng Zhang, Cong Tian, Liang Zhao, Zhenhua Duan\",\"doi\":\"10.1016/j.neunet.2025.107656\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Transformer models have demonstrated good performance across a range of natural language processing and computer vision tasks. However, the huge computational cost imposed by transformer models poses a significant obstacle to their practical implementation on platforms with limited hardware. To address this challenge, recent academic studies have been focused on head pruning, a strategy that effectively eliminates unimportant components in transformer models. Although these pruning methods have shown significant improvements, they suffer from severe accuracy loss due to coarse pruning granularity and fail to consider the interdependence between layers when discarding zero-valued components. This is crucial for achieving a network architecture with efficient compression. Therefore, we propose a novel <strong>i</strong>ntra-<strong>h</strong>ead <strong>p</strong>runing (<strong>IHP</strong>) technique to sparsely train pruned vision transformers. 
Specifically, our method utilizes a trainable row parameter delicately designed to participate in the sparse training of the model. Furthermore, we introduce a relationship matrix which serves as the key to the grouping pruning process. The grouping policies ensures consistent and coherent elimination of redundant components, thereby maintaining the structural integrity and functional consistency of the pruned network. Experimental results on benchmark datasets (CIFAR-10/100, ImageNet-1K) show that this method can significantly reduce the computational cost of the mainstream vision transformers such as DeiT, Swin Transformer, and CCT, with a small decrease in accuracy. Especially on ILSVRC-12, under the same FLOPs reduction ratio of 46.20%, the Top-1 accuracy improves by 0.47% compared to advanced methods for DeiT-tiny.</div></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"190 \",\"pages\":\"Article 107656\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2025-06-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608025005362\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025005362","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Intra-head pruning for vision transformers via inter-layer dimension relationship modeling
Transformer models have demonstrated good performance across a range of natural language processing and computer vision tasks. However, the huge computational cost imposed by transformer models poses a significant obstacle to their practical implementation on platforms with limited hardware. To address this challenge, recent academic studies have been focused on head pruning, a strategy that effectively eliminates unimportant components in transformer models. Although these pruning methods have shown significant improvements, they suffer from severe accuracy loss due to coarse pruning granularity and fail to consider the interdependence between layers when discarding zero-valued components. This is crucial for achieving a network architecture with efficient compression. Therefore, we propose a novel intra-head pruning (IHP) technique to sparsely train pruned vision transformers. Specifically, our method utilizes a trainable row parameter delicately designed to participate in the sparse training of the model. Furthermore, we introduce a relationship matrix which serves as the key to the grouping pruning process. The grouping policies ensures consistent and coherent elimination of redundant components, thereby maintaining the structural integrity and functional consistency of the pruned network. Experimental results on benchmark datasets (CIFAR-10/100, ImageNet-1K) show that this method can significantly reduce the computational cost of the mainstream vision transformers such as DeiT, Swin Transformer, and CCT, with a small decrease in accuracy. Especially on ILSVRC-12, under the same FLOPs reduction ratio of 46.20%, the Top-1 accuracy improves by 0.47% compared to advanced methods for DeiT-tiny.
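To make the two ideas in the abstract concrete, the sketch below illustrates (1) a trainable per-row gate inside each attention head that participates in sparse training, and (2) a binary relationship matrix that ties rows of related layers together so they are pruned as a group. This is a minimal illustration under assumed design choices, not the authors' released implementation; all names (RowGatedAttention, l1_penalty, grouped_prune_mask, relationship) are hypothetical.

```python
# Minimal PyTorch sketch of intra-head row gating plus relationship-matrix grouping.
import torch
import torch.nn as nn


class RowGatedAttention(nn.Module):
    """Multi-head self-attention whose rows inside each head are scaled by trainable gates."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        # One trainable gate per row (channel) inside every head; rows whose gate is
        # driven toward zero during sparse training become pruning candidates.
        self.row_gate = nn.Parameter(torch.ones(num_heads, self.head_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each: (B, heads, N, head_dim)
        gate = self.row_gate.view(1, self.num_heads, 1, self.head_dim)
        q, k, v = q * gate, k * gate, v * gate           # gated rows contribute little when near zero
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        out = attn.softmax(dim=-1) @ v                   # (B, heads, N, head_dim)
        return self.proj(out.transpose(1, 2).reshape(B, N, C))


def l1_penalty(blocks) -> torch.Tensor:
    """Sparsity regularizer on the row gates, added to the task loss during sparse training."""
    return sum(b.row_gate.abs().sum() for b in blocks)


def grouped_prune_mask(gates: torch.Tensor, relationship: torch.Tensor, thr: float) -> torch.Tensor:
    """Keep a row only if every row it is related to across layers also survives.

    gates:        (L, H, D) gate magnitudes for L layers, H heads, D rows per head.
    relationship: (L, L) binary matrix; relationship[i, j] = 1 means layers i and j
                  share dimension structure and must be pruned consistently.
    """
    base = gates.abs() > thr                             # per-layer, per-row decision
    keep = base.clone()
    for i in range(gates.shape[0]):
        related = relationship[i].bool()
        if related.any():
            # A row survives in layer i only if it survives in all related layers too.
            keep[i] = base[related].all(dim=0)
    return keep
```

In such a scheme, the L1 penalty pushes unimportant row gates toward zero during training, and the relationship matrix makes the final removal decision consistent across layers whose dimensions depend on one another, which is what allows whole groups of rows to be dropped without breaking tensor shapes between layers.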
Journal introduction:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.