{"title":"BHPVAS: visual analysis system for pruning attention heads in BERT model","authors":"Zhen Liu, Haibo Sun, Huawei Sun, Xinyu Hong, Gang Xu, Xiangyang Wu","doi":"10.1007/s12650-024-00985-z","DOIUrl":null,"url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Abstract</h3><p>In the field of deep learning, pre-trained BERT models have achieved remarkable success. However, the accompanying problem is that models with more complex structures and more network parameters. The huge parameter size makes the computational cost in terms of time and memory become extremely expensive. Recent work has indicated that BERT models own a significant amount of redundant attention heads. Meanwhile considerable BERT models compression algorithms have been proposed, which can effectively reduce model complexity and redundancy with pruning some attention heads. Nevertheless, existing automated model compression solutions are mainly based on predetermined pruning program, which requires multiple expensive pruning-retraining cycles or heuristic designs to select additional hyperparameters. Furthermore, the training process of BERT models is a black box, and lacks interpretability, which makes researchers cannot intuitively understand the optimization process of the model. In this paper, we propose a visual analysis system, BHPVAS, for pruning BERT models, which helps researchers to incorporate their understanding of model structure and operating mechanism into the model pruning process and generate pruning schemes. We propose three pruning criteria based on the attention data, namely, importance score, stability score, and similarity score, for evaluating the importance of self-attention heads. Additionally, we design multiple collaborative views to display the entire pruning process, guiding users to carry out pruning. Our system supports exploring the role of self-attention heads in the model inference process using text dependency relations and attention weight distribution. Finally, we conduct two case studies to demonstrate how to use the system for Sentiment Classification Sample Analysis and Pruning Scheme Exploration, verifying the effectiveness of the visual analysis system.</p><h3 data-test=\"abstract-sub-heading\">Graphical Abstract</h3>","PeriodicalId":54756,"journal":{"name":"Journal of Visualization","volume":"56 1","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Visualization","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s12650-024-00985-z","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0
Abstract
In the field of deep learning, pre-trained BERT models have achieved remarkable success. The accompanying problem, however, is that these models have increasingly complex structures and ever more network parameters, and this huge parameter count makes their computational cost in time and memory extremely high. Recent work has indicated that BERT models contain a significant number of redundant attention heads, and many BERT compression algorithms have been proposed that effectively reduce model complexity and redundancy by pruning some of these heads. Nevertheless, existing automated model compression solutions are mainly based on predetermined pruning programs, which require multiple expensive pruning-retraining cycles or heuristic designs for selecting additional hyperparameters. Furthermore, the training process of BERT models is a black box and lacks interpretability, which prevents researchers from intuitively understanding how the model is optimized. In this paper, we propose BHPVAS, a visual analysis system for pruning BERT models that helps researchers incorporate their understanding of the model's structure and operating mechanism into the pruning process and generate pruning schemes. We propose three pruning criteria computed from attention data, namely an importance score, a stability score, and a similarity score, for evaluating the importance of self-attention heads. Additionally, we design multiple coordinated views that display the entire pruning process and guide users through pruning. Our system supports exploring the role of self-attention heads in model inference using text dependency relations and attention weight distributions. Finally, we conduct two case studies, sentiment classification sample analysis and pruning scheme exploration, to demonstrate how to use the system and to verify its effectiveness.
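The abstract does not give the exact formulas for the three pruning criteria, so the following is only a minimal sketch of the general workflow it describes: scoring self-attention heads from attention weights, then pruning the lowest-scoring heads. It assumes the HuggingFace transformers API (output_attentions, BertModel.prune_heads); the confidence-style proxy score is a hypothetical stand-in, not the paper's importance, stability, or similarity score.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
scores = {}  # (layer, head) -> proxy score
for layer, attn in enumerate(outputs.attentions):
    # Hypothetical "confidence" proxy: mean of each head's maximum
    # attention weight; heads that attend diffusely score low.
    confidence = attn.max(dim=-1).values.mean(dim=(0, 2))  # (num_heads,)
    for head, s in enumerate(confidence.tolist()):
        scores[(layer, head)] = s

# Prune the k lowest-scoring heads, grouped by layer as
# prune_heads expects: {layer_index: [head_indices]}.
k = 12
heads_by_layer = {}
for layer, head in sorted(scores, key=scores.get)[:k]:
    heads_by_layer.setdefault(layer, []).append(head)
model.prune_heads(heads_by_layer)
```

In the system the paper describes, such scores would be computed over many samples and inspected in the coordinated views before committing to a pruning scheme; the fixed k here merely stands in for that interactive selection.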
Journal of Visualization
Categories: Computer Science, Interdisciplinary Applications; Imaging Science & Photographic Technology
CiteScore: 3.40
Self-citation rate: 5.90%
Articles per year: 79
Review time: >12 weeks
JCR quartile: Q3
Journal introduction:
Visualization is an interdisciplinary imaging science devoted to making the invisible visible through the techniques of experimental visualization and computer-aided visualization.
The scope of the Journal is to provide a venue for exchanging information on the latest visualization technology and its applications through the presentation of the latest papers by both researchers and practitioners.