WideTopo: Improving foresight neural network pruning through training dynamics preservation and wide topologies exploration

Impact factor 6.3 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Changjian Deng, Jian Cheng, Yanzhou Su, Zeyu An, Zhiguo Yang, Ziying Xia, Yijie Zhang, Shiguang Wang
Journal: Neural Networks, volume 194, Article 108136
Publication type: Journal Article
Published: 2025-09-22
DOI: 10.1016/j.neunet.2025.108136
Publisher page: https://www.sciencedirect.com/science/article/pii/S0893608025010160
Citations: 0

Abstract

WideTopo: Improving foresight neural network pruning through training dynamics preservation and wide topologies exploration
Foresight neural network pruning methods have garnered significant attention due to their potential to save computational resources. Recent advancements in this field are predominantly categorized into saliency score-based and graph theory-based methods. The former assesses the sensitivity of pruning parameter connections concerning specific metrics, while the latter aims to identify sub-networks characterized by sparse yet highly connected graph structures. However, recent research suggests that relying exclusively on saliency scores may result in deep but narrow sub-networks, while graph theory-based methods may be unsuitable for neural networks requiring pre-trained parameters for initialization, particularly in transfer learning scenarios. We hypothesize that preserving the training dynamics of sub-networks during pruning, along with exploring network structures with wide topology, can facilitate the identification of structurally stable sub-networks with improved post-training performance. Motivated by this, we propose WideTopo, which integrates Neural Tangent Kernel (NTK) theory with Implicit Target Alignment (ITA) in neural networks to capture the training dynamics of sub-networks. Furthermore, it employs a density-aware saliency score decay strategy and a repeated mask restoration strategy to retain more effective nodes, thereby sustaining the width of each layer within the sub-networks. We conducted extensive validations using CNN-based and ViT-based models on representative image classification and semantic segmentation datasets under both random and pre-trained initialization settings. The effectiveness and applicability of our method have been validated on diverse network architectures at various model density rates, showing competitive post-training performance compared with other existing baselines. Our code is publicly available at https://github.com/Memoristor/WideTopo.
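To make the abstract's concern about saliency-only pruning concrete, here is a minimal NumPy sketch of a generic saliency-score criterion (SNIP-style |w · g| with a single global threshold). It is not WideTopo and is not taken from the authors' repository: the function names `saliency_masks` and `layer_widths`, the toy layer shapes, and the random stand-in gradients are all assumptions made for illustration. The middle layer's gradients are deliberately scaled down to show how global ranking can leave one layer with few or no active units.

```python
import numpy as np

def saliency_masks(weights, grads, density):
    """Keep the top `density` fraction of connections by |w * g|, ranked globally.

    Generic SNIP-style criterion for illustration only (not WideTopo).
    """
    scores = [np.abs(w * g) for w, g in zip(weights, grads)]
    flat = np.concatenate([s.ravel() for s in scores])
    k = max(1, int(round(density * flat.size)))
    # After partitioning, index (size - k) holds the k-th largest score.
    threshold = np.partition(flat, flat.size - k)[flat.size - k]
    return [s >= threshold for s in scores]

def layer_widths(masks):
    """Count output units (rows) that keep at least one incoming connection."""
    return [int(m.any(axis=1).sum()) for m in masks]

rng = np.random.default_rng(0)
# Hypothetical 3-layer MLP; each weight matrix is (out_features, in_features).
shapes = [(16, 8), (16, 16), (4, 16)]
weights = [rng.standard_normal(s) for s in shapes]
# Stand-in gradients (assumption): the middle layer's are scaled down so that
# global ranking starves it of connections.
scales = (1.0, 0.01, 1.0)
grads = [rng.standard_normal(s) * c for s, c in zip(shapes, scales)]

masks = saliency_masks(weights, grads, density=0.2)
kept = sum(int(m.sum()) for m in masks)
total = sum(m.size for m in masks)
print("overall density:", kept / total)
print("active units per layer:", layer_widths(masks))
```

The per-layer count of active units is a rough proxy for the layer width that the abstract says WideTopo tries to sustain. How WideTopo's density-aware score decay and repeated mask restoration actually achieve this is described in the paper and the linked repository, not in this sketch.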
Source journal
Neural Networks (Engineering and Technology; Computer Science: Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days
Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically-inspired artificial intelligence.