Qingyuan Wang, Barry Cardiff, Antoine Frappé, Benoit Larras, Deepu John
{"title":"DyCE:动态配置退出深度学习压缩和实时缩放","authors":"Qingyuan Wang , Barry Cardiff , Antoine Frappé , Benoit Larras , Deepu John","doi":"10.1016/j.future.2025.107837","DOIUrl":null,"url":null,"abstract":"<div><div>Conventional deep learning (DL) model compression methods affect all input samples equally. However, as samples vary in difficulty, a dynamic model that adapts computation based on sample complexity offers a novel perspective for compression and scaling. Despite this potential, existing dynamic techniques are typically monolithic and have model-specific implementations, limiting their generalizability as broad compression and scaling methods. Additionally, most deployed DL systems are fixed, and unable to adjust once deployed. This paper introduces DyCE, a dynamically configurable system that can adjust the performance-complexity trade-off of a DL model at runtime without needing re-initialization or re-deployment. DyCE achieves this by adding exit networks to intermediate layers, thus allowing early termination if results are acceptable. DyCE also decouples the design of exit networks from the base model itself, enabling its easy adaptation to new base models. We also propose methods for generating optimized configurations and determining exit network types and positions for dynamic trade-offs. By enabling simple configuration switching, DyCE enables fine-grained performance-complexity tuning in real-time. We demonstrate the effectiveness of DyCE through image classification tasks using deep convolutional neural networks (CNNs). DyCE significantly reduces computational complexity by 26.2% for ResNet<span><math><msub><mrow></mrow><mrow><mi>152</mi></mrow></msub></math></span>, 26.6% for ConvNextv2<span><math><msub><mrow></mrow><mrow><mi>tiny</mi></mrow></msub></math></span> and 32.0% for DaViT<span><math><msub><mrow></mrow><mrow><mi>base</mi></mrow></msub></math></span> on ImageNet validation set, with accuracy reductions of less than 0.5%.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"171 ","pages":"Article 107837"},"PeriodicalIF":6.2000,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DyCE: Dynamically Configurable Exiting for deep learning compression and real-time scaling\",\"authors\":\"Qingyuan Wang , Barry Cardiff , Antoine Frappé , Benoit Larras , Deepu John\",\"doi\":\"10.1016/j.future.2025.107837\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Conventional deep learning (DL) model compression methods affect all input samples equally. However, as samples vary in difficulty, a dynamic model that adapts computation based on sample complexity offers a novel perspective for compression and scaling. Despite this potential, existing dynamic techniques are typically monolithic and have model-specific implementations, limiting their generalizability as broad compression and scaling methods. Additionally, most deployed DL systems are fixed, and unable to adjust once deployed. This paper introduces DyCE, a dynamically configurable system that can adjust the performance-complexity trade-off of a DL model at runtime without needing re-initialization or re-deployment. DyCE achieves this by adding exit networks to intermediate layers, thus allowing early termination if results are acceptable. 
DyCE also decouples the design of exit networks from the base model itself, enabling its easy adaptation to new base models. We also propose methods for generating optimized configurations and determining exit network types and positions for dynamic trade-offs. By enabling simple configuration switching, DyCE enables fine-grained performance-complexity tuning in real-time. We demonstrate the effectiveness of DyCE through image classification tasks using deep convolutional neural networks (CNNs). DyCE significantly reduces computational complexity by 26.2% for ResNet<span><math><msub><mrow></mrow><mrow><mi>152</mi></mrow></msub></math></span>, 26.6% for ConvNextv2<span><math><msub><mrow></mrow><mrow><mi>tiny</mi></mrow></msub></math></span> and 32.0% for DaViT<span><math><msub><mrow></mrow><mrow><mi>base</mi></mrow></msub></math></span> on ImageNet validation set, with accuracy reductions of less than 0.5%.</div></div>\",\"PeriodicalId\":55132,\"journal\":{\"name\":\"Future Generation Computer Systems-The International Journal of Escience\",\"volume\":\"171 \",\"pages\":\"Article 107837\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2025-04-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Future Generation Computer Systems-The International Journal of Escience\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167739X25001323\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X25001323","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
DyCE: Dynamically Configurable Exiting for deep learning compression and real-time scaling
Conventional deep learning (DL) model compression methods treat all input samples equally. However, since samples vary in difficulty, a dynamic model that adapts its computation to sample complexity offers a novel perspective on compression and scaling. Despite this potential, existing dynamic techniques are typically monolithic, with model-specific implementations that limit their generalizability as broad compression and scaling methods. Additionally, most deployed DL systems are fixed and cannot be adjusted after deployment. This paper introduces DyCE, a dynamically configurable system that can adjust the performance-complexity trade-off of a DL model at runtime without re-initialization or re-deployment. DyCE achieves this by attaching exit networks to intermediate layers of the base model, allowing early termination once a result is acceptable. DyCE also decouples the design of exit networks from the base model itself, making it easy to adapt to new base models. We further propose methods for generating optimized configurations and for determining the types and positions of exit networks for dynamic trade-offs. Through simple configuration switching, DyCE supports fine-grained performance-complexity tuning in real time. We demonstrate the effectiveness of DyCE on image classification tasks using deep convolutional neural networks (CNNs). On the ImageNet validation set, DyCE reduces computational complexity by 26.2% for ResNet-152, 26.6% for ConvNextv2-tiny, and 32.0% for DaViT-base, with accuracy reductions of less than 0.5%.
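To make the early-exit mechanism described in the abstract concrete, here is a minimal PyTorch-style sketch. It is not DyCE's actual implementation: `ExitHead`, `EarlyExitModel`, `set_config`, and all threshold values are hypothetical names and numbers chosen for illustration, and the paper's methods for designing exit networks and searching for optimized configurations are not reproduced here.

```python
# Minimal sketch of confidence-based early exiting (illustrative, not DyCE's API).
import torch
import torch.nn as nn


class ExitHead(nn.Module):
    """A small classifier attached to an intermediate feature map."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.pool(x).flatten(1))


class EarlyExitModel(nn.Module):
    """Runs base-model stages in order and stops at the first exit whose
    top-1 softmax confidence clears that exit's threshold."""

    def __init__(self, stages: nn.ModuleList, exits: nn.ModuleList,
                 thresholds: list[float]):
        super().__init__()
        assert len(stages) == len(exits) == len(thresholds)
        self.stages, self.exits = stages, exits
        self.thresholds = thresholds  # one entry per exit; > 1.0 disables an exit

    def set_config(self, thresholds: list[float]) -> None:
        # Runtime trade-off switch: no re-initialization or re-deployment,
        # just a new threshold vector selecting a different operating point.
        self.thresholds = thresholds

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, int]:
        for i, (stage, exit_head) in enumerate(zip(self.stages, self.exits)):
            x = stage(x)
            logits = exit_head(x)
            confidence = logits.softmax(dim=-1).max(dim=-1).values
            # For clarity this exits only when the whole batch is confident;
            # per-sample routing is the practical deployment variant.
            if bool((confidence >= self.thresholds[i]).all()):
                return logits, i  # early termination: result is acceptable
        return logits, len(self.stages) - 1  # fell through to the final exit


if __name__ == "__main__":
    # Toy three-stage backbone; channel counts and thresholds are illustrative.
    stages = nn.ModuleList([
        nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU()),
        nn.Sequential(nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU()),
        nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU()),
    ])
    exits = nn.ModuleList([ExitHead(c, 10) for c in (16, 32, 64)])
    model = EarlyExitModel(stages, exits, thresholds=[0.9, 0.8, 0.0])
    logits, used_exit = model(torch.randn(1, 3, 64, 64))
    print(used_exit, logits.shape)
```

Under this reading, a "configuration" is just a vector of exit thresholds, so switching the performance-complexity operating point at runtime amounts to calling `set_config` with a different vector, consistent with the abstract's claim that no re-initialization or re-deployment is needed.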
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To keep pace with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a need for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amounts of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.