Layer Redundancy Aware DNN Model Repository Planning for Fast Model Download in Edge Cloud
Authors: Hongmin Geng, Yuepeng Li, Sheng Wang, Lin Gu, Deze Zeng
Published in: IEEE Transactions on Cloud Computing, vol. 13, no. 3, pp. 1038-1049, 2025-07-22 (Journal Article)
DOI: 10.1109/TCC.2025.3591482
IEEE Xplore: https://ieeexplore.ieee.org/document/11088223/
The rapid growth of artificial intelligence (AI) applications has greatly advanced edge intelligence technology. To support latency-sensitive applications based on Deep Neural Networks (DNNs), integrating the serverless inference paradigm into edge intelligence has become a widely recognized solution. However, the long time needed to download DNN models from central clouds to edge servers degrades inference performance, which calls for establishing a model repository within the edge cloud. This paper first identifies the inherent layer redundancy in DNN models, which can potentially improve the storage efficiency of the model repository in the edge cloud. Yet exploiting this layer redundancy, and allocating DNN layers across edge servers with limited storage capacity so as to reduce model download time, remains challenging. To address this issue, we first formulate the problem as a Quadratic Integer Program (QIP) and, based on this formulation, propose a layer redundancy aware DNN model storage planning strategy built on randomized rounding. Extensive trace-driven experiments show that our approach reduces model download time by up to 63% compared with state-of-the-art methods.
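The abstract does not spell out the formulation, so the following is only a minimal Python sketch under stated assumptions, meant to make the two ideas concrete. First, layers are treated as redundant when their serialized parameters hash to the same SHA-256 digest, so a shared layer is stored once instead of once per model. Second, a hypothetical fractional placement `frac[layer][server]` in [0, 1] (of the kind a relaxed solver might return) is rounded independently per server, skipping any layer that would overflow that server's capacity. `Layer`, `naive_size`, `unique_layers`, and `round_placement` are illustrative names, not the authors' code, and the rounding rule is a generic one rather than the paper's strategy.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """A DNN layer; `weights` stands in for its serialized parameters (hypothetical)."""
    name: str
    weights: bytes

    @property
    def digest(self) -> str:
        # Layers with byte-identical parameters get the same digest, so shared layers collapse to one key.
        return hashlib.sha256(self.weights).hexdigest()


def naive_size(models: dict[str, list[Layer]]) -> int:
    """Bytes needed if every model is stored as an independent, monolithic file."""
    return sum(len(layer.weights) for layers in models.values() for layer in layers)


def unique_layers(models: dict[str, list[Layer]]) -> dict[str, int]:
    """Map each distinct layer digest to its size in bytes (each shared layer counted once)."""
    return {layer.digest: len(layer.weights) for layers in models.values() for layer in layers}


def round_placement(
    frac: dict[str, dict[str, float]],
    sizes: dict[str, int],
    capacity: dict[str, int],
    seed: int = 0,
) -> dict[str, set[str]]:
    """Generic randomized rounding of a fractional placement frac[digest][server] in [0, 1].

    Assumption: frac comes from some relaxed solver; this is not the paper's exact rounding rule.
    """
    rng = random.Random(seed)
    placed: dict[str, set[str]] = {server: set() for server in capacity}
    used = {server: 0 for server in capacity}
    for digest, per_server in frac.items():
        for server, p in per_server.items():
            # Keep the layer with probability p, but skip it if it would overflow the server.
            if rng.random() < p and used[server] + sizes[digest] <= capacity[server]:
                placed[server].add(digest)
                used[server] += sizes[digest]
    return placed


# Two hypothetical models fine-tuned from one backbone share its two layers.
backbone = [Layer("conv1", b"\x01" * 400), Layer("conv2", b"\x02" * 600)]
models = {
    "model_a": backbone + [Layer("head_a", b"\x0a" * 100)],
    "model_b": backbone + [Layer("head_b", b"\x0b" * 120)],
}
sizes = unique_layers(models)
print(naive_size(models), sum(sizes.values()))  # 2220 1220

frac = {digest: {"edge1": 0.5, "edge2": 0.5} for digest in sizes}
plan = round_placement(frac, sizes, capacity={"edge1": 1000, "edge2": 1000}, seed=42)
```

Exact hashing is only the simplest notion of layer redundancy; it misses layers that are nearly but not byte-for-byte identical. The per-server capacity check keeps every rounded plan feasible, at the cost of sometimes dropping layers the fractional solution favored.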
About the journal:
The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It publishes articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.