Shuo Huai;Hao Kong;Shiqing Li;Xiangzhong Luo;Ravi Subramaniam;Christian Makaya;Qian Lin;Weichen Liu
{"title":"EvoLP:用于实时边缘系统模型压缩的自进化延迟预测器","authors":"Shuo Huai;Hao Kong;Shiqing Li;Xiangzhong Luo;Ravi Subramaniam;Christian Makaya;Qian Lin;Weichen Liu","doi":"10.1109/LES.2023.3321599","DOIUrl":null,"url":null,"abstract":"Edge devices are increasingly utilized for deploying deep learning applications on embedded systems. The real-time nature of many applications and the limited resources of edge devices necessitate latency-targeted neural network compression. However, measuring latency on real devices is challenging and expensive. Therefore, this letter presents a novel and efficient framework, named EvoLP, to accurately predict the inference latency of models on edge devices. This predictor can evolve to achieve higher latency prediction precision during the network compression process. Experimental results demonstrate that EvoLP outperforms previous state-of-the-art approaches by being evaluated on three edge devices and four model variants. Moreover, when incorporated into a model compression framework, it effectively guides the compression process for higher model accuracy while satisfying strict latency constraints. We open-source EvoLP at \n<uri>https://github.com/ntuliuteam/EvoLP</uri>\n.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 2","pages":"174-177"},"PeriodicalIF":1.7000,"publicationDate":"2023-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"EvoLP: Self-Evolving Latency Predictor for Model Compression in Real-Time Edge Systems\",\"authors\":\"Shuo Huai;Hao Kong;Shiqing Li;Xiangzhong Luo;Ravi Subramaniam;Christian Makaya;Qian Lin;Weichen Liu\",\"doi\":\"10.1109/LES.2023.3321599\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Edge devices are increasingly utilized for deploying deep learning applications on embedded systems. The real-time nature of many applications and the limited resources of edge devices necessitate latency-targeted neural network compression. However, measuring latency on real devices is challenging and expensive. Therefore, this letter presents a novel and efficient framework, named EvoLP, to accurately predict the inference latency of models on edge devices. This predictor can evolve to achieve higher latency prediction precision during the network compression process. Experimental results demonstrate that EvoLP outperforms previous state-of-the-art approaches by being evaluated on three edge devices and four model variants. Moreover, when incorporated into a model compression framework, it effectively guides the compression process for higher model accuracy while satisfying strict latency constraints. We open-source EvoLP at \\n<uri>https://github.com/ntuliuteam/EvoLP</uri>\\n.\",\"PeriodicalId\":56143,\"journal\":{\"name\":\"IEEE Embedded Systems Letters\",\"volume\":\"16 2\",\"pages\":\"174-177\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2023-10-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Embedded Systems Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10269129/\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Embedded Systems Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10269129/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
EvoLP: Self-Evolving Latency Predictor for Model Compression in Real-Time Edge Systems
Edge devices are increasingly utilized for deploying deep learning applications on embedded systems. The real-time nature of many applications and the limited resources of edge devices necessitate latency-targeted neural network compression. However, measuring latency on real devices is challenging and expensive. Therefore, this letter presents a novel and efficient framework, named EvoLP, to accurately predict the inference latency of models on edge devices. This predictor can evolve to achieve higher latency prediction precision during the network compression process. Experimental results demonstrate that EvoLP outperforms previous state-of-the-art approaches by being evaluated on three edge devices and four model variants. Moreover, when incorporated into a model compression framework, it effectively guides the compression process for higher model accuracy while satisfying strict latency constraints. We open-source EvoLP at
https://github.com/ntuliuteam/EvoLP
.
期刊介绍:
The IEEE Embedded Systems Letters (ESL), provides a forum for rapid dissemination of latest technical advances in embedded systems and related areas in embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient and robust design of embedded systems and their applications.