Acceleration-aware, Retraining-free Evolutionary Pruning for Automated Fitment of Deep Learning Models on Edge Devices
Jeet Dutta, Swarnava Dey, Arijit Mukherjee, Arpan Pal
Proceedings of the Second International Conference on AI-ML Systems, 2022. DOI: 10.1145/3564121.3564133
Deep learning architectures used in computer vision, natural language and speech processing, unsupervised clustering, and related fields have become highly complex and application-specific in recent times. Despite existing automated feature engineering techniques, building such complex models still requires either extensive domain knowledge or massive infrastructure for techniques such as Neural Architecture Search (NAS). Further, many industrial applications need on-premises decision-making close to the sensors, making deployment of deep learning models on edge devices a desirable and often necessary option. Instead of designing application-specific deep learning models from scratch, transforming already-built models can achieve faster time to market and lower cost. In this work, we present an efficient, retraining-free model compression method that searches for the best pruning hyper-parameters to reduce model size and latency without losing accuracy. Moreover, the proposed method accounts for any drop in accuracy caused by hardware acceleration when a deep neural network is executed on accelerator hardware.
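The abstract outlines, but does not specify, the search procedure. As a rough illustration of the general idea — an evolutionary search over per-layer pruning ratios whose fitness is measured on the accelerated model and requires no retraining — consider the minimal Python sketch below. Everything in it is an assumption for illustration: build_pruned_model, accelerate, and measure_accuracy are hypothetical stand-ins for framework-specific steps, and the toy accuracy proxy merely creates a size/accuracy trade-off; none of it is the paper's implementation.

```python
import random

NUM_LAYERS = 10           # layers whose pruning ratios form the genome
POP_SIZE, GENERATIONS = 20, 30
SIZE_WEIGHT = 0.5         # trade-off between accuracy and compression


def build_pruned_model(ratios):
    # Hypothetical: apply magnitude pruning per layer at the given ratios,
    # with no retraining afterwards. Here the "model" is just the genome.
    return ratios


def accelerate(model):
    # Hypothetical: quantize/compile the pruned model for the target
    # accelerator. In a real flow this step can itself cost accuracy.
    return model


def measure_accuracy(deployed):
    # Hypothetical toy proxy: accuracy degrades as more weights are pruned.
    mean_ratio = sum(deployed) / NUM_LAYERS
    return 1.0 - 0.3 * mean_ratio ** 2


def fitness(ratios):
    deployed = accelerate(build_pruned_model(ratios))
    acc = measure_accuracy(deployed)            # measured *after* acceleration
    remaining = 1.0 - sum(ratios) / NUM_LAYERS  # relative model size left
    return acc - SIZE_WEIGHT * remaining        # reward accuracy, punish size


def mutate(ratios, sigma=0.05):
    # Gaussian perturbation of each ratio, clamped to a valid pruning range.
    return [min(0.95, max(0.0, r + random.gauss(0.0, sigma))) for r in ratios]


def crossover(a, b):
    cut = random.randrange(1, NUM_LAYERS)       # single-point crossover
    return a[:cut] + b[cut:]


population = [[random.uniform(0.0, 0.9) for _ in range(NUM_LAYERS)]
              for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:POP_SIZE // 2]            # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print("best per-layer pruning ratios:", [round(r, 2) for r in best])
```

The one property this toy loop mirrors from the abstract is that fitness is evaluated on the deployed (accelerated) model rather than the original network, so any accuracy drop introduced by the accelerator is reflected directly in the search, and no candidate is ever retrained.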