{"title":"$\\text{Edge}^{n}$ AI: Distributed Inference with Local Edge Devices and Minimal Latency","authors":"Maedeh Hemmat, A. Davoodi, Y. Hu","doi":"10.1109/ASP-DAC52403.2022.9712496","DOIUrl":null,"url":null,"abstract":"We propose $\\text{Edge}^{n}$ AI, a framework to decompose a complex deep neural networks (DNN) over $n$ available local edge devices with minimal communication overhead and overall latency. Our framework creates small DNNs (SNNs) from an original DNN by partitioning its classes across the edge devices, while taking into account their available resources. Class-aware pruning is applied to aggressively reduce the size of the SNN on each edge device. The SNNs perform inference in parallel, and are configured to generate a ‘Don't Know’ response when an unassigned class is identified. Our experiments show up to 17X inference speedup compared to a recent work, on devices of at most 150 MB memory when distributing a variant of VGG-16 over 20 parallel edge devices.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASP-DAC52403.2022.9712496","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
We propose $\text{Edge}^{n}$ AI, a framework to decompose a complex deep neural network (DNN) over $n$ available local edge devices with minimal communication overhead and overall latency. Our framework creates small DNNs (SNNs) from an original DNN by partitioning its classes across the edge devices while taking into account their available resources. Class-aware pruning is applied to aggressively reduce the size of the SNN on each edge device. The SNNs perform inference in parallel and are configured to generate a ‘Don't Know’ response when the input belongs to a class not assigned to them. Our experiments show up to 17X inference speedup compared to a recent work, on devices with at most 150 MB of memory, when distributing a variant of VGG-16 over 20 parallel edge devices.
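The abstract describes three mechanisms: capacity-aware class partitioning, per-device SNNs, and parallel inference with ‘Don't Know’ aggregation. Below is a minimal Python sketch of that flow under stated assumptions. The capacity-proportional partition heuristic, the extra ‘Don't Know’ output per SNN, and highest-confidence aggregation are all illustrative guesses, and the names (`partition_classes`, `snn_infer`, `aggregate`) are hypothetical, not from the paper.

```python
import numpy as np

def partition_classes(num_classes: int, capacities: list[float]) -> list[list[int]]:
    """Split class labels across devices in proportion to device capacity.

    Illustrative heuristic: the paper partitions classes according to each
    device's available resources, but the exact policy is not given in the
    abstract, so this proportional split is an assumption.
    """
    weights = np.array(capacities, dtype=float)
    quotas = np.floor(num_classes * weights / weights.sum()).astype(int)
    # Hand out classes lost to flooring, one each to the largest devices.
    for i in np.argsort(-weights)[: num_classes - quotas.sum()]:
        quotas[i] += 1
    labels = iter(range(num_classes))
    return [[next(labels) for _ in range(q)] for q in quotas]

def snn_infer(logits: np.ndarray, assigned: list[int]):
    """Local decision of one SNN over its assigned classes.

    Assumption: each SNN has len(assigned) + 1 outputs, the last being an
    explicit 'Don't Know' class covering everything not assigned to it.
    Returns (global_class, confidence) or None for 'Don't Know'.
    """
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    probs /= probs.sum()
    best = int(np.argmax(probs))
    if best == len(assigned):               # the 'Don't Know' output won
        return None
    return assigned[best], float(probs[best])

def aggregate(responses):
    """Fuse parallel SNN responses: most confident non-'Don't Know' answer."""
    answered = [r for r in responses if r is not None]
    if not answered:
        return None                         # every device said 'Don't Know'
    return max(answered, key=lambda r: r[1])[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parts = partition_classes(num_classes=10, capacities=[150.0, 100.0, 50.0])
    # Random logits stand in for each device's pruned SNN forward pass.
    responses = [snn_infer(rng.normal(size=len(p) + 1), p) for p in parts]
    print("class partition per device:", parts)
    print("aggregated prediction:", aggregate(responses))
```

Because the class assignments are disjoint, at most one SNN should recognize a given input; the max-confidence fusion above is only a defensive tie-break for the case where pruning errors make several devices answer at once.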