Maedeh Hemmat, A. Davoodi, Y. Hu. 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC). Published 2022-01-17. DOI: 10.1109/ASP-DAC52403.2022.9712496
$\text{Edge}^{n}$ AI: Distributed Inference with Local Edge Devices and Minimal Latency
We propose $\text{Edge}^{n}$ AI, a framework to decompose a complex deep neural network (DNN) across $n$ available local edge devices with minimal communication overhead and overall latency. Our framework creates small DNNs (SNNs) from the original DNN by partitioning its classes across the edge devices, taking into account each device's available resources. Class-aware pruning is applied to aggressively reduce the size of the SNN on each edge device. The SNNs perform inference in parallel and are configured to generate a 'Don't Know' response when an unassigned class is identified. Our experiments show up to a 17X inference speedup compared to a recent work, on devices with at most 150 MB of memory, when distributing a variant of VGG-16 over 20 parallel edge devices.
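The parallel-inference scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-device predictors, the confidence-based aggregation, and all names (`make_snn`, `distributed_inference`, `DONT_KNOW`) are hypothetical stand-ins for how disjoint class partitions and 'Don't Know' responses might be combined.

```python
from concurrent.futures import ThreadPoolExecutor

DONT_KNOW = "Don't Know"

def make_snn(assigned_classes):
    """Build a toy per-device SNN covering a disjoint subset of classes.

    A real SNN would run pruned-DNN inference; here the input is
    treated as its own true class label to keep the sketch runnable.
    """
    def predict(x):
        if x in assigned_classes:
            return (x, 0.9)          # (predicted class, confidence)
        return (DONT_KNOW, 0.0)      # class not assigned to this device
    return predict

def distributed_inference(x, snns):
    """Run all SNNs in parallel; keep the most confident non-'Don't Know' answer."""
    with ThreadPoolExecutor(max_workers=len(snns)) as pool:
        results = list(pool.map(lambda snn: snn(x), snns))
    answers = [r for r in results if r[0] != DONT_KNOW]
    return max(answers, key=lambda r: r[1])[0] if answers else DONT_KNOW

# Partition 6 classes across 3 edge devices, 2 classes each.
snns = [make_snn({0, 1}), make_snn({2, 3}), make_snn({4, 5})]
print(distributed_inference(3, snns))  # class 3 is handled by the second SNN
print(distributed_inference(9, snns))  # no SNN claims class 9 -> "Don't Know"
```

Because each class belongs to exactly one partition, at most one SNN answers per input; the aggregation step only needs a tie-break (here, confidence) if partitions were allowed to overlap.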