Novel Adaptive DNN Partitioning Method Based on Image-Stream Pipeline Inference between the Edge and Cloud

Chenchen Ji, Yanjun Wu, Pengpeng Hou, Yang Tai, Jiageng Yu

2022 3rd International Conference on Computing, Networks and Internet of Things (CNIOT), May 2022. DOI: 10.1109/cniot55862.2022.00021
Cloud-only and edge-computing approaches have recently been proposed to satisfy the requirements of complex neural networks. However, the cloud-only approach poses a latency challenge because large data volumes must be sent to a centralized location in the cloud. Less powerful edge computing resources require model compression to reduce computation, which trades away model accuracy. To address this challenge, deep neural network (DNN) partitioning has become a recent trend, with DNN models sliced into head and tail portions executed on the mobile edge device and the cloud server, respectively. We propose Edgepipe, a novel partitioning method based on pipeline inference over an image stream that automatically partitions DNN computation between the edge device and the cloud server, thereby reducing overall latency and enhancing system-wide real-time performance. The method adapts to various DNN architectures, hardware platforms, and networks. When evaluated on a suite of five DNN applications, Edgepipe achieves average latency speedups of 1.241× and 1.154× over the cloud-only approach and the state-of-the-art approach known as “Neurosurgeon”, respectively.
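To make the head/tail partitioning idea concrete, the following is a minimal PyTorch sketch, not the authors' Edgepipe implementation: a feed-forward model is split at an assumed layer boundary (split_index), the head runs in an "edge" thread, intermediate tensors cross a queue standing in for the network link, and the tail finishes inference in a "cloud" thread, so consecutive frames of an image stream overlap in a two-stage pipeline. The toy network, the split point, and the queue-based link are illustrative assumptions only.

```python
# Illustrative sketch of edge/cloud DNN partitioning with two-stage pipeline inference.
# Not the paper's Edgepipe system; split_index, the toy model, and the queue "link"
# are hypothetical choices for demonstration.
import queue
import threading
import torch
import torch.nn as nn

# A toy DNN standing in for an arbitrary feed-forward architecture.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)
model.eval()

split_index = 4                      # hypothetical partition point (a layer boundary)
head = model[:split_index]           # portion executed on the edge device
tail = model[split_index:]           # portion executed on the cloud server

link = queue.Queue(maxsize=4)        # stands in for the edge-to-cloud network link
results = []

def edge_worker(frames):
    """Edge stage: run the head on each frame and ship the intermediate tensor."""
    with torch.no_grad():
        for frame in frames:
            link.put(head(frame))
    link.put(None)                   # sentinel: end of the image stream

def cloud_worker():
    """Cloud stage: finish inference on intermediate tensors as they arrive."""
    with torch.no_grad():
        while True:
            feat = link.get()
            if feat is None:
                break
            results.append(tail(feat).argmax(dim=1))

# A short synthetic image stream (one 64x64 RGB frame per item).
stream = [torch.randn(1, 3, 64, 64) for _ in range(8)]

t_edge = threading.Thread(target=edge_worker, args=(stream,))
t_cloud = threading.Thread(target=cloud_worker)
t_edge.start(); t_cloud.start()
t_edge.join(); t_cloud.join()
print([int(r) for r in results])    # predicted class per frame
```

Because the two stages run concurrently, the edge can process frame k+1 while the cloud finishes frame k, which is the pipelining effect the abstract refers to; choosing the split point to balance edge compute, cloud compute, and transfer size is the partitioning decision Edgepipe automates.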