{"title":"Enable Pipeline Processing of DNN Co-inference Tasks In the Mobile-Edge Cloud","authors":"Sheng Hu, Chongwu Dong, Wushao Wen","doi":"10.1109/ICCCS52626.2021.9449178","DOIUrl":null,"url":null,"abstract":"Deep Neural Network (DNN) based artificial intelligence help driving the great development of mobile Internet. However, the hardware of a mobile device may not be sufficiently to meet the computational requirements of a DNN inference task. Fortunately, computation offloading to the network edge can mitigate part of computation pressure for mobile devices. In this case, DNN computation in mobile devices can be accelerated by an edge-assistance collaborative inference scheme. Since co-inference tasks with multiple processing stages may continuously arrive at mobile devices, only considering one DNN-based task for acceleration is not practical. To solve the above challenge effectively, we formulate the problem of multiple co-inference tasks acceleration as a pipeline execution model. Based on the model, we design a fine-grained optimizer, which integrates model partition, model early-exit and intermediate data compression, to achieve tradeoff between accuracy and latency. Considering computational characteristics of a pipeline, the goal of the optimizer is designed to ensure the pipeline system's inference rate and single task execution performance. We implement the system prototype and do benchmark tests under a real-life testbed and the results prove the effectiveness of the optimizer.","PeriodicalId":376290,"journal":{"name":"2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCCS52626.2021.9449178","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Deep Neural Network (DNN) based artificial intelligence helps drive the rapid development of the mobile Internet. However, the hardware of a mobile device may not be sufficient to meet the computational requirements of a DNN inference task. Fortunately, offloading computation to the network edge can relieve part of the computational pressure on mobile devices. In this case, DNN computation on mobile devices can be accelerated by an edge-assisted collaborative inference scheme. Since co-inference tasks with multiple processing stages may continuously arrive at a mobile device, considering only one DNN-based task for acceleration is not practical. To address this challenge, we formulate the acceleration of multiple co-inference tasks as a pipeline execution model. Based on this model, we design a fine-grained optimizer that integrates model partition, early exit, and intermediate data compression to achieve a tradeoff between accuracy and latency. Considering the computational characteristics of a pipeline, the optimizer is designed to ensure both the pipeline system's inference rate and single-task execution performance. We implement a system prototype and run benchmark tests on a real-life testbed; the results demonstrate the effectiveness of the optimizer.
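The abstract names three techniques the optimizer combines: model partition, early exit, and intermediate data compression. The paper's own implementation is not shown here; the following is a minimal, hypothetical PyTorch sketch of how the device-side half of such a partition might run the first layers up to a chosen split point, finish locally when an auxiliary early-exit head is confident, and otherwise compress the intermediate feature map before offloading it to the edge. All names (split_index, exit_head, confidence_threshold) and the fp16 + zlib compression choice are illustrative assumptions, not the authors' design.

```python
# Hypothetical sketch of device-side co-inference: partition + early exit + compression.
# Not the authors' code; names, thresholds, and the compression scheme are assumptions.
import io
import zlib
import torch
import torch.nn as nn

class DeviceSidePartition(nn.Module):
    def __init__(self, backbone_layers, split_index, exit_head, confidence_threshold=0.8):
        super().__init__()
        # Layers executed locally on the mobile device (model partition).
        self.local_part = nn.Sequential(*backbone_layers[:split_index])
        # Lightweight classifier attached at the split point (early exit).
        self.exit_head = exit_head
        self.confidence_threshold = confidence_threshold

    def forward(self, x):
        features = self.local_part(x)
        # Early exit: if the auxiliary head is confident enough, finish on-device.
        probs = torch.softmax(self.exit_head(features), dim=1)
        confidence, label = probs.max(dim=1)
        if confidence.item() >= self.confidence_threshold:
            return {"label": label.item(), "exited_early": True}
        # Otherwise, compress the intermediate tensor and hand it to the offloading layer.
        buffer = io.BytesIO()
        torch.save(features.half(), buffer)          # quantize to fp16 before serializing
        payload = zlib.compress(buffer.getvalue())   # lossless compression of the payload
        return {"payload": payload, "exited_early": False}
```

In such a setup, the edge server would decompress the payload and execute the remaining layers; selecting the split point and exit threshold per task, so that the pipeline's inference rate and per-task latency targets are both met, is the role the abstract assigns to the optimizer.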