CoIn: Accelerated CNN Co-Inference through data partitioning on heterogeneous devices
V. K, Anu George, Srivatsav Gunisetty, S. Subramanian, Shravan Kashyap R, M. Purnaprajna
2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), March 2020
DOI: 10.1109/ICACCS48705.2020.9074444
For Convolutional Neural Networks (CNNs), low inference time per batch is crucial in real-time applications. To improve inference time, we present CoIn, a method that benefits from multiple devices executing simultaneously. Our method achieves low inference time by partitioning the images of a batch across diverse micro-architectures. The partitioning strategy is based on offline profiling of the target devices. We have validated our partitioning technique on CPUs, GPUs, and FPGAs, including memory-constrained devices, for which a re-partitioning technique is applied. Average speedups of 1.39x and 1.5x are observed with CPU-GPU and CPU-GPU-FPGA co-execution, respectively. In comparison with the state-of-the-art approach, CoIn achieves an average speedup of 1.62x across all networks.
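The abstract does not spell out the partitioning algorithm, so the following is only a minimal sketch of one plausible reading: split each batch across devices in proportion to throughputs measured by offline profiling, then re-partition when a memory-constrained device's share exceeds its capacity. All names here (Device, partition_batch, the example throughputs) are hypothetical illustrations, not the authors' implementation.

```python
# Sketch of throughput-proportional batch partitioning with a re-partitioning
# pass for memory-constrained devices. Assumed interpretation of the abstract,
# not CoIn's actual algorithm.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    throughput: float           # images/sec from offline profiling
    max_images: int | None = None  # memory cap; None if unconstrained

def partition_batch(batch_size: int, devices: list[Device]) -> dict[str, int]:
    """Split a batch across devices in proportion to profiled throughput."""
    total = sum(d.throughput for d in devices)
    shares = {d.name: int(batch_size * d.throughput / total) for d in devices}
    # Give any rounding remainder to the fastest device.
    fastest = max(devices, key=lambda d: d.throughput)
    shares[fastest.name] += batch_size - sum(shares.values())

    # Re-partitioning: clamp constrained devices to their capacity and push
    # the overflow onto the remaining devices (assumes at least one device
    # is unconstrained and can absorb it).
    overflow = 0
    unconstrained = []
    for d in devices:
        if d.max_images is not None and shares[d.name] > d.max_images:
            overflow += shares[d.name] - d.max_images
            shares[d.name] = d.max_images
        else:
            unconstrained.append(d)
    for d in unconstrained:
        if d is unconstrained[-1]:
            extra = overflow  # last device absorbs the remainder exactly
        else:
            extra = int(overflow * d.throughput /
                        sum(u.throughput for u in unconstrained))
        shares[d.name] += extra
        overflow -= extra
    return shares

if __name__ == "__main__":
    devs = [Device("cpu", 120.0), Device("gpu", 450.0),
            Device("fpga", 200.0, max_images=8)]
    print(partition_batch(64, devs))  # e.g. {'cpu': 10, 'gpu': 46, 'fpga': 8}
```

With the illustrative throughputs above, the FPGA's proportional share (16 images) exceeds its assumed 8-image memory cap, so the surplus is redistributed to the CPU and GPU by throughput, which mirrors the re-partitioning behavior the abstract describes for memory-constrained devices.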