High-Performance Asynchronous CNN Accelerator with Early Termination

Tan Rong Loo, T. Teo, Mulat Ayinet Tiruye, I-Chyn Wey

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2022. DOI: 10.1109/MCSoC57363.2022.00031
Convolutional Neural Networks (CNNs), especially very deep networks, are highly computation-intensive, resulting in long delays and high power consumption. The dynamically varying environmental conditions of real-world inference can pose highly complex problems, so these inefficient deep networks are needed to guarantee satisfactory accuracy at all times. Several studies employ approximation techniques that execute only part of the network's computation, in an attempt to eliminate unnecessary work. However, such approaches are still highly sequential in nature, since they must still run the whole network. This paper proposes an early-termination architecture for an already-trained CNN that tests partial results midway through the network, reducing computation by terminating the main network once the partial result suffices as the inference result. The first proposal is implemented as a synchronous circuit; however, by its nature, all memory elements must capture on every clock edge even when no new data is generated. The second proposal employs an asynchronous circuit to significantly reduce power consumption and further speed up the architecture, since an operation need not wait for the slowest critical path in the circuit. The proposed circuits were designed on an FPGA platform. The results show that the asynchronous circuit achieves a nearly 20% speed increase with about a 12% reduction in power consumption compared with the synchronous circuit.
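The early-termination idea in the abstract can be sketched in software, independent of the paper's hardware design. The following is a minimal, hypothetical Python sketch (not the authors' circuit): a branch classifier is queried after each stage of the network, and the remaining stages are skipped once its confidence clears a threshold. All names (`early_exit_infer`, the toy squaring stages, the normalized-max branch classifier) are illustrative assumptions, not from the paper.

```python
# Hypothetical sketch of early-termination inference on a staged network.
# After each stage, a branch classifier reports (label, confidence); once
# confidence reaches the threshold, the remaining stages are skipped.
from typing import Callable, List, Tuple

def early_exit_infer(
    x: List[float],
    stages: List[Callable[[List[float]], List[float]]],
    branch: Callable[[List[float]], Tuple[int, float]],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Run stages in order, terminating early once the branch classifier
    is confident. Returns (predicted_label, stages_executed)."""
    label = -1
    for i, stage in enumerate(stages, start=1):
        x = stage(x)
        label, conf = branch(x)
        if conf >= threshold:
            return label, i          # early termination: skip the rest
    return label, len(stages)        # fell through: whole network ran

# Toy model: each "stage" squares the activations (sharpening the maximum),
# and the branch classifier's confidence is the normalized max activation.
def square_stage(v: List[float]) -> List[float]:
    return [a * a for a in v]

def max_branch(v: List[float]) -> Tuple[int, float]:
    total = sum(abs(a) for a in v) or 1.0
    idx = max(range(len(v)), key=lambda i: v[i])
    return idx, v[idx] / total

label, executed = early_exit_infer([1.0, 2.0, 3.0], [square_stage] * 5, max_branch)
print(label, executed)  # terminates after 3 of 5 stages
```

Here the confidence rises with each squaring (9/14, then 81/98, then 6561/6818), so the loop exits after three of the five stages, which is the software analogue of the paper's claim that inference need not run the full network when a partial result is already sufficient.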