High-Performance Asynchronous CNN Accelerator with Early Termination

Tan Rong Loo, T. Teo, Mulat Ayinet Tiruye, I-Chyn Wey

2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2022. DOI: 10.1109/MCSoC57363.2022.00031
Convolutional Neural Networks (CNNs), especially very deep networks, are highly computation-intensive, resulting in long delays and high power consumption. The dynamically varying environmental conditions of real-world inference can pose highly complex problems, so these inefficient deep networks are needed to guarantee satisfactory accuracy at all times. Several studies employ approximation techniques that execute only part of the network's computation, in an attempt to eliminate unnecessary work. However, such approaches are still highly sequential in nature, since they must still run the whole network. This paper proposes an early-termination architecture for an already-trained CNN that tests partial results midway through the network, reducing computation by terminating the main network once the partial result suffices as the inference result. The first proposal is implemented as a synchronous circuit; however, by its nature, all memory elements must capture on every clock edge even when no new data is generated. The second proposal employs an asynchronous circuit to significantly reduce power consumption and further speed up the architecture, since an operation need not wait for the slowest critical path in the circuit. The proposed circuits were designed on an FPGA platform. The results show that the asynchronous circuit achieves a nearly 20% speed increase with about a 12% reduction in power consumption compared with the synchronous circuit.
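The early-termination idea in the abstract can be sketched in software, independent of the paper's hardware design. The following is a minimal, hypothetical Python sketch (not the authors' circuit): a branch classifier is queried after each stage of the network, and the remaining stages are skipped once its confidence clears a threshold. All names (`early_exit_infer`, the toy squaring stages, the normalized-max branch classifier) are illustrative assumptions, not from the paper.

```python
# Hypothetical sketch of early-termination inference on a staged network.
# After each stage, a branch classifier reports (label, confidence); once
# confidence reaches the threshold, the remaining stages are skipped.
from typing import Callable, List, Tuple

def early_exit_infer(
    x: List[float],
    stages: List[Callable[[List[float]], List[float]]],
    branch: Callable[[List[float]], Tuple[int, float]],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Run stages in order, terminating early once the branch classifier
    is confident. Returns (predicted_label, stages_executed)."""
    label = -1
    for i, stage in enumerate(stages, start=1):
        x = stage(x)
        label, conf = branch(x)
        if conf >= threshold:
            return label, i          # early termination: skip the rest
    return label, len(stages)        # fell through: whole network ran

# Toy model: each "stage" squares the activations (sharpening the maximum),
# and the branch classifier's confidence is the normalized max activation.
def square_stage(v: List[float]) -> List[float]:
    return [a * a for a in v]

def max_branch(v: List[float]) -> Tuple[int, float]:
    total = sum(abs(a) for a in v) or 1.0
    idx = max(range(len(v)), key=lambda i: v[i])
    return idx, v[idx] / total

label, executed = early_exit_infer([1.0, 2.0, 3.0], [square_stage] * 5, max_branch)
print(label, executed)  # terminates after 3 of 5 stages
```

Here the confidence rises with each squaring (9/14, then 81/98, then 6561/6818), so the loop exits after three of the five stages, which is the software analogue of the paper's claim that inference need not run the full network when a partial result is already sufficient.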