{"title":"通过强化学习自动生成和优化基于 NoC 的神经网络加速器框架","authors":"Yongqi Xue;Jinlun Ji;Xinming Yu;Shize Zhou;Siyue Li;Xinyi Li;Tong Cheng;Shiping Li;Kai Chen;Zhonghai Lu;Li Li;Yuxiang Fu","doi":"10.1109/TC.2024.3441822","DOIUrl":null,"url":null,"abstract":"Choices of dataflows, which are known as intra-core neural network (NN) computation loop nest scheduling and inter-core hardware mapping strategies, play a critical role in the performance and energy efficiency of NoC-based neural network accelerators. Confronted with an enormous dataflow exploration space, this paper proposes an automatic framework for generating and optimizing the full-layer-mappings based on two reinforcement learning algorithms including A2C and PPO. Combining soft and hard constraints, this work transforms the mapping configuration into a sequential decision problem and aims to explore the performance and energy efficient hardware mapping for NoC systems. We evaluate the performance of the proposed framework on 10 experimental neural networks. The results show that compared with the direct-X mapping, the direct-Y mapping, GA-base mapping, and NN-aware mapping, our optimization framework reduces the average execution time of 10 experimental NNs by 9.09\n<inline-formula><tex-math>$\\%$</tex-math></inline-formula>\n, improves the throughput by 11.27\n<inline-formula><tex-math>$\\%$</tex-math></inline-formula>\n, reduces the energy by 12.62\n<inline-formula><tex-math>$\\%$</tex-math></inline-formula>\n, and reduces the time-energy-product (TEP) by 14.49\n<inline-formula><tex-math>$\\%$</tex-math></inline-formula>\n. The results also show that the performance enhancement is related to the coefficient of variation of the neural network to be computed.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 12","pages":"2882-2896"},"PeriodicalIF":3.6000,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Automatic Generation and Optimization Framework of NoC-Based Neural Network Accelerator Through Reinforcement Learning\",\"authors\":\"Yongqi Xue;Jinlun Ji;Xinming Yu;Shize Zhou;Siyue Li;Xinyi Li;Tong Cheng;Shiping Li;Kai Chen;Zhonghai Lu;Li Li;Yuxiang Fu\",\"doi\":\"10.1109/TC.2024.3441822\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Choices of dataflows, which are known as intra-core neural network (NN) computation loop nest scheduling and inter-core hardware mapping strategies, play a critical role in the performance and energy efficiency of NoC-based neural network accelerators. Confronted with an enormous dataflow exploration space, this paper proposes an automatic framework for generating and optimizing the full-layer-mappings based on two reinforcement learning algorithms including A2C and PPO. Combining soft and hard constraints, this work transforms the mapping configuration into a sequential decision problem and aims to explore the performance and energy efficient hardware mapping for NoC systems. We evaluate the performance of the proposed framework on 10 experimental neural networks. 
The results show that compared with the direct-X mapping, the direct-Y mapping, GA-base mapping, and NN-aware mapping, our optimization framework reduces the average execution time of 10 experimental NNs by 9.09\\n<inline-formula><tex-math>$\\\\%$</tex-math></inline-formula>\\n, improves the throughput by 11.27\\n<inline-formula><tex-math>$\\\\%$</tex-math></inline-formula>\\n, reduces the energy by 12.62\\n<inline-formula><tex-math>$\\\\%$</tex-math></inline-formula>\\n, and reduces the time-energy-product (TEP) by 14.49\\n<inline-formula><tex-math>$\\\\%$</tex-math></inline-formula>\\n. The results also show that the performance enhancement is related to the coefficient of variation of the neural network to be computed.\",\"PeriodicalId\":13087,\"journal\":{\"name\":\"IEEE Transactions on Computers\",\"volume\":\"73 12\",\"pages\":\"2882-2896\"},\"PeriodicalIF\":3.6000,\"publicationDate\":\"2024-08-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computers\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10633899/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10633899/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Automatic Generation and Optimization Framework of NoC-Based Neural Network Accelerator Through Reinforcement Learning
Choices of dataflows, namely intra-core neural network (NN) computation loop-nest scheduling and inter-core hardware mapping strategies, play a critical role in the performance and energy efficiency of NoC-based neural network accelerators. Confronted with an enormous dataflow exploration space, this paper proposes an automatic framework for generating and optimizing full-layer mappings based on two reinforcement learning algorithms, A2C and PPO. Combining soft and hard constraints, this work transforms the mapping configuration into a sequential decision problem and explores performance- and energy-efficient hardware mappings for NoC systems. We evaluate the proposed framework on 10 experimental neural networks. The results show that, compared with direct-X mapping, direct-Y mapping, GA-based mapping, and NN-aware mapping, our optimization framework reduces the average execution time of the 10 experimental NNs by 9.09%, improves throughput by 11.27%, reduces energy by 12.62%, and reduces the time-energy product (TEP) by 14.49%. The results also show that the performance enhancement correlates with the coefficient of variation of the neural network being computed.
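As a concrete illustration of the sequential-decision formulation described in the abstract, a minimal Python sketch follows. This is not the authors' implementation: the environment, the cost model (execution time approximated by the most-loaded core, energy by total operations, reward by the negative time-energy product), and all names (NocMappingEnv, layer_loads, energy_per_op) are hypothetical assumptions for illustration only.

import numpy as np

# Minimal sketch: assign NN layers to NoC cores one decision step at a
# time. The cost model is hypothetical and stands in for the paper's
# actual state, action, and reward definitions.
class NocMappingEnv:
    """Sequential layer-to-core mapping with a toy TEP-based reward."""

    def __init__(self, layer_loads, n_cores, energy_per_op=1.0):
        self.layer_loads = np.asarray(layer_loads, dtype=float)
        self.n_cores = n_cores
        self.energy_per_op = energy_per_op
        self.reset()

    def reset(self):
        self.step_idx = 0
        self.core_load = np.zeros(self.n_cores)
        return self._obs()

    def _obs(self):
        # State: per-core accumulated load plus the next layer's load.
        nxt = (self.layer_loads[self.step_idx]
               if self.step_idx < len(self.layer_loads) else 0.0)
        return np.concatenate([self.core_load, [nxt]])

    def step(self, core_id):
        # Hard constraint example: the chosen core must exist.
        assert 0 <= core_id < self.n_cores
        self.core_load[core_id] += self.layer_loads[self.step_idx]
        self.step_idx += 1
        done = self.step_idx == len(self.layer_loads)
        reward = 0.0
        if done:
            # Execution time ~ most-loaded core; energy ~ total ops.
            exec_time = self.core_load.max()
            energy = self.energy_per_op * self.core_load.sum()
            reward = -exec_time * energy  # minimize time-energy product
        return self._obs(), reward, done

# Sanity baseline: a random policy rolling out one full mapping episode.
env = NocMappingEnv(layer_loads=[4, 8, 2, 6], n_cores=4)
obs, done, total = env.reset(), False, 0.0
rng = np.random.default_rng(0)
while not done:
    obs, r, done = env.step(int(rng.integers(env.n_cores)))
    total += r
print("random-policy TEP reward:", total)

Any policy-gradient learner, for example A2C or PPO from a library such as stable-baselines3 after wrapping the class in the gymnasium.Env interface, could be trained on such an environment; the random rollout above serves only as a sanity baseline against which a learned mapping policy should improve.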
Journal introduction:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.