Jordan Sturtz, Kushal Kalyan Devalampeta Surendranath, Maxwell Sam, Xingang Fu, Chanakya Dinesh Hingu, Rajab Challoo, Letu Qingge
Journal: Pervasive and Mobile Computing (JCR Q2, Computer Science, Information Systems; impact factor 3.0). Published: 2024-08-17. DOI: 10.1016/j.pmcj.2024.101975. Source: https://www.sciencedirect.com/science/article/pii/S1574119224001007
Accelerating the neural network controller embedded implementation on FPGA with novel dropout techniques for a solar inverter
Accelerating neural network (NN) controllers is important for improving the performance, efficiency, scalability, and reliability of real-time systems, particularly in resource-constrained embedded systems. This paper introduces a novel weight-dropout method for training neural network controllers in real-time closed-loop systems, aimed at accelerating the embedded implementation for solar inverters. The core idea is to eliminate small-magnitude weights during training, thereby reducing the number of necessary connections while ensuring the network’s convergence. To maintain convergence, only non-diagonal elements of the weight matrices were dropped. This dropout technique was integrated into the Levenberg–Marquardt and Forward Accumulation Through Time algorithms, resulting in more efficient training for trajectory tracking. We executed the proposed training algorithm with dropout on the AWS cloud, observing an approximately fourfold performance increase over local execution. Furthermore, implementing the neural network controller on the Intel Cyclone V Field Programmable Gate Array (FPGA) demonstrates significant improvements in computational and resource efficiency, because the proposed dropout technique produces sparse weight matrices. This optimization enhances the suitability of the neural network controller for embedded environments. In comparison to Sturtz et al. (2023), which dropped 11 weights, our approach eliminated 18 weights, significantly boosting resource efficiency. This resulted in a 16.40% reduction in Adaptive Logic Modules (ALMs), decreasing the count to 47,426.5. Combinational Look-Up Tables (LUTs) and dedicated logic registers saw reductions of 17.80% and 15.55%, respectively. However, the impact on block memory bits is minimal, showing only a 1% improvement, indicating that memory resources are less affected by weight dropout. In contrast, the usage of 10-kilobit memory blocks (M10Ks) dropped from 97 to 87, marking a roughly 10% improvement. We also propose an adaptive dropout technique to further improve the previous results.
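The core idea in the abstract, dropping small-magnitude weights while leaving diagonal entries alone, can be illustrated with a minimal sketch. This is not the authors' training code: the function names, the fixed `threshold`, and the example matrix are all hypothetical, and the abstract does not say how the drop criterion is chosen or how it is scheduled inside Levenberg–Marquardt and Forward Accumulation Through Time training.

```python
def drop_small_off_diagonal(weights, threshold):
    """Zero out off-diagonal entries whose magnitude is below `threshold`.

    Hypothetical helper: the fixed threshold is an assumption made for
    illustration, not the criterion used in the paper.
    Returns a pruned copy and the number of weights dropped.
    """
    pruned = [row[:] for row in weights]  # leave the input untouched
    dropped = 0
    for i, row in enumerate(pruned):
        for j, value in enumerate(row):
            # Diagonal entries (i == j) are never dropped.
            if i != j and value != 0.0 and abs(value) < threshold:
                row[j] = 0.0
                dropped += 1
    return pruned, dropped


def sparse_matvec(weights, x):
    """Matrix-vector product that skips zero weights.

    Returns the result and the number of multiplications performed,
    a rough stand-in for the hardware saved by a sparse matrix.
    """
    result = []
    multiplications = 0
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w != 0.0:
                acc += w * xi
                multiplications += 1
        result.append(acc)
    return result, multiplications


# Made-up 3x3 weight matrix, used only to exercise the functions.
W = [
    [0.90, 0.02, -0.40],
    [-0.01, 0.03, 0.75],
    [0.60, -0.04, 1.10],
]

pruned, dropped = drop_small_off_diagonal(W, threshold=0.05)
_, dense_mults = sparse_matvec(W, [1.0, 1.0, 1.0])
_, sparse_mults = sparse_matvec(pruned, [1.0, 1.0, 1.0])
print(dropped, dense_mults, sparse_mults)
```

In this toy matrix the small diagonal entry 0.03 survives while the three small off-diagonal entries are zeroed, so the matrix-vector product needs fewer multiplications. On an FPGA, each skipped multiplication is logic that does not need to be synthesized, which is the kind of saving the reported ALM and LUT reductions reflect.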
About the journal:
As envisioned by Mark Weiser as early as 1991, pervasive computing systems and services have truly become integral parts of our daily lives. Tremendous developments in a multitude of technologies have significantly contributed to the unprecedented advances in pervasive and mobile computing. These range from personalized and embedded smart devices (e.g., smartphones, sensors, wearables, and IoT devices) to ubiquitous connectivity via a variety of wireless mobile communications and cognitive networking infrastructures, and on to advanced computing techniques (including edge, fog, and cloud) and user-friendly middleware services and platforms. Cutting-edge applications and paradigms have evolved, such as cyber-physical systems and smart environments (e.g., smart city, smart energy, smart transportation, and smart healthcare), which also involve humans in the loop through social interactions and participatory and/or mobile crowd sensing. The goal of pervasive computing systems is to improve human experience and quality of life, without explicit awareness of the underlying communications and computing technologies.
The Pervasive and Mobile Computing Journal (PMC) is a high-impact, peer-reviewed technical journal that publishes high-quality scientific articles spanning theory and practice, and covering all aspects of pervasive and mobile computing and systems.