NASH: Neural Architecture and Accelerator Search for Multiplication-Reduced Hybrid Models
Yang Xu, Huihong Shi, Zhongfeng Wang
IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 71, no. 12, pp. 5956-5968. Published 2024-09-16. DOI: 10.1109/TCSI.2024.3457628. IEEE Xplore: https://ieeexplore.ieee.org/document/10681223/
Abstract
The significant computational cost of multiplications hinders the deployment of deep neural networks (DNNs) on edge devices. Multiplication-free models offer better hardware efficiency, but they typically sacrifice accuracy. Multiplication-reduced hybrid models have therefore emerged to combine the benefits of both approaches. In particular, prior works, namely NASA and NASA-F, leverage Neural Architecture Search (NAS) to construct such hybrid models, improving hardware efficiency while maintaining accuracy. However, they either entail costly retraining or suffer from gradient conflicts, which limits both search efficiency and accuracy. They also overlook the acceleration opportunities offered by accelerator search, which yields sub-optimal hardware performance. To overcome these limitations, we propose NASH, a Neural architecture and Accelerator Search framework for multiplication-reduced Hybrid models. Specifically, for NAS, we propose a tailored zero-shot metric that pre-identifies promising hybrid models before training, improving search efficiency while alleviating gradient conflicts. For accelerator search, we introduce a coarse-to-fine search that streamlines the search process. Furthermore, we integrate these two levels of search into NASH to obtain the optimal pairing of model and accelerator. Experiments validate the effectiveness of NASH: compared with the state-of-the-art multiplication-based system, it achieves $2.14\times$ higher throughput and $2.01\times$ higher FPS with $0.25\%$ higher accuracy on CIFAR-100, and $1.40\times$ higher throughput and $1.19\times$ higher FPS with $0.56\%$ higher accuracy on Tiny-ImageNet. Code is available at https://github.com/xuyang527/NASH.
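The abstract does not specify the zero-shot metric, the accelerator design space, or the hardware cost model, so the following is only a minimal Python sketch of how the two search levels could be chained, under clearly hypothetical assumptions. Each candidate hybrid model carries a precomputed `proxy_score` standing in for the tailored zero-shot metric. The accelerator is described by two hypothetical parameters (PE counts for multiplication-based and multiplication-free layers) under an assumed area budget. Latency comes from a toy analytical model. None of the names, constants, or cost formulas (`HybridModel`, `AccelConfig`, `coarse_to_fine`, `joint_search`, `AREA_BUDGET`, and so on) come from the paper or its repository.

```python
from dataclasses import dataclass
from itertools import product

# Every constant here is a hypothetical placeholder, not a value from the paper.
MULT_PE_AREA = 4.0    # assumed area of one multiplication-based PE
FREE_PE_AREA = 1.0    # assumed area of one multiplication-free PE (e.g., shift/add)
AREA_BUDGET = 256.0   # assumed total accelerator area
MAX_MULT_PES = 64     # assumed upper bounds of the accelerator design space
MAX_FREE_PES = 256
WORK_PER_LAYER = 100  # assumed work units per layer


@dataclass(frozen=True)
class HybridModel:
    name: str
    mult_layers: int       # layers that use multiplications
    mult_free_layers: int  # layers that use multiplication-free operators
    proxy_score: float     # placeholder for a zero-shot score computed before training


@dataclass(frozen=True)
class AccelConfig:
    mult_pes: int       # PEs allocated to multiplication-based layers
    mult_free_pes: int  # PEs allocated to multiplication-free layers


def feasible(accel: AccelConfig) -> bool:
    area = accel.mult_pes * MULT_PE_AREA + accel.mult_free_pes * FREE_PE_AREA
    return area <= AREA_BUDGET


def latency(model: HybridModel, accel: AccelConfig) -> float:
    # Toy cost model: both PE groups run in parallel, so the slower group dominates.
    mult_time = model.mult_layers * WORK_PER_LAYER / accel.mult_pes
    free_time = model.mult_free_layers * WORK_PER_LAYER / accel.mult_free_pes
    return max(mult_time, free_time)


def best_feasible(model, candidates):
    # Lowest-latency config within the area budget; min() keeps the first one on ties.
    return min((a for a in candidates if feasible(a)), key=lambda a: latency(model, a))


def neighborhood(center: int, radius: int, upper: int) -> range:
    # Integer window around `center`, clamped to [1, upper].
    return range(max(1, center - radius), min(upper, center + radius) + 1)


def coarse_to_fine(model, step=16, radius=15):
    # Coarse stage: evaluate a sparse grid spanning the whole design space.
    coarse = [AccelConfig(m, f) for m, f in product(
        range(step, MAX_MULT_PES + 1, step), range(step, MAX_FREE_PES + 1, step))]
    best = best_feasible(model, coarse)
    # Fine stage: evaluate every point near the coarse optimum (this includes `best`).
    fine = [AccelConfig(m, f) for m, f in product(
        neighborhood(best.mult_pes, radius, MAX_MULT_PES),
        neighborhood(best.mult_free_pes, radius, MAX_FREE_PES))]
    return best_feasible(model, fine)


def joint_search(models, top_k=2):
    # Level 1 (architecture): rank untrained candidates by the proxy, keep the top-k.
    shortlist = sorted(models, key=lambda m: m.proxy_score, reverse=True)[:top_k]
    # Level 2 (accelerator): search hardware per candidate, return the fastest pair.
    pairs = [(m, coarse_to_fine(m)) for m in shortlist]
    return min(pairs, key=lambda pair: latency(*pair))


if __name__ == "__main__":
    candidates = [
        HybridModel("A", mult_layers=8, mult_free_layers=8, proxy_score=0.9),
        HybridModel("B", mult_layers=4, mult_free_layers=12, proxy_score=0.8),
        HybridModel("C", mult_layers=16, mult_free_layers=0, proxy_score=0.5),
    ]
    model, accel = joint_search(candidates)
    print(model.name, accel, round(latency(model, accel), 2))
    # prints: B AccelConfig(mult_pes=36, mult_free_pes=108) 11.11
```

In this sketch the coarse stage visits only every 16th PE count, and the fine stage densely searches a small window around the coarse optimum. This is one common way to realize a coarse-to-fine search and is not necessarily the paper's scheme. For model "A", the refinement lowers the toy latency from 800/48 to 800/51 work units. After shortlisting, pairs are ranked by latency alone; a realistic joint search would likely also weigh accuracy.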
About the journal:
TCAS I publishes regular papers on the theory, analysis, design, and practical implementation of circuits, and on the application of circuit techniques to systems and to signal processing. It covers the whole spectrum from basic scientific theory to industrial applications. The fields of interest include:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems