Arch-Net: Model conversion and quantization for architecture agnostic model deployment
Shuangkang Fang, Weixin Xu, Zipeng Feng, Song Yuan, Yufeng Wang, Yi Yang, Wenrui Ding, Shuchang Zhou
Neural Networks, Volume 187, Article 107384. Published 2025-03-18.
DOI: 10.1016/j.neunet.2025.107384
URL: https://www.sciencedirect.com/science/article/pii/S0893608025002631
Journal category: Computer Science, Artificial Intelligence (JCR Q1); impact factor 6.0
Citation count: 0
Abstract
The significant computational demands of Deep Neural Networks (DNNs) present a major challenge for their practical application. Recently, many Application-Specific Integrated Circuit (ASIC) chips have incorporated dedicated hardware support for neural network acceleration. However, the lengthy development cycle of ASIC chips means they often lag behind the latest advances in neural architecture research. For instance, Layer Normalization is not well supported on many popular chips, and a 7 × 7 convolution is significantly less efficient than an equivalent stack of three 3 × 3 convolutions. Therefore, in this paper, we introduce Arch-Net, a neural network framework composed exclusively of a few common operators, namely 3 × 3 Convolution, 2 × 2 Max-pooling, Batch Normalization, Fully Connected layers, and Concatenation, which are efficiently supported across the majority of ASIC architectures. To facilitate the conversion of disparate network architectures into Arch-Net, we propose the Arch-Distillation methodology, which incorporates strategies such as Residual Feature Adaptation and a Teacher Attention Mechanism. These mechanisms enable effective conversion between different network structures alongside efficient model quantization. The resulting Arch-Net eliminates unconventional network constructs while maintaining robust performance even under sub-8-bit quantization, thereby enhancing compatibility and deployment efficiency. Empirical results from image classification and machine translation tasks demonstrate that Arch-Net, using only a few types of operators, can achieve results comparable to those obtained with complex architectures. This offers new insight into deploying structure-agnostic neural networks on various ASIC chips.
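The abstract does not spell out in what sense three 3 × 3 convolutions are "equivalent" to one 7 × 7 convolution. A common reading, assumed here, is that they cover the same receptive field at stride 1. The following minimal sketch checks that arithmetic and compares weight counts; the function names and the channel width of 64 are illustrative choices, not taken from the paper.

```python
def stacked_receptive_field(kernel_sizes, strides=None):
    """Receptive field (along one spatial axis) of a stack of convolutions."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1  # jump = spacing between adjacent outputs, in input pixels
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


def conv_weight_count(kernel_size, channels):
    """Weights in a square convolution with equal in/out channels (bias ignored)."""
    return kernel_size * kernel_size * channels * channels


channels = 64  # hypothetical width, chosen only for illustration

single_7x7 = conv_weight_count(7, channels)
three_3x3 = 3 * conv_weight_count(3, channels)

print(stacked_receptive_field([7]))        # 7
print(stacked_receptive_field([3, 3, 3]))  # 7
print(single_7x7, three_3x3)               # 200704 110592
```

Under these assumptions the stacked form reaches the same 7-pixel receptive field with 27C² rather than 49C² weights. The two forms are not functionally identical, since the stack normally has nonlinearities between layers, and the efficiency gap the abstract reports on ASICs depends on the hardware rather than on weight counts alone.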
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.