Post-training quantization for efficient FPGA-based neural network acceleration
Oumayma Bel Haj Salah, Seifeddine Messaoud, Mohamed Ali Hajjaji, Mohamed Atri, Noureddine Liouane
Integration, the VLSI Journal, vol. 105, Article 102508, published 2025-08-14. DOI: 10.1016/j.vlsi.2025.102508. https://www.sciencedirect.com/science/article/pii/S0167926025001658
Abstract
The widespread success of Convolutional Neural Networks (CNNs) in computer vision has been accompanied by soaring computational demands, often requiring high-performance GPUs for real-time inference. However, such hardware is impractical in embedded and resource-constrained environments. To address this, we propose a post-training quantization (PTQ) framework that converts CNN models from FP32 to INT8 without retraining, optimized for FPGA deployment. Using asymmetric quantization and TensorFlow Lite, we implemented VGG16 and ResNet50 on a PYNQ-Z1 Field-Programmable Gate Array (FPGA) board. The quantized VGG16 achieved a 67% increase in throughput (from 150 FPS to 250 FPS), a 68% reduction in latency, and a 52% improvement in power-delay product (PDP). ResNet50 achieved a gain of over 420% in DSP efficiency, a 3100% increase in LUT efficiency, and a 94% reduction in PDP. Despite a marginal accuracy loss, both models showed significantly improved energy efficiency and performance per utilized hardware resource. Our results confirm that PTQ enables scalable, low-power AI inference suitable for real-time applications on edge and embedded systems.
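The abstract does not spell out the quantization equations. As a minimal sketch, assuming the common affine (scale plus zero-point) INT8 scheme, in which a real value is recovered as scale × (q − zero_point) as in TensorFlow Lite's quantization specification, per-tensor parameters could be derived from calibration values as below. The function names `compute_qparams`, `quantize` and `dequantize` are hypothetical; per-channel weights, calibration dataset handling, and the FPGA mapping are not shown, and this is not presented as the authors' implementation.

```python
from typing import List, Tuple

# Signed 8-bit integer range used for INT8 tensors.
INT8_MIN, INT8_MAX = -128, 127


def compute_qparams(values: List[float]) -> Tuple[float, int]:
    """Derive (scale, zero_point) for asymmetric INT8 quantization from FP32 samples."""
    # Widen the observed range to include 0.0 so that zero maps exactly to an integer.
    x_min = min(min(values), 0.0)
    x_max = max(max(values), 0.0)
    if x_max == x_min:  # all samples are zero; any positive scale works
        return 1.0, 0
    # Spread the real range [x_min, x_max] over the 256 available integer levels.
    scale = (x_max - x_min) / (INT8_MAX - INT8_MIN)
    # Integer that represents real 0.0, clamped to the INT8 range.
    zero_point = round(INT8_MIN - x_min / scale)
    zero_point = max(INT8_MIN, min(INT8_MAX, zero_point))
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """FP32 -> INT8: q = clamp(round(x / scale) + zero_point)."""
    # Python's round() rounds to nearest with ties to even; a given runtime may differ.
    return [max(INT8_MIN, min(INT8_MAX, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues: List[int], scale: float, zero_point: int) -> List[float]:
    """INT8 -> FP32 approximation: x_hat = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in qvalues]


if __name__ == "__main__":
    activations = [-1.0, 0.0, 0.5, 2.0]  # illustrative calibration samples
    s, zp = compute_qparams(activations)
    q = quantize(activations, s, zp)
    print(s, zp, q, dequantize(q, s, zp))
```

Because the zero point shifts the integer grid, a skewed range such as [-1, 2] still uses all 256 levels, whereas a symmetric scheme would waste part of the range. Values inside the calibrated range are reconstructed to within half a quantization step (scale / 2).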
About the journal:
Integration's aim is to cover every aspect of the VLSI area, with an emphasis on cross-fertilization between various fields of science, and the design, verification, test and applications of integrated circuits and systems, as well as closely related topics in process and device technologies. Individual issues will feature peer-reviewed tutorials and articles as well as reviews of recent publications. The intended coverage of the journal can be assessed by examining the following (non-exclusive) list of topics:
Specification methods and languages; Analog/Digital Integrated Circuits and Systems; VLSI architectures; Algorithms, methods and tools for modeling, simulation, synthesis and verification of integrated circuits and systems of any complexity; Embedded systems; High-level synthesis for VLSI systems; Logic synthesis and finite automata; Testing, design-for-test and test generation algorithms; Physical design; Formal verification; Algorithms implemented in VLSI systems; Systems engineering; Heterogeneous systems.