A Fully Quantized Training Accelerator for Diffusion Network With Tensor Type & Noise Strength Aware Precision Scheduling
Ruoyang Liu; Wenxun Wang; Chen Tang; Weichen Gao; Huazhong Yang; Yongpan Liu
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 71, no. 12, pp. 4994-4998, 2024 (published online 2024-08-06)
DOI: 10.1109/TCSII.2024.3439319
URL: https://ieeexplore.ieee.org/document/10623715/
Abstract
Fine-grained mixed-precision fully-quantized methods have great potential to accelerate neural network training, but existing methods exhibit large accuracy loss on more complex models such as diffusion networks. This brief introduces a fully-quantized training accelerator for diffusion networks. It features a novel training framework with tensor-type- and noise-strength-aware precision scheduling to optimize bit-width allocation. The processing cluster design enables dynamic switching of bit-width mappings for model weights, allows concurrent processing at four different bit-widths, and incorporates a gradient square sum collection unit to minimize on-chip memory access. Experimental results show up to 2.4× training speedup and an 81% reduction in operation bit-width overhead compared to existing designs, with minimal impact on image generation quality.
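To make the scheduling idea concrete, here is a minimal Python sketch of what a tensor-type- and noise-strength-aware bit-width scheduler could look like. All function names, thresholds, and bit-width choices below are illustrative assumptions; the brief's actual scheduling policy is not disclosed in the abstract.

```python
# Hypothetical sketch of tensor-type- and noise-strength-aware precision
# scheduling, loosely following the idea described in the abstract.
# The tensor categories, thresholds, and bit-widths are assumptions
# for illustration, not the paper's actual policy.

def select_bitwidth(tensor_type: str, noise_strength: float) -> int:
    """Pick a quantization bit-width for one tensor in a diffusion
    training step.

    tensor_type: one of "weight", "activation", "gradient".
    noise_strength: normalized diffusion noise level in [0, 1]
        (e.g., t / T for timestep t out of T).
    """
    # Gradients are typically the most sensitive to quantization error,
    # so they start from a wider base format (assumed values).
    base = {"weight": 4, "activation": 4, "gradient": 8}[tensor_type]

    # At high noise strength the training target is dominated by noise,
    # so coarser arithmetic is tolerable; near the clean end, fine image
    # detail matters and the format is widened.
    if noise_strength < 0.25:
        return min(base * 2, 16)   # widest near the clean end
    elif noise_strength < 0.75:
        return base                # default mid-range precision
    else:
        return max(base // 2, 2)   # coarsest near pure noise


if __name__ == "__main__":
    for t_type in ("weight", "activation", "gradient"):
        for ns in (0.1, 0.5, 0.9):
            print(f"{t_type:>10} @ noise={ns:.1f} -> "
                  f"{select_bitwidth(t_type, ns)}-bit")
```

The intuition behind such a policy is that high-noise diffusion steps tolerate coarser arithmetic, so an accelerator can concentrate its narrow bit-widths where generation quality is least affected.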
Journal Introduction:
TCAS II publishes brief papers on the theory, analysis, design, and practical implementation of circuits, and on the application of circuit techniques to systems and to signal processing. Coverage spans the whole spectrum from basic scientific theory to industrial applications. The fields of interest include:
Circuits: Analog, Digital and Mixed Signal Circuits and Systems
Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip
Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems
Power Electronics and Systems
Software for Analog-and-Logic Circuits and Systems
Control aspects of Circuits and Systems.