{"title":"A 28nm, 4.69TOPS/W Training, 2.34µJ/lmage Inference, on-chip Training Accelerator with Inference-compatible Back Propagation","authors":"Haitao Ge, Weiwei Shan, Yicheng Lu, Jun Yang","doi":"10.1109/ICTA56932.2022.9963098","DOIUrl":null,"url":null,"abstract":"Previous on-chip training accelerators improved training efficiency but seldomly considered inference efficiency. We propose to convert back propagation to be compatible with inference, use interleaved memory allocation to reduce external memory access and zero-skipping loss propagation. Working at 40MHz, 0.48V core voltage, our 28nm one-core OCT chip has peak training efficiency of 4.69TOPS/W and the best inference energy of 2.34 µJ/inf/ image, 9.1× better than SoTA work.","PeriodicalId":325602,"journal":{"name":"2022 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTA56932.2022.9963098","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Previous on-chip training accelerators improved training efficiency but seldom considered inference efficiency. We propose converting back propagation to be compatible with inference, using interleaved memory allocation to reduce external memory accesses, and applying zero-skipping loss propagation. Operating at 40 MHz with a 0.48 V core voltage, our 28nm single-core on-chip-training (OCT) chip achieves a peak training efficiency of 4.69 TOPS/W and an inference energy of 2.34 µJ/image, 9.1× better than state-of-the-art work.
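To make the two techniques named in the abstract concrete, the NumPy sketch below illustrates the underlying math, not the authors' hardware: back propagation of the loss through a convolution layer can be recast as a forward-style convolution (zero-pad the output gradient, rotate each kernel 180°, swap the channel axes), so an inference datapath could in principle be reused; and positions where the incoming loss gradient is zero (e.g., after ReLU masking) can be skipped outright. All function names, shapes, and the dense-conv helper are illustrative assumptions.

```python
# Minimal sketch of inference-compatible back propagation and zero-skipping
# loss propagation for a 'valid' convolution layer. Illustration only; this
# is not the paper's implementation.
import numpy as np

def conv2d(x, w):
    """Plain 'valid' correlation used for inference:
    x (Cin,H,W), w (Cout,Cin,K,K) -> (Cout,H-K+1,W-K+1)."""
    cin, h, wd = x.shape
    cout, _, k, _ = w.shape
    out = np.zeros((cout, h - k + 1, wd - k + 1))
    for co in range(cout):
        for i in range(h - k + 1):
            for j in range(wd - k + 1):
                out[co, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[co])
    return out

def backprop_as_inference(dL_dy, w):
    """dL/dx computed with the SAME conv routine used for inference:
    zero-pad the output gradient, rotate each kernel 180 degrees, and
    swap the in/out channel axes, i.e. dL/dx = conv(pad(dL/dy), rot180(W)^T)."""
    k = w.shape[-1]
    padded = np.pad(dL_dy, ((0, 0), (k - 1, k - 1), (k - 1, k - 1)))
    w_bp = np.flip(w, axis=(-2, -1)).transpose(1, 0, 2, 3)  # rot180 + channel swap
    return conv2d(padded, w_bp)

def backprop_zero_skip(dL_dy, w, x_shape):
    """Same result, but iterating only over NONZERO gradient entries and
    scattering each one's contribution; zero gradients (common after ReLU
    masking) cost no MACs -- a sketch of zero-skipping loss propagation."""
    k = w.shape[-1]
    dL_dx = np.zeros(x_shape)
    for co, i, j in zip(*np.nonzero(dL_dy)):  # zeros are skipped outright
        dL_dx[:, i:i + k, j:j + k] += dL_dy[co, i, j] * w[co]
    return dL_dx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 8, 8))       # input activations
    w = rng.standard_normal((4, 3, 3, 3))    # conv weights
    g = rng.standard_normal((4, 6, 6))       # loss gradient w.r.t. the output
    g[g < 0.5] = 0.0                         # mimic ReLU masking: many zeros
    assert np.allclose(backprop_as_inference(g, w),
                       backprop_zero_skip(g, w, x.shape))
```

In this toy form, the equivalence check confirms that the forward-style convolution and the zero-skipping scatter compute the same input gradient; in hardware, the first property lets training reuse the inference MAC array, while the second avoids work on the zero gradients that ReLU back propagation produces in abundance.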