A Quantization Model Based on a Floating-point Computing-in-Memory Architecture
Authors: X. Chen, An Guo, Xinbing Xu, Xin Si, Jun Yang
Venue: 2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)
Publication date: 2022-11-11
DOI: 10.1109/APCCAS55924.2022.10090283 (https://doi.org/10.1109/APCCAS55924.2022.10090283)
Citations: 0
Abstract
Computing-in-memory (CIM) has been shown to deliver high energy efficiency and significant acceleration for neural networks with high computational parallelism. Floating-point numbers and floating-point CIMs (FP-CIMs) are required for high-performance training and high-accuracy inference in neural networks. However, no prior work discusses the relationship between circuit design based on the FP-CIM architecture and neural networks. In this paper, we propose a quantization model based on an FP-CIM architecture, implemented in PyTorch, to investigate this relationship. From the experimental results, we summarize several principles for FP-CIM macro design. Our quantization model reduces data storage overhead by more than 70.0% and keeps the inference accuracy loss of floating-point networks within 0.5%, which is 1.7% better than integer networks.
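The abstract does not describe the quantization model itself, so the following is only a generic sketch of what low-bit floating-point quantization can look like, not the authors' method. It rounds a value to the nearest number representable with a chosen number of exponent and mantissa bits. The function names `quantize_fp` and `storage_saving`, the IEEE-like bias, the reserved top exponent code, and the 4-bit exponent / 3-bit mantissa default are all assumptions made for illustration. Plain Python is used so the sketch runs without PyTorch installed.

```python
import math


def quantize_fp(x: float, exp_bits: int = 4, man_bits: int = 3) -> float:
    """Round x to the nearest value of a hypothetical low-bit float format.

    Assumed format (not taken from the paper): 1 sign bit, `exp_bits`
    exponent bits with an IEEE-like bias, `man_bits` mantissa bits,
    subnormals supported, top exponent code reserved, overflow saturates.
    """
    if x == 0.0 or math.isnan(x):
        return x

    bias = 2 ** (exp_bits - 1) - 1
    min_exp = 1 - bias                      # smallest normal exponent
    max_exp = (2 ** exp_bits - 2) - bias    # largest finite exponent
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp

    mantissa, e = math.frexp(abs(x))        # abs(x) = mantissa * 2**e, 0.5 <= mantissa < 1
    exp = max(e - 1, min_exp)               # clamp to the subnormal exponent
    step = 2.0 ** (exp - man_bits)          # spacing of representable values here

    # round() on the exact quotient rounds ties to even, like IEEE 754.
    q = round(abs(x) / step) * step
    q = min(q, max_val)                     # saturate instead of producing inf
    return math.copysign(q, x)


def storage_saving(bits_from: int, bits_to: int) -> float:
    """Fraction of storage saved when each value shrinks from bits_from to bits_to."""
    return 1.0 - bits_to / bits_from


if __name__ == "__main__":
    for value in (1.1, -0.3, 1000.0):
        print(value, "->", quantize_fp(value))
    print("saving:", storage_saving(32, 8))
```

Under these assumptions, replacing 32-bit storage with an 8-bit format saves 75% of the storage. This is given only as an arithmetic illustration of how a saving above 70% could arise, since the abstract does not state which bit widths the authors evaluated.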