M. Foltin, G. Aguiar, Rodrigo Antunes, P. Silveira, Gustavo Knuppe, J. Ambrosi, Soumitra Chatterjee, J. Kolhe, Sunil Lakshiminarashimha, D. Milojicic, J. Strachan, C. Warner, Amit Sharma, Eddie Lee, S. R. Chalamalasetti, C. Brueggen, Charles Williams, Nathaniel Jansen, Felipe Saenz, Luis Federico Li
{"title":"基于忆阻器的可编程超高效机器学习推理加速器的FPGA演示","authors":"M. Foltin, G. Aguiar, Rodrigo Antunes, P. Silveira, Gustavo Knuppe, J. Ambrosi, Soumitra Chatterjee, J. Kolhe, Sunil Lakshiminarashimha, D. Milojicic, J. Strachan, C. Warner, Amit Sharma, Eddie Lee, S. R. Chalamalasetti, C. Brueggen, Charles Williams, Nathaniel Jansen, Felipe Saenz, Luis Federico Li","doi":"10.1109/ICRC.2019.8914705","DOIUrl":null,"url":null,"abstract":"Hybrid analog-digital neuromorphic accelerators show promise for significant increase in performance per watt of deep learning inference and training as compared with conventional technologies. In this work we present an FPGA demonstrator of a programmable hybrid inferencing accelerator, with memristor analog dot product engines emulated by digital matrix-vector multiplication units employing FPGA SRAM memory for in-situ weight storage. The full-chip demonstrator interfaced to a host by PCIe interface serves as a software development platform and a vehicle for further hardware microarchitecture improvements. Implementation of compute cores, tiles, network on a chip, and the host interface is discussed. New pipelining scheme is introduced to achieve high utilization of matrix-vector multiplication units while reducing tile data memory size requirements for neural network layer activations. The data flow orchestration between the tiles is described, controlled by a RISC-V core. Inferencing accuracy analysis is presented for an example RNN and CNN models. The demonstrator is instrumented with hardware monitors to enable performance measurements and tuning. Performance projections for future memristor-based ASIC are also discussed.","PeriodicalId":297574,"journal":{"name":"2019 IEEE International Conference on Rebooting Computing (ICRC)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"FPGA Demonstrator of a Programmable Ultra-Efficient Memristor-Based Machine Learning Inference Accelerator\",\"authors\":\"M. Foltin, G. Aguiar, Rodrigo Antunes, P. Silveira, Gustavo Knuppe, J. Ambrosi, Soumitra Chatterjee, J. Kolhe, Sunil Lakshiminarashimha, D. Milojicic, J. Strachan, C. Warner, Amit Sharma, Eddie Lee, S. R. Chalamalasetti, C. Brueggen, Charles Williams, Nathaniel Jansen, Felipe Saenz, Luis Federico Li\",\"doi\":\"10.1109/ICRC.2019.8914705\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hybrid analog-digital neuromorphic accelerators show promise for significant increase in performance per watt of deep learning inference and training as compared with conventional technologies. In this work we present an FPGA demonstrator of a programmable hybrid inferencing accelerator, with memristor analog dot product engines emulated by digital matrix-vector multiplication units employing FPGA SRAM memory for in-situ weight storage. The full-chip demonstrator interfaced to a host by PCIe interface serves as a software development platform and a vehicle for further hardware microarchitecture improvements. Implementation of compute cores, tiles, network on a chip, and the host interface is discussed. New pipelining scheme is introduced to achieve high utilization of matrix-vector multiplication units while reducing tile data memory size requirements for neural network layer activations. The data flow orchestration between the tiles is described, controlled by a RISC-V core. 
Inferencing accuracy analysis is presented for an example RNN and CNN models. The demonstrator is instrumented with hardware monitors to enable performance measurements and tuning. Performance projections for future memristor-based ASIC are also discussed.\",\"PeriodicalId\":297574,\"journal\":{\"name\":\"2019 IEEE International Conference on Rebooting Computing (ICRC)\",\"volume\":\"84 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Conference on Rebooting Computing (ICRC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICRC.2019.8914705\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conference on Rebooting Computing (ICRC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRC.2019.8914705","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
FPGA Demonstrator of a Programmable Ultra-Efficient Memristor-Based Machine Learning Inference Accelerator
Hybrid analog-digital neuromorphic accelerators show promise for a significant increase in performance per watt of deep learning inference and training as compared with conventional technologies. In this work we present an FPGA demonstrator of a programmable hybrid inferencing accelerator, with memristor analog dot-product engines emulated by digital matrix-vector multiplication units that employ FPGA SRAM memory for in-situ weight storage. The full-chip demonstrator, interfaced to a host over PCIe, serves as a software development platform and a vehicle for further hardware microarchitecture improvements. The implementation of compute cores, tiles, the network on chip, and the host interface is discussed. A new pipelining scheme is introduced to achieve high utilization of the matrix-vector multiplication units while reducing tile data memory size requirements for neural network layer activations. The data flow orchestration between the tiles, controlled by a RISC-V core, is described. Inference accuracy analysis is presented for example RNN and CNN models. The demonstrator is instrumented with hardware monitors to enable performance measurement and tuning. Performance projections for a future memristor-based ASIC are also discussed.
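Since the abstract only outlines the emulation approach, the following is a minimal, hypothetical sketch of how an analog memristor crossbar dot-product engine might be stood in for by a digital matrix-vector multiplication unit with fixed-point weights held in local (SRAM-like) storage. The class name, bit widths, and 128x128 tile dimensions are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' design) of emulating one memristor crossbar
# tile with a digital matrix-vector multiplication (MVM) unit. Weights are
# quantized once and kept "in situ", mimicking programmed conductances or
# tile SRAM; activations are quantized per call before the integer MVM.
import numpy as np

WEIGHT_BITS = 8                   # assumed fixed-point weight precision
ACT_BITS = 8                      # assumed activation precision
TILE_ROWS, TILE_COLS = 128, 128   # assumed crossbar/tile dimensions


class EmulatedDotProductEngine:
    """Digital stand-in for one analog memristor dot-product engine."""

    def __init__(self, weights: np.ndarray):
        assert weights.shape == (TILE_ROWS, TILE_COLS)
        # Quantize weights once and store them locally (the "in-situ" step).
        scale = float(np.max(np.abs(weights))) or 1.0
        self.w_scale = scale / (2 ** (WEIGHT_BITS - 1) - 1)
        self.w_q = np.round(weights / self.w_scale).astype(np.int32)

    def mvm(self, activations: np.ndarray) -> np.ndarray:
        """One matrix-vector multiplication, the analog dot-product step."""
        assert activations.shape == (TILE_COLS,)
        a_scale = (float(np.max(np.abs(activations))) or 1.0) / (2 ** (ACT_BITS - 1) - 1)
        a_q = np.round(activations / a_scale).astype(np.int32)
        acc = self.w_q @ a_q                 # integer accumulate, as in a digital MVM unit
        return acc * self.w_scale * a_scale  # rescale to real-valued layer outputs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((TILE_ROWS, TILE_COLS)).astype(np.float32)
    x = rng.standard_normal(TILE_COLS).astype(np.float32)
    tile = EmulatedDotProductEngine(w)
    print("max abs error vs. float MVM:", np.max(np.abs(tile.mvm(x) - w @ x)))
```

In such a scheme the quantization error visible in the usage example above is what an inference accuracy analysis (as reported for the RNN and CNN examples) would quantify at the model level.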