Vladislav Zholondkovskiy, S. Landyshev, Y. Bocharov, V. Butuzov
LSTM-type Neural Network Implementation on a Processor Based on Neuromatrix and RISC Cores for Resource-Limited Applications
2020 22th International Conference on Digital Signal Processing and its Applications (DSPA)
Publication date: 2020-03-01
DOI: 10.1109/DSPA48919.2020.9213291 (https://doi.org/10.1109/DSPA48919.2020.9213291)
Abstract
In this paper, we consider the implementation of an LSTM recurrent neural network on the NM6407 digital signal processor (DSP), which is optimized for vector and matrix calculations. The processor contains two NeuroMatrix NMC4 cores, each of which includes a RISC processor and a vector coprocessor. We review the processor's architectural features and resources and assess its performance when running an LSTM network on a typical classification problem. Implementing this type of network on the NM6407 tensor core accelerated computations by a factor of 15–350 compared to a scalar processor.
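For orientation, the following is a minimal sketch of a single time step of a standard LSTM cell in plain Python. It is not the authors' NM6407 code; the names `lstm_step`, `matvec`, and the gate keys `i`, `f`, `o`, `g` are illustrative assumptions, and the textbook gate equations are used. The sketch shows that each step is dominated by matrix-vector products, which is the kind of workload a vector coprocessor is built to speed up relative to a scalar loop.

```python
import math


def sigmoid(x):
    """Logistic function used by the input, forget, and output gates."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Scalar matrix-vector product: one multiply-accumulate at a time."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step (textbook formulation, assumed here).

    W, U, b are dicts keyed by gate name: 'i' (input), 'f' (forget),
    'o' (output), 'g' (candidate cell state).
    W[gate] maps the input x, U[gate] maps the previous hidden state.
    """
    pre = {}
    for gate in ("i", "f", "o", "g"):
        wx = matvec(W[gate], x)        # input contribution
        uh = matvec(U[gate], h_prev)   # recurrent contribution
        pre[gate] = [a + r + bias for a, r, bias in zip(wx, uh, b[gate])]

    i = [sigmoid(v) for v in pre["i"]]
    f = [sigmoid(v) for v in pre["f"]]
    o = [sigmoid(v) for v in pre["o"]]
    g = [math.tanh(v) for v in pre["g"]]

    # New cell state: keep part of the old state, add part of the candidate.
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed cell state.
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c
```

With a hidden size of n and an input size of m, the eight `matvec` calls cost roughly 4n(m + n) multiply-accumulates per step, while the element-wise gate operations cost only O(n). This is why moving the products onto vector hardware is where most of the speedup would be expected to come from.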