Vladislav Zholondkovskiy, S. Landyshev, Y. Bocharov, V. Butuzov
{"title":"基于神经矩阵和RISC内核的lstm型神经网络在处理器上的实现","authors":"Vladislav Zholondkovskiy, S. Landyshev, Y. Bocharov, V. Butuzov","doi":"10.1109/DSPA48919.2020.9213291","DOIUrl":null,"url":null,"abstract":"In this paper, we consider the implementation of a recurrent artificial neural network LSTM on the NM6407 digital signal processor (DSP) that is optimized for performing vector and matrix calculations. It contains two NeuroMatrix NMC4 cores, each of which includes RISC processor and vector coprocessor. The architectural features and processor resources are considered, as well as an assessment of its performance in the implementation of the LSTM network to solve a typical classification problem. The implementation of this type of network on the NM6407 tensor core accelerated computations by a factor of 15–350 compared to a scalar processor.","PeriodicalId":262164,"journal":{"name":"2020 22th International Conference on Digital Signal Processing and its Applications (DSPA)","volume":"16 3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LSTM-type Neural Network Implementation on a Processor Based on Neuromatrix and RISC Cores for Resource-Limited Applications\",\"authors\":\"Vladislav Zholondkovskiy, S. Landyshev, Y. Bocharov, V. Butuzov\",\"doi\":\"10.1109/DSPA48919.2020.9213291\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we consider the implementation of a recurrent artificial neural network LSTM on the NM6407 digital signal processor (DSP) that is optimized for performing vector and matrix calculations. It contains two NeuroMatrix NMC4 cores, each of which includes RISC processor and vector coprocessor. The architectural features and processor resources are considered, as well as an assessment of its performance in the implementation of the LSTM network to solve a typical classification problem. The implementation of this type of network on the NM6407 tensor core accelerated computations by a factor of 15–350 compared to a scalar processor.\",\"PeriodicalId\":262164,\"journal\":{\"name\":\"2020 22th International Conference on Digital Signal Processing and its Applications (DSPA)\",\"volume\":\"16 3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 22th International Conference on Digital Signal Processing and its Applications (DSPA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSPA48919.2020.9213291\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 22th International Conference on Digital Signal Processing and its Applications (DSPA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSPA48919.2020.9213291","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
LSTM-type Neural Network Implementation on a Processor Based on Neuromatrix and RISC Cores for Resource-Limited Applications
In this paper, we consider the implementation of a recurrent artificial neural network of the LSTM type on the NM6407 digital signal processor (DSP), which is optimized for vector and matrix calculations. The NM6407 contains two NeuroMatrix NMC4 cores, each of which includes a RISC processor and a vector coprocessor. We discuss the architectural features and processor resources, and we assess performance when the LSTM network is implemented to solve a typical classification problem. Implementing this type of network on the NM6407 tensor core accelerated computations by a factor of 15–350 compared to a scalar processor.
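The paper itself does not include source code; as a rough illustration of the dense matrix–vector structure that an LSTM cell maps onto a vector coprocessor such as the NMC4, below is a minimal scalar C sketch of the standard LSTM gate equations. All names, sizes, and weights are illustrative assumptions, not taken from the paper, and the loops stand in for what the NeuroMatrix hardware would execute as vector/matrix operations.

```c
/* Minimal scalar reference of one LSTM cell time step (standard formulation).
 * Illustrative only: on a vector coprocessor the four gate matrix-vector
 * products would be offloaded to hardware; here they are plain C loops. */
#include <math.h>
#include <stdio.h>

#define IN  4   /* input vector size  (illustrative) */
#define HID 3   /* hidden state size  (illustrative) */

static float sigmoidf(float x) { return 1.0f / (1.0f + expf(-x)); }

/* y = W*x + U*h + b for one gate; W is HID x IN, U is HID x HID. */
static void gate_preact(const float W[HID][IN], const float U[HID][HID],
                        const float b[HID], const float x[IN],
                        const float h[HID], float y[HID])
{
    for (int i = 0; i < HID; ++i) {
        float acc = b[i];
        for (int j = 0; j < IN;  ++j) acc += W[i][j] * x[j];
        for (int j = 0; j < HID; ++j) acc += U[i][j] * h[j];
        y[i] = acc;
    }
}

/* One time step: updates cell state c and hidden state h in place. */
static void lstm_step(const float Wi[HID][IN], const float Ui[HID][HID], const float bi[HID],
                      const float Wf[HID][IN], const float Uf[HID][HID], const float bf[HID],
                      const float Wo[HID][IN], const float Uo[HID][HID], const float bo[HID],
                      const float Wg[HID][IN], const float Ug[HID][HID], const float bg[HID],
                      const float x[IN], float h[HID], float c[HID])
{
    float i[HID], f[HID], o[HID], g[HID];
    gate_preact(Wi, Ui, bi, x, h, i);   /* input gate      */
    gate_preact(Wf, Uf, bf, x, h, f);   /* forget gate     */
    gate_preact(Wo, Uo, bo, x, h, o);   /* output gate     */
    gate_preact(Wg, Ug, bg, x, h, g);   /* candidate state */
    for (int k = 0; k < HID; ++k) {
        c[k] = sigmoidf(f[k]) * c[k] + sigmoidf(i[k]) * tanhf(g[k]);
        h[k] = sigmoidf(o[k]) * tanhf(c[k]);
    }
}

int main(void)
{
    /* Zero-initialised toy weights and state, only to make the sketch runnable. */
    static float W[HID][IN], U[HID][HID], b[HID];
    float x[IN] = {1.0f, 0.0f, -1.0f, 0.5f}, h[HID] = {0}, c[HID] = {0};
    lstm_step(W, U, b, W, U, b, W, U, b, W, U, b, x, h, c);
    printf("h[0] = %f\n", h[0]);
    return 0;
}
```

Each time step consists of four independent matrix–vector products plus elementwise nonlinearities, which is why a vector/matrix-oriented core can deliver the large speedups over a scalar processor reported above.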