Efficient Storage Coding of Pruning MLP for Embedded System

Ming Xiaoman, Huang Letian

2021 IEEE 4th International Conference on Electronics Technology (ICET), published 2021-05-07.
DOI: 10.1109/ICET51757.2021.9451109
Citations: 0
Abstract
Deep neural networks are widely used in many intelligent computing scenarios, but the storage overhead caused by their huge number of parameters limits their application in embedded systems. Neural network pruning can remove a large number of redundant parameters from a network, which in theory reduces both parameter storage and computation. However, deploying sparse neural network algorithms on existing embedded hardware platforms faces two problems: first, storing sparse neural network parameters with existing encodings adds extra memory-access overhead; second, the decoding process for a sparse model is highly complex, which increases the amount of computation. To solve these two problems, this paper proposes the Dynamic ELL encoding format to represent the sparse weight matrices of pruned neural networks. This new encoding combines the advantages of ELL encoding and relative indexing, trading off storage overhead against decoding overhead. A sparse MLP stored with Dynamic ELL encoding is deployed on a SoC built around a RISC-V processor. Experiments show that, compared to a dense network stored with Dynamic ELL encoding, the sparse network's storage space is reduced by 43% and the system running time is reduced by 37%.
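To make the two ingredients concrete, the sketch below shows classical ELLPACK (ELL) encoding combined with relative (delta) column indexing for a pruned weight matrix: each row's nonzero values are packed into fixed-width arrays, and column positions are stored as gaps between consecutive nonzeros so they fit in fewer bits. This is only an illustration of the two well-known formats the abstract says Dynamic ELL combines; the exact Dynamic ELL layout is defined in the paper itself, and the function names here are hypothetical.

```python
# Illustrative sketch: ELL (ELLPACK) storage with relative column indexing.
# NOTE: this is NOT the paper's Dynamic ELL format, only the two classical
# building blocks it combines, written out for clarity.

def ell_encode(matrix):
    """Encode a dense 2-D list as ELL arrays with relative column indices."""
    nnz_cols = [[j for j, v in enumerate(row) if v != 0] for row in matrix]
    width = max(len(c) for c in nnz_cols)  # max nonzeros in any row
    values, rel_idx = [], []
    for row, cols in zip(matrix, nnz_cols):
        vals = [row[j] for j in cols]
        # Relative indexing: first entry is the absolute column of the
        # first nonzero, later entries are gaps between consecutive columns.
        deltas = ([cols[0]] + [b - a for a, b in zip(cols, cols[1:])]) if cols else []
        pad = width - len(cols)            # pad short rows to the ELL width
        values.append(vals + [0] * pad)
        rel_idx.append(deltas + [0] * pad)
    return values, rel_idx, width

def ell_decode(values, rel_idx, n_cols):
    """Reconstruct the dense matrix from the ELL value/index arrays."""
    out = []
    for vals, deltas in zip(values, rel_idx):
        row = [0] * n_cols
        col = 0
        for v, d in zip(vals, deltas):
            col += d                       # accumulate gaps -> absolute column
            if v != 0:                     # padded slots carry value 0: skip
                row[col] = v
        out.append(row)
    return out
```

Padding every row to the same width is what makes ELL decoding simple and regular (good for embedded hardware), while the small deltas are what relative indexing compresses; the storage/decoding trade-off between these two ideas is what the paper's Dynamic ELL format targets.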