Qin Li, Peiyan Dong, Zijie Yu, Changlu Liu, F. Qiao, Yanzhi Wang, Huazhong Yang
Published in: 2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC), 18 January 2021. DOI: 10.1145/3394885.3431512
Puncturing the memory wall: Joint optimization of network compression with approximate memory for ASR application
Automatic speech recognition (ASR) systems are becoming indispensable in smart speech-interaction applications. However, when these applications are embedded in energy- and memory-constrained Internet of Things (IoT) devices, they run into the memory wall. Designing a memory- and energy-efficient ASR system is therefore both highly challenging and imperative. This paper proposes a scheme that jointly optimizes network compression and approximate memory for an economical ASR system. At the algorithm level, this work presents block-based pruning and quantization with error model (BPQE), an optimized compression framework that coordinates a novel pruning technique with low-precision quantization and the approximate memory scheme. The BPQE-compressed recurrent neural network (RNN) model achieves an ultra-high compression rate and a fine-grained structured sparsity pattern, which together greatly reduce the number of memory accesses. At the hardware level, this work presents an ASR-adapted incremental retraining method to further maximize power savings. This retraining method increases the benefit obtained from the approximate memory scheme while largely preserving accuracy. Experimental results show that the proposed joint optimization achieves 58.6% power saving and 40× memory saving at a phone error rate of 20%.
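The abstract names three mechanisms (block-based pruning, low-precision quantization, and an error model for approximate memory) without specifying them. The following is a minimal, hypothetical Python sketch of how such a pipeline could fit together, assuming: column pruning by L2 norm inside fixed-size tiles, symmetric uniform quantization with one scale per tensor, and an approximate memory in which every stored bit flips independently with a fixed probability. The function names (`block_column_prune`, `quantize`, `inject_bit_flips`, `compression_ratio`), tile sizes, bit widths, and flip probability are illustrative assumptions, not the paper's BPQE implementation, and the ASR-adapted incremental retraining step is not modelled.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def block_column_prune(w: Matrix, block_rows: int, block_cols: int,
                       keep_ratio: float) -> Matrix:
    """Zero the weakest column segments inside every block_rows x block_cols tile.

    Assumption: a simplified block-based structured pruning rule (keep the
    column segments with the largest L2 norm per tile), not BPQE's exact rule.
    """
    rows, cols = len(w), len(w[0])
    out = [row[:] for row in w]
    for r0 in range(0, rows, block_rows):
        r1 = min(r0 + block_rows, rows)
        for c0 in range(0, cols, block_cols):
            c1 = min(c0 + block_cols, cols)
            # (L2 norm, column index) for each column segment in this tile
            norms = [(sum(out[r][c] ** 2 for r in range(r0, r1)) ** 0.5, c)
                     for c in range(c0, c1)]
            n_keep = max(1, round(keep_ratio * (c1 - c0)))
            keep = {c for _, c in sorted(norms, reverse=True)[:n_keep]}
            for c in range(c0, c1):
                if c not in keep:
                    for r in range(r0, r1):
                        out[r][c] = 0.0
    return out


def quantize(w: Matrix, bits: int) -> Tuple[IntMatrix, float]:
    """Symmetric uniform quantization to signed `bits`-bit integers (one scale per tensor)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(x) for row in w for x in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [[max(-qmax, min(qmax, round(x / scale))) for x in row] for row in w]
    return q, scale


def inject_bit_flips(q: IntMatrix, bits: int, flip_prob: float,
                     rng: random.Random) -> IntMatrix:
    """Hypothetical approximate-memory error model: each bit of each stored
    `bits`-wide two's-complement word flips independently with `flip_prob`."""
    mask = (1 << bits) - 1
    out = []
    for row in q:
        new_row = []
        for v in row:
            u = v & mask  # encode v as an unsigned `bits`-bit word
            for b in range(bits):
                if rng.random() < flip_prob:
                    u ^= 1 << b
            if u & (1 << (bits - 1)):  # decode the sign bit back to a signed int
                u -= 1 << bits
            new_row.append(u)
        out.append(new_row)
    return out


def compression_ratio(w_dense: Matrix, w_pruned: Matrix, bits: int) -> float:
    """Dense fp32 storage divided by (nonzeros x bits); sparse-index overhead is ignored."""
    dense_bits = 32 * sum(len(row) for row in w_dense)
    nonzeros = sum(1 for row in w_pruned for x in row if x != 0.0)
    return dense_bits / max(1, nonzeros * bits)


if __name__ == "__main__":
    rng = random.Random(0)
    w = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]  # toy 4x8 weight matrix
    pruned = block_column_prune(w, block_rows=4, block_cols=4, keep_ratio=0.5)
    q, scale = quantize(pruned, bits=4)
    noisy = inject_bit_flips(q, bits=4, flip_prob=1e-3, rng=rng)
    # Count stored words whose value changed after the simulated approximate-memory read.
    corrupted = sum(a != b for ra, rb in zip(q, noisy) for a, b in zip(ra, rb))
    print("storage ratio:", compression_ratio(w, pruned, bits=4))
    print("corrupted words:", corrupted)
```

With the toy settings above (half the columns of each 4×4 tile kept, 4-bit weights), the storage ratio is 32 / (0.5 × 4) = 16×, ignoring sparse-index overhead; this toy figure is unrelated to the 40× reported in the abstract. For simplicity the sketch also injects flips into pruned zeros, which a real sparse storage format would not hold in memory.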