{"title":"变压器神经网络的不动点优化","authors":"Yoonho Boo, Wonyong Sung","doi":"10.1109/ICASSP40776.2020.9054724","DOIUrl":null,"url":null,"abstract":"The Transformer model adopts a self-attention structure and shows very good performance in various natural language processing tasks. However, it is difficult to implement the Transformer in embedded systems because of its very large model size. In this study, we quantize the parameters and hidden signals of the Transformer for complexity reduction. Not only matrices for weights and embedding but the input and the softmax outputs are also quantized to utilize low-precision matrix multiplication. The fixed-point optimization steps consist of quantization sensitivity analysis, hardware conscious word-length assignment, quantization and retraining, and post-training for improved generalization. We achieved 27.51 BLEU score on the WMT English-to-German translation task with 4-bit weights and 6-bit hidden signals.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"85 1","pages":"1753-1757"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Fixed-Point Optimization of Transformer Neural Network\",\"authors\":\"Yoonho Boo, Wonyong Sung\",\"doi\":\"10.1109/ICASSP40776.2020.9054724\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Transformer model adopts a self-attention structure and shows very good performance in various natural language processing tasks. However, it is difficult to implement the Transformer in embedded systems because of its very large model size. In this study, we quantize the parameters and hidden signals of the Transformer for complexity reduction. Not only matrices for weights and embedding but the input and the softmax outputs are also quantized to utilize low-precision matrix multiplication. The fixed-point optimization steps consist of quantization sensitivity analysis, hardware conscious word-length assignment, quantization and retraining, and post-training for improved generalization. 
We achieved 27.51 BLEU score on the WMT English-to-German translation task with 4-bit weights and 6-bit hidden signals.\",\"PeriodicalId\":13127,\"journal\":{\"name\":\"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"volume\":\"85 1\",\"pages\":\"1753-1757\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP40776.2020.9054724\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP40776.2020.9054724","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Fixed-Point Optimization of Transformer Neural Network
The Transformer model adopts a self-attention structure and shows very good performance on various natural language processing tasks. However, it is difficult to deploy the Transformer on embedded systems because of its very large model size. In this study, we quantize the parameters and hidden signals of the Transformer to reduce its complexity. Not only the weight and embedding matrices but also the inputs and softmax outputs are quantized, so that low-precision matrix multiplication can be used throughout. The fixed-point optimization consists of quantization sensitivity analysis, hardware-conscious word-length assignment, quantization with retraining, and post-training for improved generalization. We achieve a BLEU score of 27.51 on the WMT English-to-German translation task with 4-bit weights and 6-bit hidden signals.
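The abstract does not specify the exact quantizer, so the following is only a minimal sketch of symmetric uniform fixed-point quantization in NumPy, assuming a max-magnitude scale per tensor; the function name `uniform_quantize` and all sizes are illustrative, not the authors' implementation.

```python
import numpy as np

def uniform_quantize(x, num_bits):
    """Symmetric uniform quantizer: rounds x onto a grid of
    2**(num_bits-1) - 1 positive/negative levels plus zero."""
    levels = 2 ** (num_bits - 1) - 1           # e.g. 7 for 4-bit signed values
    step = np.max(np.abs(x)) / levels          # illustrative max-magnitude scale
    q = np.clip(np.round(x / step), -levels, levels)
    return q * step                            # dequantized fixed-point values

# Example: a hypothetical weight matrix at 4 bits and a hidden
# activation at 6 bits, matching the word lengths reported above.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(512, 512))
h = rng.normal(size=(64, 512))
W_q = uniform_quantize(W, 4)
h_q = uniform_quantize(h, 6)
print("weight RMS quantization error:", np.sqrt(np.mean((W - W_q) ** 2)))
```

In a full pipeline such as the one described above, a quantizer of this kind would be applied per matrix after a sensitivity analysis chooses each word length, followed by retraining with the quantized values.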