Recent Advancements in FPGA-based LSTM Accelerator
Kriti Suneja, Aniket Chaudhary, Arun Kumar, Ayush Srivastava
2022 International Conference for Advancement in Technology (ICONAT), published 2022-01-21
DOI: 10.1109/ICONAT53423.2022.9726002 (https://doi.org/10.1109/ICONAT53423.2022.9726002)
This paper reviews the implementation of Long Short-Term Memory (LSTM) accelerators on FPGAs. It begins with a brief overview of the significance of LSTM accelerators, their historical background, and their general structure, and then highlights recent work in the area. Research in this domain focuses on modifying the LSTM architecture so that the computations of different stages in the network overlap. This yields lower power consumption and higher performance than the corresponding software implementation. An FPGA implementation also lets the user trade off resource utilization against speed of operation by adjusting the degree of parallelization of operations such as vector multiplication and addition. This makes it possible to fit the computational load to different FPGA boards.
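
The trade-off between resource utilization and speed can be illustrated with a small, idealized cost model. The sketch below is not taken from the reviewed works: the function names, the `parallel_macs` parameter, and the assumption that every multiply-accumulate (MAC) unit is fully utilized each cycle are all hypothetical simplifications. It counts only the four gate matrix-vector products of one LSTM time step and ignores activations, element-wise operations, and memory bandwidth.

```python
import math


def matvec_cycles(rows: int, cols: int, parallel_macs: int) -> int:
    """Idealized cycles for a rows x cols matrix-vector product.

    Assumes (hypothetically) that parallel_macs multiply-accumulate
    units each complete one MAC per cycle with no stalls.
    """
    if parallel_macs < 1:
        raise ValueError("parallel_macs must be at least 1")
    return math.ceil(rows * cols / parallel_macs)


def lstm_step_cycles(input_size: int, hidden_size: int, parallel_macs: int) -> int:
    """Idealized cycles for the gate products of one LSTM time step.

    Each of the four gates multiplies a hidden_size x (input_size + hidden_size)
    weight matrix by the concatenated vector [x_t, h_{t-1}].
    """
    per_gate = matvec_cycles(hidden_size, input_size + hidden_size, parallel_macs)
    return 4 * per_gate


if __name__ == "__main__":
    # More MAC units cost more FPGA resources but cut the cycle count.
    for macs in (1, 8, 64):
        cycles = lstm_step_cycles(input_size=32, hidden_size=64, parallel_macs=macs)
        print(f"{macs:>3} MAC units -> {cycles} cycles per time step")
```

Under these assumptions, going from 1 to 64 MAC units reduces the modeled cycle count by a factor of 64, at the cost of 64 times as many multipliers. On a real board the achievable parallelism is bounded by available DSP blocks and on-chip memory ports, which is why the degree of parallelization has to be chosen per device.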