FPGA-based LSTM Acceleration for Real-Time EEG Signal Processing: (Abstract Only)
Authors: Zhe Chen, Andrew G. Howe, H. T. Blair, J. Cong
Published in: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018-02-15
DOI: 10.1145/3174243.3174969 (https://doi.org/10.1145/3174243.3174969)
Closed-loop neurofeedback is a growing area of research and development for novel therapies to treat brain disorders. A neurofeedback device can detect disease symptoms (such as motor tremors or seizures) in real time from electroencephalogram (EEG) signals and respond by rapidly delivering neurofeedback stimulation that relieves these symptoms. Conventional EEG processing algorithms rely on acausal filters, which impose delays that can exceed the short feedback latency required for closed-loop stimulation. In this paper, we first introduce a method for causal filtering using long short-term memory (LSTM) networks, which radically reduces the filtering latency. We then propose a reconfigurable architecture that supports time-division multiplexing of LSTM inference engines on a prototype neurofeedback device. We implemented a 128-channel EEG signal processing design on a Zynq-7030 device and demonstrated its feasibility. We then scaled the design up to Zynq-7045 and Virtex-690t devices to achieve high-performance, energy-efficient implementations for massively parallel brain signal processing. We evaluated the performance against optimized implementations on a CPU and a GPU at the same CMOS technology node. Experimental results show that the Virtex-690t achieves 1.32x and 11x speed-ups over the K40c GPU and the multi-threaded Xeon E5-2860 CPU, respectively, while the FPGA achieves 6.1x and 26.6x higher energy efficiency than the GPU and CPU, respectively.
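The abstract does not describe the architecture in detail, so the following is only a minimal software sketch of the general idea of time-division multiplexing one LSTM inference engine across several EEG channels. Everything in it is an assumption made for illustration: the names `LSTMEngine` and `run_time_multiplexed`, the scalar-input cell, the hidden size, the random (untrained) weights, the sharing of one weight set across all channels, and the use of the first hidden unit as the filtered output. It is not the authors' design.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMEngine:
    """A single scalar-input LSTM cell (hypothetical stand-in for one engine).

    Weights are random placeholders, not a trained EEG filter.
    """

    def __init__(self, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = hidden_size
        # One row per gate unit: [input weight, n recurrent weights, bias].
        # Rows are grouped by gate: input, forget, candidate, output.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n + 2)]
                  for _ in range(4 * n)]

    def step(self, x, state):
        """Consume one sample x and the previous (h, c); return the new (h, c)."""
        h, c = state
        n = self.hidden_size
        pre = []
        for row in self.w:
            acc = row[0] * x + row[-1]
            for j in range(n):
                acc += row[1 + j] * h[j]
            pre.append(acc)
        new_h, new_c = [], []
        for k in range(n):
            i = sigmoid(pre[k])              # input gate
            f = sigmoid(pre[n + k])          # forget gate
            g = math.tanh(pre[2 * n + k])    # candidate cell value
            o = sigmoid(pre[3 * n + k])      # output gate
            ck = f * c[k] + i * g
            new_c.append(ck)
            new_h.append(o * math.tanh(ck))
        return new_h, new_c


def run_time_multiplexed(engine, samples_per_channel):
    """Serve every channel with one engine by visiting channels in turn.

    samples_per_channel[ch][t] is the sample of channel ch at time step t.
    Only the per-channel (h, c) state is stored separately; the engine
    itself is shared. Each output depends only on past and current samples,
    so the filter is causal.
    """
    n_channels = len(samples_per_channel)
    n_steps = len(samples_per_channel[0])
    n = engine.hidden_size
    states = [([0.0] * n, [0.0] * n) for _ in range(n_channels)]
    outputs = [[] for _ in range(n_channels)]
    for t in range(n_steps):
        for ch in range(n_channels):
            states[ch] = engine.step(samples_per_channel[ch][t], states[ch])
            outputs[ch].append(states[ch][0][0])
    return outputs


if __name__ == "__main__":
    engine = LSTMEngine(hidden_size=4, seed=1)
    # Three synthetic sinusoidal "channels" of 20 samples each.
    data = [[math.sin(0.1 * t + ch) for t in range(20)] for ch in range(3)]
    filtered = run_time_multiplexed(engine, data)
    print(len(filtered), len(filtered[0]))
```

In this sketch the per-channel cost is one `step` call per sample, so the number of channels one engine can serve is bounded by how many `step` calls fit into one sampling period; on hardware, the same trade-off would determine how many engines a given channel count requires.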