FPGA-based component-wise LSTM training accelerator for neural Granger causality analysis
Authors: Chuliang Guo, Yufei Chen, Yu Fu
Journal: Neurocomputing, vol. 615, Article 128871 (published 2024-11-13)
DOI: 10.1016/j.neucom.2024.128871
Citations: 0
Abstract
Component-wise LSTM (cLSTM) comprises multiple LSTM cells with distinct parameters, which offers particular benefits for functional Magnetic Resonance Imaging (fMRI)-based neural Granger causality (NGC) analysis of the human brain. Back-propagation-through-time training on CPUs and GPUs suffers from low utilization due to inherent data dependencies within the LSTM cell. Moreover, batch-1 cLSTM training and limited weight reuse across input feature maps worsen this utilization problem. To this end, this study provides an FPGA-based training solution for cLSTM-based NGC analysis. The proposed cLSTM training accelerator identifies the different data dependencies of the forward and backward paths, and features two key components: (1) a fine-grained pipeline within the LSTM cell that achieves the lowest initiation interval, and (2) a coarse-grained pipeline that trains input feature sequences across different LSTM cells in parallel. Experiments on the DAN sub-brain network from the COBRE dataset demonstrate the efficacy of FPGA-based cLSTM training, which achieves microsecond-level iteration latency compared with milliseconds on general-purpose platforms, e.g., 465× and 216× faster than an Intel Core 13900K CPU and an Nvidia RTX 2080Ti GPU, respectively. To the best of our knowledge, this work is the first to demonstrate LSTM training on an FPGA, significantly accelerating the analysis and modeling of complex brain networks and offering valuable advancements for neuroscience research at the edge.
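To make the cLSTM structure concrete: it assigns one independently parameterized LSTM cell per output time series, with each cell observing the full multivariate input, and Granger-causal influence is typically read off from the learned input-weight magnitudes. The following is a minimal NumPy sketch of that forward structure only — not the paper's FPGA pipeline or its training loop — and the function names, hidden size, and weight-norm causality score are illustrative assumptions.

```python
import numpy as np


def lstm_step(x, h, c, W, U, b):
    # One LSTM cell step; gate pre-activations stacked as [i, f, g, o].
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = 1.0 / (1.0 + np.exp(-z[:H]))          # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))     # forget gate
    g = np.tanh(z[2 * H:3 * H])               # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * H:]))      # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new


def clstm_forward(X, params, H):
    # Component-wise LSTM: one cell with distinct parameters per output
    # series j; every cell sees the full multivariate input X[t].
    T, N = X.shape
    preds = np.zeros((T, N))
    for j, (W, U, b, w_out) in enumerate(params):
        h = np.zeros(H)
        c = np.zeros(H)
        for t in range(T):
            h, c = lstm_step(X[t], h, c, W, U, b)
            preds[t, j] = w_out @ h  # cell j predicts only series j
    return preds


def granger_matrix(params):
    # Illustrative NGC readout: entry [j, i] is the norm of the input
    # weights from series i into cell j; a larger value suggests that
    # series i Granger-causes series j under this model family.
    return np.array([np.linalg.norm(W, axis=0) for (W, _, _, _) in params])
```

In practice, NGC methods encourage sparsity in these input weights during training so that the recovered causality matrix is interpretable; the batch-1, per-cell structure above is also what limits weight reuse and CPU/GPU utilization, as the abstract notes.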
Journal overview:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing, covering neurocomputing theory, practice, and applications.