THRIFTY
Saransh Gupta, Justin Morris, M. Imani, R. Ramkumar, Jeffrey Yu, Aniket Tiwari, Baris Aksanli, T. Rosing
Proceedings of the 39th International Conference on Computer-Aided Design
Published: 2020-11-02
DOI: 10.1145/3400302.3415723 (https://doi.org/10.1145/3400302.3415723)
Citations: 16
Abstract
Hyperdimensional computing (HDC) is a brain-inspired computing paradigm that operates on high-dimensional vectors, called hypervectors, instead of numbers. HDC replaces several complex learning computations with bitwise and simpler arithmetic operations, resulting in a faster and more energy-efficient learning algorithm. However, mapping the data into high-dimensional space increases the amount of data to process. While some datasets may nearly fit in memory, the resulting hypervectors more often than not cannot, which leads to long data transfers from storage. In this paper, we propose THRIFTY, an in-storage computing (ISC) solution that performs HDC encoding and training across the flash hierarchy. To hide the latency of training and enable efficient computation, we introduce the concept of batching in HDC, which allows us to split HDC training into sub-components and process them independently. We also present, for the first time, on-chip acceleration for HDC, which uses simple low-power digital circuits to implement HDC encoding in flash planes. This enables us to exploit the high internal parallelism provided by the flash hierarchy and encode multiple data points in parallel with negligible latency overhead. THRIFTY also implements a single top-level FPGA accelerator, which further processes the data obtained from the chips. We use the state-of-the-art INSIDER ISC infrastructure to implement the top-level accelerator and provide software support for THRIFTY. THRIFTY runs HDC training completely in storage while almost entirely hiding the latency of computation. Our evaluation over five popular classification datasets shows that, for HDC encoding and training, THRIFTY is on average 1612× faster than a CPU server and 14.4× faster than INSIDER, the state-of-the-art ISC solution.
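The idea of splitting HDC training into independently processed batches can be illustrated with a small software sketch. The abstract does not specify THRIFTY's encoding or training rule, so everything below is an assumption made for illustration: a random bipolar projection for encoding, single-pass training that bundles (element-wise adds) encoded samples into one accumulator per class, and hypothetical function names such as `make_projection` and `train_batched`. Under those assumptions, bundling is additive, so partial models built from separate batches can be merged by simple addition.

```python
import random

D = 1024  # hypervector dimensionality (illustrative value, not from the paper)


def make_projection(n_features, dim=D, seed=0):
    """Random bipolar (+1/-1) projection matrix, one row per output dimension."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n_features)] for _ in range(dim)]


def encode(x, proj):
    """Assumed encoding: sign of a random projection of the feature vector."""
    return [1 if sum(w * v for w, v in zip(row, x)) >= 0 else -1 for row in proj]


def train(samples, labels, proj, n_classes):
    """Bundle encoded samples into one integer accumulator per class."""
    model = [[0] * len(proj) for _ in range(n_classes)]
    for x, y in zip(samples, labels):
        acc = model[y]
        for i, bit in enumerate(encode(x, proj)):
            acc[i] += bit
    return model


def merge(models):
    """Combine partial models from independent batches by element-wise addition."""
    merged = [[0] * len(models[0][0]) for _ in models[0]]
    for m in models:
        for c, acc in enumerate(m):
            for i, v in enumerate(acc):
                merged[c][i] += v
    return merged


def train_batched(samples, labels, proj, n_classes, batch_size):
    """Train each batch on its own, then merge the partial class accumulators."""
    parts = [
        train(samples[i:i + batch_size], labels[i:i + batch_size], proj, n_classes)
        for i in range(0, len(samples), batch_size)
    ]
    return merge(parts)


if __name__ == "__main__":
    rng = random.Random(1)
    samples = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(20)]
    labels = [rng.randrange(3) for _ in range(20)]
    proj = make_projection(8, dim=64)
    # Batched and unbatched training agree because bundling is plain addition.
    print(train_batched(samples, labels, proj, 3, batch_size=6)
          == train(samples, labels, proj, 3))
```

In this sketch each call to `train` on a batch touches only that batch's samples, which is the property that would let separate units work on different batches at the same time; it says nothing about how THRIFTY maps batches onto flash planes or the FPGA.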