{"title":"Designing reconfigurable large-scale deep learning systems using stochastic computing","authors":"Ao Ren, Zhe Li, Yanzhi Wang, Qinru Qiu, Bo Yuan","doi":"10.1109/ICRC.2016.7738685","DOIUrl":"https://doi.org/10.1109/ICRC.2016.7738685","url":null,"abstract":"Deep learning, as an important branch of machine learning and neural networks, is playing an increasingly important role in fields such as computer vision and natural language processing. However, large-scale deep learning systems mainly operate on high-performance server clusters, which restricts their extension to personal or mobile devices. The solution proposed in this paper takes advantage of the attractive features of stochastic computing. Stochastic computing is a data representation and processing technique that uses a binary bit stream to represent a probability (decoded by counting the number of ones in the stream). In stochastic computing, key arithmetic operations such as multiplication and addition can be implemented with very simple components, namely AND gates and multiplexers, respectively. This provides an immense design space for integrating a large number of neurons and enabling fully parallel and scalable hardware implementations of large-scale deep learning systems. In this paper, we present a reconfigurable large-scale deep learning system based on stochastic computing, including the design of the neuron, the convolution function, the back-propagation function and other basic operations. We also propose a network-on-chip technique to implement the system at large scale in hardware. Our experiments validate the functionality of reconfigurable deep learning systems using stochastic computing, and demonstrate that with 8192-bit streams, stochastic-computing classification of MNIST digits achieves an error rate as low as that of conventional arithmetic.","PeriodicalId":387008,"journal":{"name":"2016 IEEE International Conference on Rebooting Computing (ICRC)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123400815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
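The stochastic-computing primitives mentioned in the Ren et al. abstract can be illustrated with a small Python sketch. It assumes the standard unipolar encoding, in which an AND gate multiplies two independent streams and a 2-to-1 multiplexer driven by a p = 0.5 select stream computes the scaled sum (a + b)/2. The function names are hypothetical, this is not the authors' neuron design, and the streams come from Python's `random` module rather than hardware random number generators.

```python
import random

def to_stream(p, n, rng):
    """Encode a probability p in [0, 1] as an n-bit unipolar stochastic bit stream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def decode(stream):
    """Decode a stream by counting ones: estimate = (number of ones) / length."""
    return sum(stream) / len(stream)

def sc_multiply(a, b):
    """AND gate: for independent streams, P(a AND b) = P(a) * P(b)."""
    return [x & y for x, y in zip(a, b)]

def sc_scaled_add(a, b, select):
    """2-to-1 multiplexer driven by a p = 0.5 select stream: P(out) = (P(a) + P(b)) / 2."""
    return [x if s else y for x, y, s in zip(a, b, select)]

rng = random.Random(2016)
N = 8192  # stream length used in the MNIST experiment of Ren et al.
a = to_stream(0.6, N, rng)
b = to_stream(0.3, N, rng)
sel = to_stream(0.5, N, rng)

product = decode(sc_multiply(a, b))          # close to 0.6 * 0.3 = 0.18
half_sum = decode(sc_scaled_add(a, b, sel))  # close to (0.6 + 0.3) / 2 = 0.45
print(f"AND: {product:.3f}  MUX: {half_sum:.3f}")
```

The random error of each decoded value shrinks roughly as 1/sqrt(N), which is consistent with the abstract tying its accuracy result to a specific stream length.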
{"title":"Bayesian sensor fusion with fast and low power stochastic circuits","authors":"Alexandre Coninx, P. Bessière, E. Mazer, J. Droulez, R. Laurent, Awais Aslam, J. Lobo","doi":"10.1109/ICRC.2016.7738672","DOIUrl":"https://doi.org/10.1109/ICRC.2016.7738672","url":null,"abstract":"As the physical limits of Moore's law are being reached, research efforts seek further performance improvements by exploring computation paradigms that depart from standard approaches. The BAMBI project (Bottom-up Approaches to Machines dedicated to Bayesian Inference) aims to develop hardware dedicated to probabilistic computation, extending the logic computation realised by Boolean gates in current computer chips. Such probabilistic computing devices would make it possible to solve a wide range of artificial intelligence problems faster and at lower energy cost, especially when decisions must be made from incomplete data in an uncertain environment. This paper describes an architecture in which very simple operators compute on probability values time-coded as stochastic signals. Simulation tests and a reconfigurable logic hardware implementation demonstrated the feasibility and performance of the proposed inference machine. Hardware results show that this architecture can quickly solve Bayesian sensor fusion problems and is highly energy efficient.","PeriodicalId":387008,"journal":{"name":"2016 IEEE International Conference on Rebooting Computing (ICRC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133047576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
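The idea of computing Bayesian fusion with very simple operators on stochastic signals can be sketched in Python as follows. This is a generic illustration under stated assumptions: discrete hypotheses, conditionally independent sensors, mutually independent bit streams, and normalization done by dividing counters in software. It is not the BAMBI architecture, and `stochastic_fusion` and `bernoulli_stream` are hypothetical names.

```python
import random

def bernoulli_stream(p, n, rng):
    """Hypothetical helper: n-bit stochastic stream whose bits are 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stochastic_fusion(prior, likelihoods, n=10000, seed=0):
    """Estimate P(H = h | s_1..s_k) by AND-ing independent bit streams.

    prior[h]          : P(H = h)
    likelihoods[k][h] : P(s_k | H = h) for sensor k (sensors assumed conditionally independent)
    """
    rng = random.Random(seed)
    counts = []
    for h, p_h in enumerate(prior):
        acc = bernoulli_stream(p_h, n, rng)
        for sensor in likelihoods:
            # Each AND gate multiplies in one more likelihood term.
            acc = [x & y for x, y in zip(acc, bernoulli_stream(sensor[h], n, rng))]
        counts.append(sum(acc))  # about n * P(H = h) * prod_k P(s_k | H = h)
    total = sum(counts)
    if total == 0:
        raise ValueError("all product streams are zero; use longer streams")
    # Normalization by dividing counters is an assumption of this sketch.
    return [c / total for c in counts]

prior = [0.5, 0.5]
likelihoods = [[0.8, 0.3],   # sensor 1: P(s1 | H=0), P(s1 | H=1)
               [0.7, 0.4]]   # sensor 2: P(s2 | H=0), P(s2 | H=1)
posterior = stochastic_fusion(prior, likelihoods)
exact = 0.28 / (0.28 + 0.06)  # 0.5*0.8*0.7 versus 0.5*0.3*0.4, normalized
print(posterior, exact)
```

Each extra sensor costs only one more AND gate per hypothesis, which illustrates why such operators are cheap in hardware.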
{"title":"Accelerating machine learning with Non-Volatile Memory: Exploring device and circuit tradeoffs","authors":"Alessandro Fumarola, P. Narayanan, Lucas L. Sanches, Severin Sidler, Junwoo Jang, Kibong Moon, R. Shelby, H. Hwang, G. Burr","doi":"10.1109/ICRC.2016.7738684","DOIUrl":"https://doi.org/10.1109/ICRC.2016.7738684","url":null,"abstract":"Large arrays of the same nonvolatile memories (NVM) being developed for Storage-Class Memory (SCM), such as Phase Change Memory (PCM) and Resistance RAM (ReRAM), can also be used in non-Von Neumann neuromorphic computational schemes, with device conductance serving as synaptic “weight.” This allows the all-important multiply-accumulate operation within these algorithms to be performed efficiently at the location of the weight data.","PeriodicalId":387008,"journal":{"name":"2016 IEEE International Conference on Rebooting Computing (ICRC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126083835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
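The in-memory multiply-accumulate described by Fumarola et al. relies on Ohm's law and Kirchhoff's current law. When read voltages are applied to the rows of a crossbar, each column current equals a weighted sum of those voltages, and the device conductances act as the weights. The Python sketch below models an ideal crossbar. Encoding each signed weight as a pair of conductances (G+ minus G-) is an assumption of this sketch, `crossbar_mac` is a hypothetical name, and wire resistance, device nonlinearity and conductance noise are ignored.

```python
def crossbar_mac(voltages, g_plus, g_minus):
    """Ideal crossbar read: I_j = sum_i V_i * (G+_ij - G-_ij).

    voltages       : read voltage on each row (volts)
    g_plus/g_minus : rows x cols conductance matrices (siemens); a signed weight is
                     encoded as the difference of two device conductances (assumption).
    """
    n_cols = len(g_plus[0])
    currents = [0.0] * n_cols
    for v, row_p, row_m in zip(voltages, g_plus, g_minus):
        for j in range(n_cols):
            # Ohm's law gives each device's current; Kirchhoff's current law sums them per column.
            currents[j] += v * (row_p[j] - row_m[j])
    return currents

v = [0.2, 0.1]
g_plus = [[10e-6, 0.0], [5e-6, 20e-6]]
g_minus = [[0.0, 4e-6], [0.0, 0.0]]
print(crossbar_mac(v, g_plus, g_minus))  # approximately [2.5e-06, 1.2e-06] amperes
```

All columns are computed in one read step in the physical array, so the multiply-accumulate happens where the weights are stored instead of after moving them to a processor.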
{"title":"Neuromorphic mixed-signal circuitry for Asynchronous Pulse Processing","authors":"P. Petre, J. Cruz-Albrecht","doi":"10.1109/ICRC.2016.7738686","DOIUrl":"https://doi.org/10.1109/ICRC.2016.7738686","url":null,"abstract":"We demonstrate a software-reconfigurable mixed-signal Printed Circuit Board (PCB) prototype and a custom mixed-signal Application Specific Integrated Circuit (ASIC) prototype of a cognitive signal processor that uses neuromorphic methods to run real-time wideband signal processing algorithms based on adaptive nonlinear filtering. The cognitive processor implements an emerging computing paradigm called Reservoir Computing (RC). The reservoir computer is implemented in hardware by a novel analog signal processor architecture called the Asynchronous Pulse Processor (APP).","PeriodicalId":387008,"journal":{"name":"2016 IEEE International Conference on Rebooting Computing (ICRC)","volume":"415 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124169852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel data processing with Magnonic Holographic Co-Processor","authors":"M. Balynsky, D. Gutierrez, H. Chiang, A. Khitun, A. Kozhevnikov, Y. Khivintsev, G. Dudko, Y. Filimonov","doi":"10.1109/ICRC.2016.7738708","DOIUrl":"https://doi.org/10.1109/ICRC.2016.7738708","url":null,"abstract":"In this work, we present experimental data demonstrating the capabilities of a Magnonic Holographic Co-Processor for parallel data processing. It is a type of magnetic logic device that uses spin waves for data transfer and processing. Its operation is based on the correlation between the phases and amplitudes of the input spin waves and the output inductive voltages. We present experimental data obtained for an 8-terminal prototype based on a Y3Fe2(FeO4)3 structure. The device input is provided by a phased array of spin-wave generating elements, which allows us to produce input phase patterns of arbitrary form. The data obtained demonstrate the capability of the Magnonic Holographic Co-Processor for parallel data processing using spin-wave superposition. Magnonic holographic devices could potentially be implemented as logic units complementary to digital processors. Physical limitations and technological constraints are also discussed.","PeriodicalId":387008,"journal":{"name":"2016 IEEE International Conference on Rebooting Computing (ICRC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132791294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
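Spin-wave superposition, which the magnonic holographic co-processor relies on, can be illustrated with an idealized phasor model. Each input contributes a wave with a given amplitude and phase, and the detected amplitude is the magnitude of their complex sum. The Python sketch below assumes lossless, dispersion-free propagation and equal path lengths to a single detector. It is a textbook interference model, not a model of the Y3Fe2(FeO4)3 prototype, and `output_amplitude` is a hypothetical name.

```python
import cmath
import math

def output_amplitude(phases, amplitudes=None):
    """Amplitude at one detector of the linear superposition of waves with the given phases.

    Idealized model (assumption): equal path lengths, no attenuation or dispersion.
    """
    if amplitudes is None:
        amplitudes = [1.0] * len(phases)
    field = sum(a * cmath.exp(1j * phi) for a, phi in zip(amplitudes, phases))
    return abs(field)

print(output_amplitude([0.0, 0.0, 0.0, 0.0]))          # in phase: 4.0
print(output_amplitude([0.0, math.pi, 0.0, math.pi]))  # opposite phases: about 0 (rounding noise)
print(output_amplitude([0.0, math.pi / 2]))            # quadrature: about 1.414
```

Because the output depends on the whole input phase pattern at once, a single measurement can compare one pattern against many, which is the sense in which superposition gives parallelism here.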
{"title":"Reducing data movement with approximate computing techniques","authors":"S. Crago, D. Yeung","doi":"10.1109/ICRC.2016.7738675","DOIUrl":"https://doi.org/10.1109/ICRC.2016.7738675","url":null,"abstract":"Data movement is the dominant factor that limits performance and efficiency in today's architectures, and we do not expect that to change in future architectures. In this paper, we describe how approximate computing techniques can be applied to communication at the algorithm level, in conventional computer architectures, and in the architectures being explored as we go beyond Moore's Law. We present results that demonstrate potential performance gains and the effect of approximations in traditional computer architectures. We describe how these techniques may be applied to future architectures based on probabilistic, approximate, stochastic, and neuromorphic computing, as well as more conventional heterogeneous and 3D architectures.","PeriodicalId":387008,"journal":{"name":"2016 IEEE International Conference on Rebooting Computing (ICRC)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125227936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
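As one concrete example of applying approximation to communication, the Python sketch below halves the bytes moved for an array of float32 values. It sends only the upper 16 bits of each value (bfloat16-style truncation) and zero-fills the dropped bits on receipt. This particular technique was chosen for illustration and is not taken from Crago and Yeung, and the helper names are hypothetical. For normal (non-subnormal) float32 values, the truncation step introduces a relative error below 2^-7.

```python
import struct

def pack_truncated(values):
    """Send only the upper 16 bits of each float32 (sign, 8 exponent bits, 7 mantissa bits)."""
    out = bytearray()
    for v in values:
        bits = struct.unpack("<I", struct.pack("<f", v))[0]
        out += struct.pack("<H", bits >> 16)  # drop the low 16 mantissa bits (truncation)
    return bytes(out)

def unpack_truncated(data):
    """Rebuild approximate float32 values by zero-filling the dropped mantissa bits."""
    return [struct.unpack("<f", struct.pack("<I", hi << 16))[0]
            for (hi,) in struct.iter_unpack("<H", data)]

values = [3.14159, -0.001, 12345.678, 1.0]
payload = pack_truncated(values)
approx = unpack_truncated(payload)
print(len(payload), "bytes instead of", 4 * len(values))  # 8 bytes instead of 16
print(approx)
```

Whether such an error is acceptable depends on the application; the point is only that approximation can trade accuracy for data movement.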
{"title":"Brain inspired photonic motif networks","authors":"F. Monifi, S. Shahin, F. Vallini, Y. Fainman, M. Rabinovich","doi":"10.1109/ICRC.2016.7738706","DOIUrl":"https://doi.org/10.1109/ICRC.2016.7738706","url":null,"abstract":"Here we present a brain-inspired photonic cognitive motif network. The proposed architecture consists of semiconductor lasers that are coupled through opto-electronic feedback. Competitive interaction among photons and carriers in these coupled lasers leads to dynamics similar to those of many brain activities.","PeriodicalId":387008,"journal":{"name":"2016 IEEE International Conference on Rebooting Computing (ICRC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127523686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Processor-in-memory support for artificial neural networks","authors":"J. Schabel, Lee Baker, Sumon Dey, Weifu Li, P. Franzon","doi":"10.1109/ICRC.2016.7738697","DOIUrl":"https://doi.org/10.1109/ICRC.2016.7738697","url":null,"abstract":"Hardware acceleration of artificial neural network (ANN) processing has the potential to support applications that benefit from real-time, low-power operation, such as autonomous vehicles, robotics, recognition and data mining. Most interest in ANNs targets the acceleration of deep multi-layered ANNs that can require days of offline training to converge on a desired network behavior. Interest has grown in ANNs capable of unsupervised training, in which networks learn new information dynamically from unlabeled data without the need for offline training. These ANNs require large memories with bandwidths much higher than modern GPGPUs support. Custom hardware acceleration and memory co-design can potentially provide real-time performance in cases where the performance requirements cannot be met by modern GPGPUs. This work presents a custom processor solution to accelerate two hetero-associative memories (Sparsey and HTM) capable of unsupervised and one-shot learning. This custom processor is implemented as an expandable ASIP built upon a configurable SIMD engine for exploiting parallelism. Functional specialization is implemented using processor-in-memory techniques, which yields up to a 20× speedup and a 2000× reduction in energy per frame compared with a software implementation operating on a dataset for recognition of human actions.","PeriodicalId":387008,"journal":{"name":"2016 IEEE International Conference on Rebooting Computing (ICRC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124254870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A recurrent crossbar of memristive nanodevices implements online novelty detection","authors":"C. Bennett, D. Querlioz, Jacques-Olivier Klein","doi":"10.1109/ICRC.2016.7738689","DOIUrl":"https://doi.org/10.1109/ICRC.2016.7738689","url":null,"abstract":"An auto-correlation matrix memory (ACMM) system continuously computes the degree to which a presented input is novel or anomalous relative to past examples. Here we demonstrate that such a filter can be efficiently implemented with memristive nanodevices and accompanying CMOS circuitry. Complete (a full crossbar) and incomplete (an array of memristive devices) variants of the proposed nanofabric are detailed at the electrical level and then simulated on a simple sparse-input image test designed to gauge the system's response to transitions. Both systems demonstrate active novelty filtering with a low level of false positives in the presence of noise, but only the complete system reports all transitions successfully (i.e., it also avoids false negatives). While the system is robust to a noisy channel, degradation toward false positives is more likely when nanodevice variability is also taken into account. In addition to novelty filtering, the proposed system may be a useful building block for larger reservoir or recurrent on-chip learning systems.","PeriodicalId":387008,"journal":{"name":"2016 IEEE International Conference on Rebooting Computing (ICRC)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130404705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
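Novelty filtering with an auto-correlation memory can be illustrated with a simplified binary variant in Python. Each stored input sets the matrix entries for its co-active pairs (a Hebbian outer-product update). A new input is scored by the fraction of its co-active pairs that have never been stored. The scoring rule, the binary weights and the function names are assumptions made for illustration, not the ACMM circuit of Bennett et al. The sketch also ignores noise, device variability and the CMOS readout.

```python
def make_memory(n):
    """n x n binary auto-correlation matrix, initially empty."""
    return [[0] * n for _ in range(n)]

def novelty(memory, x):
    """Fraction of co-active input pairs (i, j) never stored before; 1.0 means fully novel."""
    active = [i for i, bit in enumerate(x) if bit]
    pairs = [(i, j) for i in active for j in active]
    if not pairs:
        return 0.0
    unseen = sum(1 for i, j in pairs if memory[i][j] == 0)
    return unseen / len(pairs)

def store(memory, x):
    """Hebbian auto-correlation update: W_ij <- W_ij OR (x_i AND x_j)."""
    active = [i for i, bit in enumerate(x) if bit]
    for i in active:
        for j in active:
            memory[i][j] = 1

def present(memory, x):
    """Score an input against everything seen so far, then learn it (online operation)."""
    score = novelty(memory, x)
    store(memory, x)
    return score

W = make_memory(8)
a = [1, 1, 0, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 1]
mixed = [1, 0, 0, 0, 0, 0, 0, 1]
scores = [present(W, a), present(W, a), present(W, b), present(W, mixed)]
print(scores)  # [1.0, 0.0, 1.0, 0.5]
```

A repeated input scores 0, while an input that combines previously seen features in a new way is flagged as partially novel, which mirrors the transition-detection behavior the abstract describes.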
{"title":"Digital neuromorphic design of a Liquid State Machine for real-time processing","authors":"Anvesh Polepalli, Nicholas Soures, D. Kudithipudi","doi":"10.1109/ICRC.2016.7738687","DOIUrl":"https://doi.org/10.1109/ICRC.2016.7738687","url":null,"abstract":"The Liquid State Machine (LSM) is a form of reservoir computing that emulates the brain's capability of processing spatio-temporal data. This type of network generates highly descriptive responses to continuous input streams, and these responses are then used to extract information about the input stream. A single LSM network can serve as a generic intelligent processor that processes different data streams, or the same data stream, to extract different features. The LSM has been shown to perform well in tasks that depend on a system's behavior over time. The LSM's intrinsic memory and reduced training complexity make it a suitable choice for hardware implementations of spatio-temporal applications. Existing behavioral models of the LSM cannot process real-time data because of their hardware complexity, their inability to handle real-time data, or both. The proposed model focuses on a simple liquid design that exploits spatial locality and is capable of processing real-time data. The model is evaluated for EEG seizure detection with an accuracy of 84.2% and for user identification based on walking pattern with an accuracy of 98.4%.","PeriodicalId":387008,"journal":{"name":"2016 IEEE International Conference on Rebooting Computing (ICRC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126739690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}