S. Kulkarni, Sachin Bhat, S. Khasanvis, C. A. Moritz
{"title":"Magneto-Electric Approximate Computational Circuits for Bayesian Inference","authors":"S. Kulkarni, Sachin Bhat, S. Khasanvis, C. A. Moritz","doi":"10.1109/ICRC.2017.8123678","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123678","url":null,"abstract":"Probabilistic graphical models like Bayesian Networks (BNs) are powerful cognitive-computing formalisms, with many similarities to human cognition. These models have a multitude of real-world applications. New emerging-technology based circuit paradigms leveraging physical equivalence e.g., operating directly on probabilities vs. introducing layers of abstraction, have shown promise in raising the performance and overall efficiency of BNs, enabling networks with millions of random variables. While previous BNs of up to 100s of nodes have been shown to require single-digit precision without affecting application outcomes, the significantly larger number of variables requires the computational precision to be scaled to correctly support BN operations. We introduce a new computational circuit fabric based on mixed-signal magneto-electric computations operating with physical equivalence and supporting probabilistic computations with a new approximate circuit style. Precision scaling impacts area at a logarithmic vs. linear scale offering a much lower power and performance cost than in prior directions. Results show 30x area reduction for a 0.001 precision vs. prior direction, while maintaining three orders of magnitude benefits vs. 100-core processor implementations.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"208-209 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130714133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparison between Single Purpose and Flexible Neuromorphic Processor Designs","authors":"D. Mountain, Mark McLean, Christopher D. Krieger","doi":"10.1109/ICRC.2017.8123641","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123641","url":null,"abstract":"A variety of architectures have been proposed for neuromorphic computing chips, including digital, analog, and memristor based approaches. The application space used to analyze these designs is typically narrow, focused primarily on natural signal processing tasks such as image or audio classification. In this work, we analyze the ability of a memristor-based neuromorphic architecture to perform tasks representative of those done by computer network edge devices. We evaluate the neuromorphic designs running a baseline benchmark (MNIST), an AES-256 encryptor, and a malware detection tool. We evaluate these applications on both single purpose chips and on flexible multipurpose chips configured for the same tasks. Single purpose designs use direct, hardwired connections and custom memristor crossbar sizes, while flexible designs use crossbar arrays of a single standard size and communicate over an on-chip network. The throughput per watt and throughput per area costs associated with increased flexibility are shown to be 1.8x and 8x-10x, respectively.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115033617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matthias Freiberger, A. Katumba, P. Bienstman, J. Dambre
{"title":"On-Chip Passive Photonic Reservoir Computing with Integrated Optical Readout","authors":"Matthias Freiberger, A. Katumba, P. Bienstman, J. Dambre","doi":"10.1109/ICRC.2017.8123673","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123673","url":null,"abstract":"Photonic reservoir computing is a recent bio-inspired paradigm for signal processing. Despite first successes, the paradigm still faces challenges. We address some of these challenges and introduce our approaches to solve them. In detail, we discuss how integrated reservoirs can be scaled up by injecting multiple copies of the input. Further we introduce a new hardware-friendly training method for integrated optical readouts.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115098291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Raphael Frisch, R. Laurent, M. Faix, Laurent Girin, L. Fesquet, A. Lux, J. Droulez, P. Bessière, E. Mazer
{"title":"A Bayesian Stochastic Machine for Sound Source Localization","authors":"Raphael Frisch, R. Laurent, M. Faix, Laurent Girin, L. Fesquet, A. Lux, J. Droulez, P. Bessière, E. Mazer","doi":"10.1109/ICRC.2017.8123681","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123681","url":null,"abstract":"Compared to conventional processors, stochastic computing architectures have strong potential to speed up computation time and to reduce power consumption. We present such an architecture, called Bayesian Machine (BM), dedicated to solving Bayesian inference problems. Given a set of noisy signals provided by low-level sensors, a BM estimates the posterior probability distribution of an unknown target information. In the present study, a BM is used to solve a sound source localization (SSL) problem: the BM computes the probability distribution of the position of a sound source given acoustic signals captured by a set of microphones. Assuming free field wave propagation (no reverberations), we express the SSL problem as the maximization of a likelihood function fed with audio features provided by the time-frequency (TF) analysis of the captured audio waves. The proposed BM uses bitwise parallel sampling to fuse the resulting multi-channel information. As the number of channels to fuse is large, the standard BM architecture encounters the so-called \"time dilution problem\" (long delays are necessary to obtain valid samples). We tackle this problem by using max-normalization of the distributions combined with a periodic re-sampling of the bit streams after processing a reasonably small subset of evidences. Finally, we compare the localization performance of the proposed machine with the results obtained using a standard version of the machine. The re-sampling leads to an impressive acceleration factor of 10³ in the computation.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122850744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Schneider, C. Donnelly, S. Russek, B. Baek, M. Pufall, P. Hopkins, W. Rippard
{"title":"Energy-Efficient Single-Flux-Quantum Based Neuromorphic Computing","authors":"M. Schneider, C. Donnelly, S. Russek, B. Baek, M. Pufall, P. Hopkins, W. Rippard","doi":"10.1109/ICRC.2017.8123634","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123634","url":null,"abstract":"Recent experimental work has demonstrated nano-textured magnetic Josephson junctions (MJJs) that exhibit tunable spiking behavior with ultra-low training energies in the attojoule range. MJJ devices integrated with standard single-flux-quantum neural systems form a new class of neuromorphic technologies that have spiking energies between attojoules and zeptojoules, operation frequencies up to 100 GHz, and nanoscale plasticity. Here, we present the design of neural cells utilizing MJJs that form the basic elements in multilayer perceptron and convolutional networks. We present SPICE models, using experimentally derived Verilog A models for MJJs, to assess the performance of these cells in simple neural network structures. Modeling results indicate that the tunable Josephson critical current IC can function as a weight in a neural network. Using SPICE we model a fully connected two layer network with 9 inputs and 3 outputs.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128902180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structure Discovery for Gene Expression Networks with Emerging Stochastic Hardware","authors":"S. Kulkarni, Sachin Bhat, C. A. Moritz","doi":"10.1109/ICRC.2017.8123688","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123688","url":null,"abstract":"Gene Expression Networks (GENs) attempt to model how genetic information stored in the DNA (Genotype) results in the synthesis of proteins, and consequently, the physical traits of an organism (Phenotype). Deciphering GENs plays an important role in a wide range of applications from genetic studies of the origins of life to personalized healthcare. Probabilistic graphical models such as Bayesian Networks (BNs) are used to perform learning and inference of GENs from genetic data. Current techniques of generating BNs of GENs from data, which are mostly approximate in nature, involve searching and scoring of multiple probabilistic graphical structures. However, while search algorithms can be efficiently implemented in software, the same is not true for scoring. Scoring of probabilistic models with inherent parallelism is inefficient when performed sequentially over conventional architectures comprising deterministic devices. In this paper, we introduce a new nanoscale hardware acceleration framework, enabling fast and efficient Bayesian inference operations, significantly accelerating the scoring aspect of the BN learning of GENs using a combination of emerging stochastic devices and CMOS technology. The stochasticity of the devices is utilized to efficiently perform approximate inference on probabilistic networks, and the circuit framework constituting these devices is designed to exploit the inherent parallelism in these models. We demonstrate approximate inference operation over a small BN. We estimate the performance benefits of five orders of magnitude in performing inference operations using this architecture over software-only approaches.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122471768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Z. Barber, C. Harrington, K. Rupavatharam, C. Thiel, T. Jackson, P. Sellin, C. Benko, K. Merkel
{"title":"Spatial-Spectral Materials for High Performance Optical Processing","authors":"Z. Barber, C. Harrington, K. Rupavatharam, C. Thiel, T. Jackson, P. Sellin, C. Benko, K. Merkel","doi":"10.1109/ICRC.2017.8123674","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123674","url":null,"abstract":"Optically active Spatial-Spectral (S2) materials are a unique resource for spectrally based optical memory and processing. At cryogenic temperatures, the rare-earth ions in these materials individually exhibit narrow optical resonances on the order of MHz to sub-kHz, but are inhomogeneously broadened over GHz to THz spectral bandwidths providing up to 10^7 resolvable spectral channels. The material can be optically programmed to transform the spectral and spatial components of optical signals to perform signal processing operations such as analog multiplications, time delay, filtering, convolutions, and correlations. We present work utilizing the S2 technology for high bandwidth (>32 GHz) and data rate selection and filtering, including processing of 1D data streams in real-time and 2D images. Despite cryogenic cooling, the power efficiency of the S2 technology compares favorably to CMOS in large scale systems. Finally, potential architectures for large (10^6 x 10^6) vector-matrix multipliers using S2 materials are discussed.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116241564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Approach for Multi-Valued Computing Using Machine Learning","authors":"Wafi Danesh, Mostafizur Rahman","doi":"10.1109/ICRC.2017.8123646","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123646","url":null,"abstract":"To continue scaling in the future and to meet emerging application requirements, revolutionary concepts are required not just on \"how can we find better switches to implement computers\", but also on \"how computers compute logic\". Multi-Valued Logic (MVL) provides one such opportunity, since efficient implementation of MVL can allow compact and enhanced information processing and be orders of magnitude more efficient than binary CMOS. Emerging devices such as Quantum Dots, Magnetic Tunnel Junctions (MTJs), Carbon Nanotube FETs (CNTFETs), Spin Wave Devices etc., provide an avenue for hardware implementation of MVL. So far, MVL lagged behind CMOS due to (a) complexities associated with traditional logic decomposition approaches, which result in many MVL minterms, complex polynomials of order greater than two, and cumbersome decision diagrams that are difficult to implement, and (b) multi-valued data representation, processing and communication using binary switches and medium, which are inefficient. In this paper, we propose a transformative new direction for multi-valued logic decomposition (problem (a)) utilizing concepts from machine learning and nanoelectronics. In our approach, given an MVL function, we perform iterative linear regressions on all input and output combinations to derive a set of linear expressions. A visual pattern matching technique is then applied to derive selection conditions for each linear expression based on the function inputs. The set of linear expressions and corresponding selection criteria ensure that for all function inputs, correct outputs are obtained. The resulting linear expressions are hardware implementation friendly and can be implemented with simple gates and summation functions. In this paper, we show an approach to solve (b) by implementing a quaternary multiplier using our technique. Our detailed comparison with existing methods suggests huge benefits can be attained with the proposed method.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127226416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Securing Data Centers, Handheld Computers, and Networked Sensors against Viruses and Rootkits","authors":"Earle Jennings","doi":"10.1109/ICRC.2017.8123687","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123687","url":null,"abstract":"Today's data centers, their handheld computers and network sensors, are discussed in terms of how they are penetrated by viruses and rootkits. This paper then presents a new computer architecture, implemented to be semantically compatible with an existing microprocessor, along with modification of several system components commonly found in data centers. The new computer architecture physically separates instruction memories from data-related memories removing the possibility of installing viruses and rootkits. Application compatibility is ensured by the semantic compatibility of the cores with the existing superscalar microprocessor. Communications, memory controllers, and memory devices throughout the data center, handheld computers and network sensors physically segregate task-instruction information from data-related information to further remove any opportunity for these hidden threats to become installed threats.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115780761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced Packaging and Heterogeneous Integration to Reboot Computing","authors":"Saptadeep Pal, S. Iyer, Puneet Gupta","doi":"10.1109/ICRC.2017.8123637","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123637","url":null,"abstract":"In the past several decades on-chip dimensions have scaled over 2000X, while dimensions on printed circuit board have scaled 4-5X. This modest scaling of packaging dimensions has severely limited system scaling. To address this, we have proposed a disruptive package-free integration scheme. We replace the traditional organic printed circuit board (PCB) with silicon interconnect fabric (SiIF) and replace the traditional package by directly mounting bare chiplets onto the SiIF. Fine pitch solderless copper pillar connections increase IO density by 20-80X and the inter-chiplet spacing is reduced by 10-20X. This enables highly parallel communication instead of serialized links. This achieves higher bandwidth/mm (~100X) and lower latency (~25X) and lower communication energy per bit (~200X). This integration technology allows us to challenge the conventional communication-limited architectures in a substantial way. The ability to heterogeneously integrate diverse dies with arbitrarily fine granularity, but on a wafer scale, reduces the cost of processor-memory communication energy opening new compute paradigms. In addition, the superior heat spreading properties of the SiIF compared to organic PCBs allow us to run the cores harder. The heterogeneous integration property of our scheme allows for an intimate mingling of heterogeneous processor cores, FPGAs and memory types opening new avenues to reboot computing.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121311332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}