{"title":"Heterogeneous Technology Configurable Fabrics for Field-Programmable Co-Design of CMOS and Spin-Based Devices","authors":"R. Demara, A. Roohi, Ramtin Zand, Steven D. Pyle","doi":"10.1109/ICRC.2017.8123638","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123638","url":null,"abstract":"The architecture, operation, and characteristics of two post-CMOS reconfigurable fabrics are identified to realize energy-sparing and resilience features, while remaining feasible for near-term fabrication. First, Storage Cell Replacement Fabrics (SCRFs) provide a reconfigurable computing platform utilizing near-zero leakage Spin Hall Effect devices which replace SRAM bit-cells within Look-Up Tables (LUTs) and/or switch boxes to complement the advantages of MOS transistor-based multiplexer select trees. Second, Heterogeneous Technology Configurable Fabrics (HTCFs) are identified to extend reconfigurable computing platforms via a palette of CMOS, spin-based, or other emerging device technologies, such as various Magnetic Tunnel Junction (MTJ) and Domain Wall Motion devices. HTCFs are composed of a triad of Emerging Device Blocks, CMOS Logic Blocks, and Signal Conversion Blocks. This facilitates a novel architectural approach to reduce leakage energy, minimize communication occurrence and energy cost by eliminating unnecessary data transfer, and support auto-tuning for resilience. Furthermore, HTCFs enable new advantages of technology co-design which trades off alternative mappings between emerging devices and transistors at runtime by allowing dynamic remapping to adaptively leverage the intrinsic computing features of each device technology. Both SCRFs and HTCFs offer a platform for fine-grained Logic-In-Memory architectures and runtime adaptive hardware. 
SPICE simulations indicate 6% to 67% reduction in read energy, 21% reduction in reconfiguration energy, and 78% higher clock frequency versus alternative fabricated emerging device architectures, and a significant reduction in leakage compared to CMOS-based approaches.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128033801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Physical Underpinnings of the Unusual Effectiveness of Probabilistic and Neural Computation","authors":"S. Tiwari, D. Querlioz","doi":"10.1109/ICRC.2017.8123680","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123680","url":null,"abstract":"Probabilistic and neural approaches, through their incorporation of nonlinearities and compression of states, enable a broader sampling of the phase space. For a broad set of complex questions that are encountered in conventional computation, this approach is very effective. In these patterns-oriented tasks, a fluctuation in the size of data is akin to a thermal fluctuation. A thermodynamic view naturally applies to this computational style of information processing, and from this reasoning one may estimate a variety of interesting consequences for computing: (a) efficiencies in energy, (b) complexity of tasks that can be tackled, (c) inaccuracies in inferences, and (d) limitations arising in the incompleteness of inputs and models. We employ toy model examples to reflect on these important themes to establish the following: (i) A dissipation minimum can be predicted, predicated on the averaged information being discarded under constraints of minimization of energy and maximization of information preservation and entropy. Analogous to the $k_{B}T \\ln 2$ for the randomization of a bit, under biological constraints, the $\\sim\\! -70 \\; \\mathrm{mV}$ base and $\\sim\\! 40 \\; \\mathrm{mV}$ peak spike potential are then a natural consequence in a biological neural environment. Non-biological, that is, physical implementations can be analyzed by a similar approach for a noisy and variability-prone thermodynamic setting. (ii) In drawing inference, resorting to Occam's razor as a statistical equivalent to the choice of the simplest and least number of axioms in developing a theory conflicts with Mencken's rule--for every complex problem, there is an answer that is clear, simple and wrong--as a reflection of dimensionality reduction. (iii) 
Between these two factors, it is possible to make a measure of the error bound predicated on the averaged information being discarded and being filled in, and (iv) this lets one predict the upper limits of information processing rate under constraints. These observations point to what may be achievable using neural and probabilistic computation through their physical implementation as reflected in the thermodynamics of the implementation of a statistical information mechanic engine that avoids computation via deterministic linear algebra.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133285621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Cryogenic Memory Cells for Superconducting Computing Applications","authors":"J. Yau, Y.-K.-K. Fung, G. W. Gibson","doi":"10.1109/ICRC.2017.8123684","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123684","url":null,"abstract":"We propose a hybrid cryogenic memory architecture comprising Josephson junctions and Toggle MRAM. Comparison with existing cryogenic memory builds suggests that this hybrid build is a viable candidate memory architecture for superconducting computing applications.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134090199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rebooting the Data Access Hierarchy of Computing Systems","authors":"Wen-mei W. Hwu, I. E. Hajj, Simon Garcia De Gonzalo, Carl Pearson, N. Kim, Deming Chen, Jinjun Xiong, Zehra Sura","doi":"10.1109/ICRC.2017.8123667","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123667","url":null,"abstract":"We have been experiencing two very important movements in computing. On the one hand, a tremendous amount of resource has been invested into innovative applications such as first-principle-based methods, deep learning and cognitive computing. On the other hand, the industry has been taking a technological path where application performance and energy efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. We envision that a \"perfect storm\" is coming because of the interaction between these two movements. Many of these new and high-valued applications need to touch a very large amount of data with little data reuse and data movement has become the dominating factor for both power and performance of these applications. It will be critical to match the compute throughput to the data access bandwidth and to locate the compute near data. Much has been and continuously needs to be learned about algorithms, languages, compilers and hardware architecture in this movement. What are the killer applications that may become the new driver for future technology development? How hard is it to program existing systems to address the data movement issues today? How will we program these systems in the future? How will innovations in memory devices present further opportunities and challenges in designing new systems? What is the impact on long-term software engineering cost of applications? 
In this paper, we present some lessons learned as we design the IBM-Illinois C3SR (Center for Cognitive Computing Systems Research) Erudite system inside this perfect storm.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134277240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Level Optical Weights in Integrated Circuits","authors":"S. Abel, D. Stark, F. Eltes, J. Ortmann, D. Caimi, J. Fompeyrine","doi":"10.1109/ICRC.2017.8123672","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123672","url":null,"abstract":"We demonstrate multi-level optical weights embedded in a silicon photonic platform based on ferroelectric domain switching. Ferroelectric barium titanate integrated on silicon resonator structures is used as the memory material. By applying short voltage pulses of 100ns, we can switch fractions of the ferroelectric domains and thus change the transmission of the waveguides by more than one order of magnitude in a non-volatile way. We achieve 10 distinct transmission levels, and show iterative switching of the synaptic element based on the polarity, magnitude, and number of applied voltage pulses. Our results are the first experimental demonstration of an electrically driven, multi-level optical memory in integrated photonic circuits. Such non-volatile, ferroelectric weighting element could serve as a key synaptic building block in future photonic neuronal networks.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130482818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NNgine: Ultra-Efficient Nearest Neighbor Accelerator Based on In-Memory Computing","authors":"M. Imani, Yeseong Kim, T. Simunic","doi":"10.1109/ICRC.2017.8123666","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123666","url":null,"abstract":"The nearest neighbor (NN) algorithm has been used in a broad range of applications including pattern recognition, classification, computer vision, databases, etc. The NN algorithm tests data points to find the nearest data to a query data point. With the Internet of Things, the amount of data to search through grows exponentially, so a more efficient NN design is needed. Running NN on multicore processors or on general purpose GPUs has significant energy and performance overhead due to small available cache sizes, resulting in moving a lot of data via limited bandwidth buses from memory. In this paper, we propose a nearest neighbor accelerator, called NNgine, consisting of ternary content addressable memory (TCAM) blocks which enable near-data computing. The proposed NNgine overcomes the energy and performance bottlenecks of traditional computing systems by utilizing multiple non-volatile TCAMs which search for nearest neighbor data in parallel. We evaluate the efficiency of our NNgine design by comparing it to existing processor-based approaches. 
Our results show that NNgine can achieve 5590x higher energy efficiency and 510x speed up compared to the state-of-the-art techniques with a negligible accuracy loss of 0.5%.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121145038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonlinear Dynamics and Chaos for Flexible, Reconfigurable Computing","authors":"Behnam Kia, W. Ditto","doi":"10.1109/ICRC.2017.8123679","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123679","url":null,"abstract":"Nonlinear dynamics and chaos contribute flexibility and rich, complex behavior to nonlinear systems. Transistors and transistor circuits are inherently nonlinear. It was demonstrated that this nonlinearity and the flexibility that comes with it can be utilized to implement flexible, reconfigurable computing, and such approaches are called Nonlinear Dynamics-Based Computing. In nonlinear dynamics-based computing, the very same circuit can be reprogrammed to implement and perform many different types of computations, thereby increasing the amount of computing that can be obtained per transistor. For example, at the gate level, the same transistor circuit can implement all the different logic gates, such as an AND gate or an XOR gate. At the system level, the same transistor circuit can implement a variety of higher-level functions, such as addition or subtraction. Another remarkable feature of nonlinear dynamics-based computing is that because different types of functions or operations coexist within the dynamics of the circuit, reprogramming and reconfiguring is nearly instant. A recently fabricated VLSI chip for nonlinear dynamics-based computing was shown to be capable of implementing a new function in each clock cycle, with no need for separate reprogramming time in between clock cycles. 
In this paper we briefly review this new approach to computing, present some of our latest results, discuss the implications and possible advantages of nonlinear dynamics-based computing, and plot potential horizons for this exciting new approach to computing.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125787450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Unified Hardware/Software Co-Design Framework for Neuromorphic Computing Devices and Applications","authors":"J. Plank, G. Rose, Mark E. Dean, Catherine D. Schuman, N. Cady","doi":"10.1109/ICRC.2017.8123655","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123655","url":null,"abstract":"With the death of Moore's law, the computing community is in a period of exploration, focusing on novel computing devices, paradigms, and techniques for programming. The TENN-Lab group has developed a hardware/software co-design framework for this exploration, on which we perform research with three thrusts: (1) Devices for computing, such as memristors and biomimetic membranes. (2) Applications that employ spiking neural networks for processing. (3) Machine learning techniques to program. The design framework is unified, because it allows all three thrusts to work in concert, so that, for example, new results on device design can apply instantly to the current results of applications and learning. In this paper, we detail the interweaving components of the design framework. We then describe case studies on each of the research thrusts above, highlighting how the unified framework enables each case study.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133581715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Energy-Efficient Reconfigurable Nanophotonic Computing Architecture Design: Optical Lookup Table","authors":"Zhen Li, C. Monat, S. L. Beux, X. Letartre, I. O’Connor","doi":"10.1109/ICRC.2017.8123670","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123670","url":null,"abstract":"We present an energy-efficient on-chip reconfigurable computing architecture, the so-called OLUT, which is an optical core implementation of a lookup table. It offers significant improvement with respect to optical directed logic architectures by allowing the use of wavelength division multiplexing (WDM) for computation parallelism. We performed a design space exploration that elucidates the add-drop filter characteristics needed to produce a computing architecture with high computation reliability (BER ~ 10^-18) and low energy consumption. Analytical results demonstrate the potential of the resulting OLUT implementation to reach <100 fJ/bit per logic operation, which may meet future demands for on-chip optical FPGAs.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124209251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Linearity and Write Noise of Analog Resistive Memory Devices in a Neural Algorithm Accelerator","authors":"R. Jacobs-Gedrim, S. Agarwal, K. E. Knisely, J. Stevens, M. V. Heukelom, D. Hughart, J. Niroula, C. James, M. Marinella","doi":"10.1109/ICRC.2017.8123657","DOIUrl":"https://doi.org/10.1109/ICRC.2017.8123657","url":null,"abstract":"Resistive memory (ReRAM) shows promise for use as an analog synapse element in energy-efficient neural network algorithm accelerators. A particularly important application is the training of neural networks, as this is the most computationally-intensive procedure in using a neural algorithm. However, training a network with analog ReRAM synapses can significantly reduce the accuracy at the algorithm level. In order to assess this degradation, analog properties of ReRAM devices were measured and hand-written digit recognition accuracy was modeled for the training using backpropagation. Bipolar filamentary devices utilizing three material systems were measured and compared: one oxygen vacancy system, Ta-TaOx, and two conducting metallization systems, Cu-SiO2, and Ag/chalcogenide. Analog properties and conductance ranges of the devices are optimized by measuring the response to varying voltage pulse characteristics. Key analog device properties which degrade the accuracy are update linearity and write noise. Write noise may improve as a function of device manufacturing maturity, but write nonlinearity appears relatively consistent among the different device material systems and is found to be the most significant factor affecting accuracy. 
This suggests that new materials and/or fundamentally different resistive switching mechanisms may be required to improve device linearity and achieve higher algorithm training accuracy.","PeriodicalId":125114,"journal":{"name":"2017 IEEE International Conference on Rebooting Computing (ICRC)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130140464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}