Jintao Yu, R. Nane, Adib Haron, S. Hamdioui, H. Corporaal, K. Bertels
{"title":"Skeleton-based design and simulation flow for Computation-in-Memory architectures","authors":"Jintao Yu, R. Nane, Adib Haron, S. Hamdioui, H. Corporaal, K. Bertels","doi":"10.1145/2950067.2950071","DOIUrl":"https://doi.org/10.1145/2950067.2950071","url":null,"abstract":"Memristor-based Computation-in-Memory is one of the emerging architectures proposed to deal with Big Data problems. The design of such architectures requires a radically new automatic design flow because the memristor is a passive device that uses resistance to encode its logic value. This paper proposes a design flow for mapping parallel algorithms on the CIM architecture. Algorithms with similar data flow graphs can be mapped on the crossbar using the same template containing scheduling, placement, and routing information; this template is named skeleton. By configuring such a skeleton with different pre-designed circuits, we can build CIM implementations of the corresponding algorithms in that class. This approach does not only map an algorithm on a memristor crossbar, but also gives an estimation of its performance, area, and energy consumption. It also supports user-defined constraints and parallel SystemC simulation. Experimental results demonstrate the feasibility and the potential of the approach.","PeriodicalId":213559,"journal":{"name":"2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125046478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Qian, Yanping Gong, Guoxian Huang, Kiarash Ahi, M. Anwar, Lei Wang
{"title":"A memristor-based compressive sensing architecture","authors":"F. Qian, Yanping Gong, Guoxian Huang, Kiarash Ahi, M. Anwar, Lei Wang","doi":"10.1145/2950067.2950081","DOIUrl":"https://doi.org/10.1145/2950067.2950081","url":null,"abstract":"Memristors are considered as one promising candidate for future memory and computing fabrics. However, the design of memristor-based circuits is under a critical challenge of inevitable variations due to non-ideal fabrication processes and the resulted performance uncertainties. This kind of randomness can be utilized in many other applications, such as compressive sensing based data acquisition, which is conducted by a random sensing matrix. Existing compressive sensing systems are usually implemented in digital CMOS circuits, which suffer the problems of high hardware complexity and limited sampling speed. In this paper, we exploit the inherent variations in memristor devices to generate random sensing matrices for compressive sensing and achieve low cost and high performance operations. Simulation results demonstrate the advantages of the proposed memristor-based compressive sensing architecture.","PeriodicalId":213559,"journal":{"name":"2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127363625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
You Wang, Hao Cai, L. Naviner, Jacques-Olivier Klein, Jianlei Yang, Weisheng Zhao
{"title":"A novel circuit design of true random number generator using magnetic tunnel junction","authors":"You Wang, Hao Cai, L. Naviner, Jacques-Olivier Klein, Jianlei Yang, Weisheng Zhao","doi":"10.1145/2950067.2950108","DOIUrl":"https://doi.org/10.1145/2950067.2950108","url":null,"abstract":"Random numbers are widely used in the cryptography and security systems. However, most of the true random number generators (TRNG) which use physical randomness are with high complexity and high power consumption. This paper proposes a new TRNG circuit using magnetic tunnel junction (MTJ). As one of the reliability issues in MTJ based circuit, the stochastic switching behavior provides a perfect physical source of randomness. The functionality of proposed design is validated by transient simulations with 28nm fully depleted silicon-on-insulator (FDSOI) technology and an accurate MTJ compact model. Simulation results show that our design can generate accurate random bitstream stably. The reliability analysis concerning process variation of MTJs and transistors proves the good variability tolerance of our TRNG design. Furthermore, our design can output stable random bitstream around 30 tuning steps.","PeriodicalId":213559,"journal":{"name":"2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126074111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Calvet, J. Friedman, D. Querlioz, P. Bessière, J. Droulez
{"title":"Sleep stage classification with stochastic Bayesian inference","authors":"L. Calvet, J. Friedman, D. Querlioz, P. Bessière, J. Droulez","doi":"10.1145/2950067.2950085","DOIUrl":"https://doi.org/10.1145/2950067.2950085","url":null,"abstract":"The design of electronic circuits that can realize Bayesian inference is an important goal for exploiting machine learning in a fast and efficient way. We recently developed a novel architecture based on stochastic computation with Muller C-elements that can realize a circuit level naïve Bayes inference. This technique can be implemented using low power nanodevices exhibiting faults and device variations. Here we show how a more complex classification problem can be transformed into a simple circuit using this framework where an effective classification can be obtained with a minimal amount of information. This suggests that substantially smaller spatial footprints for portable devices could ultimately be achieved.","PeriodicalId":213559,"journal":{"name":"2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128973432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved circuit model for all-spin logic","authors":"M. Alawein, H. Fariborzi","doi":"10.1145/2950067.2950075","DOIUrl":"https://doi.org/10.1145/2950067.2950075","url":null,"abstract":"Spintronic devices are prime candidates for the Beyond CMOS era due to their potential for low power consumption and high density computation and storage. All-spin logic (ASL) is among the most promising spintronic logic switches. Previous attempts to model ASL in the linear and diffusive regime either neglect the dynamic characteristics of the transport or do not provide a scalable and robust platform for full micromagnetic simulations and inclusion of other effects like spin Hall effect (SHE) and spin-orbit torque (SOT). In this paper, and based on a finite difference scheme, we propose an improved self-consistent magnetization dynamics/time-dependent carrier transport model that captures the main characteristics of ASL devices.","PeriodicalId":213559,"journal":{"name":"2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121858363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining a volatile and nonvolatile memristor in artificial synapse to improve learning in Spiking Neural Networks","authors":"Mahyar Shahsavari, Pierre Falez, Pierre Boulet","doi":"10.1145/2950067.2950090","DOIUrl":"https://doi.org/10.1145/2950067.2950090","url":null,"abstract":"With the end of Moore's law in sight, we need new computing architectures to satisfy the increasing demands of big data processing. Neuromorphic architectures are good candidates to low energy computing for recognition and classification tasks. We propose an event-based spiking neural network architecture based on artificial synapses. We introduce a novel synapse box that is able to forget and remember by inspiration from biological synapses. Two different volatile and nonvolatile memristor devices are combined in the synapse box. To evaluate the effectiveness of our proposal, we use system-level simulation in our Neural Network Scalable Spiking Simulator (N2S3) using the MNIST handwritten digit recognition dataset. The first results show better performance of our novel synapse than the traditional nonvolatile artificial synapses.","PeriodicalId":213559,"journal":{"name":"2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123221801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory Processing Unit for in-memory processing","authors":"Rotem Ben Hur, Shahar Kvatinsky","doi":"10.1145/2950067.2950086","DOIUrl":"https://doi.org/10.1145/2950067.2950086","url":null,"abstract":"Performance and energy of modern computers, usually built as von Neumann machines, are primarily limited by data transfer from the memory to the CPU and vice versa. Only a true non-von Neumann architecture, where data is processed and stored within the same unit can remove this bottleneck. Using emerging non-volatile resistive memory technologies (namely, memristors) enables the development of Memory Processing Unit (MPU) - a novel non-von Neumann architecture. MPU relies on adding computing capabilities to the memristive memory cells without changing the basic memory array structure, and is compatible with existing computing systems. This paper describes the MPU architecture and examines its controller.","PeriodicalId":213559,"journal":{"name":"2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123720384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hsin-Pai Cheng, W. Wen, Chang Song, Beiye Liu, Hai Helen Li, Yiran Chen
{"title":"Exploring the optimal learning technique for IBM TrueNorth platform to overcome quantization loss","authors":"Hsin-Pai Cheng, W. Wen, Chang Song, Beiye Liu, Hai Helen Li, Yiran Chen","doi":"10.1145/2950067.2950096","DOIUrl":"https://doi.org/10.1145/2950067.2950096","url":null,"abstract":"As the first large-scale commercial spiking-based neuromorphic computing platform, the IBM TrueNorth chip has received tremendous attention in society. However, one of the known issues in the TrueNorth design is the limited precision of synaptic weights, each of which can be selected from only four integers. The current workaround is to run multiple copies of the neural network such that the average value of each synaptic weight is close to that in the original network. To improve the computation accuracy and reduce the incurred hardware cost, in this work we investigate seven different regularization functions in the cost function of the learning process on the TrueNorth platform. The hypothesis is that the quantization loss in the mapping from the trained network in floating-point data format to the TrueNorth chip with limited integer values is minimized if the discrepancy between the trained weights and the quantized weights is reduced by optimizing the training process. Our experimental results proved that the proposed techniques considerably improve the computation accuracy of the TrueNorth platform and reduce the incurred hardware and performance overheads. Among all the tested methods, L1TEA regularization achieved the best result, with up to 2.74% accuracy enhancement when deploying an MNIST application onto the TrueNorth platform.","PeriodicalId":213559,"journal":{"name":"2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132495012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low power in-memory computing platform with four Terminal magnetic Domain Wall Motion devices","authors":"Deliang Fan","doi":"10.1145/2950067.2950084","DOIUrl":"https://doi.org/10.1145/2950067.2950084","url":null,"abstract":"The separation of memory and computing units in current Von-Neumann computer architecture leads to unwanted energy hungry data movement and insufficient memory bandwidth. Developing an energy efficient in-memory computing platform is promising to address such issues. Spintronic devices, utilizing electron spin as state variable for information processing and data storage, have demonstrated non-volatility, low power, zero leakage current and high area density advantages over conventional CMOS technology, which makes it an excellent candidate for future in-memory computing design. In this work, we propose a low power in-memory computing platform using a novel 4-terminal magnetic domain wall motion (4T-DWM) device, in which the proposed 4T-DWM device can be employed as both non-volatile memory cell and in-memory logic. The proposed design leads to the unity of memory and logic. Based on our device-circuit SPICE-level simulation, the proposed memory cell writing energy is one order lower than the standard one transistor one magnetic tunnel junction (MTJ) based memory design with writing speed of 1ns. Compared to state-of-the-art CMOS based full adder, the proposed 4T-DWM device based in-memory full adder consumes 3.2× lower power at 500MHz.","PeriodicalId":213559,"journal":{"name":"2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125377377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error Correction Code protected Data Processing Units","authors":"N. C. Laurenciu, T. Gupta, V. Savin, S. Cotofana","doi":"10.1145/2950067.2950093","DOIUrl":"https://doi.org/10.1145/2950067.2950093","url":null,"abstract":"The significant uncertainty associated with current nanodevices fabrication and operation, calls for a circuit design paradigm change, which ought to actively embrace the inherently nanodevice unreliability to generate overall circuit architectures able to perform reliable computation. While for data storage units viable solutions exist, Data Processing Units (DPUs) are not amenable to a similar line of reasoning. The typical approach undertaken for fault-tolerant DPUs relies on modular redundancy (e.g., spatial, temporal), which while being effective from an error tolerance perspective, generally involves high area and/or performance impairments. This paper proposes a generic methodology to obtain reliable DPU implementations built with unreliable components by intimately intertwining Error Correcting Codes (ECCs) codecs with the DPU functionality. The ECC protected DPU architecture is derived cluster-wise with area and reliability constraints, by exploiting dependence relations (logical and w.r.t. shared area) between internal signals pertaining to the DPU and the ECC codec. To evaluate the error rate and performance implications, a multitude of test corners were considered (e.g., gate criticality, ECC type and structure, faulty and low complexity decoder, time-space redundancy) for an ECC protected 6-bit adder architecture. Simulation results reveal that the ECC embedding approach can be effective from both error rate and area perspective, for the Pareto designs with performance figures of merit situated in-between consecutive modular redundancy based design corresponding curves. The proposed approach is generic from the coding point of view, scalable, and enables a fine grained control of the DPU desired reliability degree and area overhead.","PeriodicalId":213559,"journal":{"name":"2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125224322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}