{"title":"Multiplexer-Majority Chains: Managing Correlation and Cost in Stochastic Number Generation","authors":"T. Baker, Owen Hoffend, J. Hayes","doi":"10.1145/3565478.3572326","DOIUrl":"https://doi.org/10.1145/3565478.3572326","url":null,"abstract":"High-cost stochastic number generators (SNGs) are the main source of stochastic numbers (SNs) in stochastic computing. Interacting SNs must usually be uncorrelated for satisfactory results, but deliberate correlation can sometimes dramatically reduce area and/or improve accuracy. However, very little is known about the correlation behavior of SNGs. In this work, a core SNG component, its probability conversion circuit (PCC), is analyzed to reveal important tradeoffs between area, correlation, and accuracy. We show that PCCs of the weighted binary generator (WBG) type cannot consistently generate correlated bitstreams, which leads to inaccurate outputs for some designs. In contrast, comparator-based PCCs (CMPs) can generate highly correlated bitstreams but are about twice as large as WBGs. To overcome these area-correlation limitations, a novel class of PCCs called multiplexer majority chains (MMCs) is introduced. Some MMCs are area efficient like WBGs but can generate highly correlated SNs like CMPs and can reduce the area of a filtering circuit by 30% while sacrificing only 7% accuracy. The large influence of PCC design on circuit area and accuracy is explored and suggestions are made for selecting the best PCC based on a target system's correlation requirements.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129643471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Stochastic Convolution Accelerator based on Pseudo-Sobol Sequences","authors":"Aokun Hu, Wenjie Li, Dongxu Lv, Guanghui He","doi":"10.1145/3565478.3572543","DOIUrl":"https://doi.org/10.1145/3565478.3572543","url":null,"abstract":"Stochastic computing (SC) has been recognized as an efficient technique to reduce the hardware consumption of a convolution neural network (CNN) accelerator. An SC-CNN needs a long SC sequence length to produce accurate results, which leads to a low throughput. In order to achieve better accuracy and higher throughput, highly parallelized SC-CNNs based on Sobol sequences have been extensively used. However, high parallelism leads to undesirable hardware overhead. To solve this problem, this paper proposes Pseudo-Sobol sequences and accordingly develops an efficient parallel computation-conversion hybrid convolution architecture, which fuses the SC-computation units and S2B units. With negligible accuracy loss, the proposed architecture can increase energy and area efficiency by 41% and 36%, respectively.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127612059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Optimization of Randomizer and Computing Core for Low-Cost Stochastic Circuits","authors":"Kuncai Zhong, Xuan Wang, Chen Wang, Weikang Qian","doi":"10.1145/3565478.3572540","DOIUrl":"https://doi.org/10.1145/3565478.3572540","url":null,"abstract":"Stochastic computing (SC) is an unconventional computing paradigm that computes on stochastic bit streams. It is promising for implementing complex functions with low-cost circuitry. A stochastic circuit typically consists of a randomizer to generate the stochastic bit streams and an SC core computing on the bit streams. To design a low-cost stochastic circuit, many works have been proposed to optimize these two parts. However, these works optimize the two parts insufficiently, overlooking part of the optimization space, and separately, without considering their mutual influence, which leaves the final stochastic circuit sub-optimal. In this work, to address this issue, we first introduce a low-cost randomizer architecture and a method for optimizing the SC core. Then, by combining these two techniques, we further propose a method to jointly optimize the randomizer and the SC core. Our experimental results show that compared to the conventional method, the proposed joint optimization method can reduce the area by 39.70% and the power by 42.74% for the stochastic circuit.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115366650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NeuroSOFM-Classifier: A Low Power Classifier Using Continuous Real-Time Unsupervised Clustering","authors":"Siddharth Barve, R. Jha","doi":"10.1145/3565478.3572532","DOIUrl":"https://doi.org/10.1145/3565478.3572532","url":null,"abstract":"Supervised machine learning techniques are becoming a subject of significant interest in data analysis. However, the high memory bandwidth requirement of current implementations and the scarcity of labeled data in many applications prevent the implementation of supervised machine learning techniques. In this work, we propose a neuromorphic architecture implementing the self-organizing feature map algorithm using ferroelectric field-effect transistors (Fe-FETs) and gated-resistive random-access memory (gated-RRAM) to produce a semi-supervised NeuroSOFM-Classifier. A best matching input (BMI) identifier circuit allows for very few labeled samples to be used to provide supervised class labels for each neuron in the NeuroSOFM-Classifier. The best matching unit (BMU) or neuron for subsequent samples can then be used to infer or classify the new data. This NeuroSOFM-Classifier, trained on just 1% of the labeled data, is capable of classifying COVID-19 patient chest x-rays with 96% accuracy.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115022769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Electrical Coupling of Perpendicular Superparamagnetic Tunnel Junctions for Probabilistic Computing","authors":"N.-T. Phan, L. Soumah, Ahmed Sidi El Valli, L. Hutin, Lorena Anghel, U. Ebels, P. Talatchian","doi":"10.1145/3565478.3572528","DOIUrl":"https://doi.org/10.1145/3565478.3572528","url":null,"abstract":"Compact and energy-efficient computing systems may advantageously harness nanoscale sources of randomness, such as superparamagnetic tunnel junctions (SMTJs). The collective behavior resulting from the coupling between such SMTJs could be helpful in the hardware implementation of cognitive computing systems where randomness is a low-cost way to encode and explore available information states. Using a simple linear circuit, we mutually couple two such perpendicular SMTJs through the stochastic jumps of their binary resistive states. This approach led to the largest mutual SMTJ coupling strength reported in the literature at this stage. This first demonstration opens a promising path for implementing larger networks of coupled SMTJs that, using simple connectivity schemes, could emulate energy-based models such as Boltzmann and Ising machines or stochastic-based brain-inspired neural networks. In the case of SMTJs, thermal fluctuations at room temperature are the source of randomness that makes the magnetization switch randomly between two states, leading to random changes in the voltages across the two SMTJs. As a result of this voltage change, the magnetization switching probability of coupled SMTJs is, in turn, modified. Using this mechanism, we found a nearly 36% cross-correlation between the states of the two coupled nanodevices. We use a generalized Néel-Brown model applied to individual SMTJs reproducing the positive (attractive) coupling strength of the coupled SMTJs with a four-state Markov model. Based on this model, we predict the external conditions (applied magnetic field, electrical current) and SMTJ features needed to obtain negative (repulsive) coupling strength.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124064295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Current Characteristics of Defective GNR Nanoelectronic Devices","authors":"K. Rallis, P. Dimitrakis, G. Sirakoulis, A. Rubio, I. Karafyllidis","doi":"10.1145/3565478.3572538","DOIUrl":"https://doi.org/10.1145/3565478.3572538","url":null,"abstract":"The most promising Graphene structures for the development of nanoelectronics and sensor applications are Graphene nanoribbons (GNRs). GNRs with perfect lattices have been extensively investigated in the research literature; however, fabricated GNRs may still suffer from lattice flaws, whose possible effect on the operation of circuitry composed of GNR-based devices has not attracted significant interest. In this paper, we investigate the effect of lattice defects on the operational behavior of GNRs using the Non-Equilibrium Green's function (NEGF) method combined with tight-binding Hamiltonians, targeting the functionality of the resulting nanoelectronic devices and circuits. We focus on butterfly-shaped GNRs, which have been proven to successfully function as switches that can be used as building blocks for simple Boolean gates and logic circuits. Analyses of the most common defects, namely the single and double vacancies, have been adequately performed. The effect of these vacancies was investigated by inserting them in various places and concentrations on the corresponding GNR-based nano-devices. The computation results indicate the effect of lattice defects on the important operational device parameters including the leakage current, ION/IOFF and, finally, current density, which will determine the viability of GNR computing circuits.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121386902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A monolithic 3D design technology co-optimization with back-end-of-line oxide channel transistor","authors":"Jungyoun Kwak, Gihun Choe, Shimeng Yu","doi":"10.1145/3565478.3572312","DOIUrl":"https://doi.org/10.1145/3565478.3572312","url":null,"abstract":"A back-end-of-line (BEOL) compatible tungsten-doped indium oxide (IWO) n-type channel transistor is proposed to achieve complementary logic operation with a front-end-of-line (FEOL) p-type silicon transistor. To make it fully logic-voltage compatible, a novel stacked nanosheet structure of the IWO transistor is designed to achieve high on-current density (Ion > 544 μA/μm) at VGS=1 V to compensate for the relatively low mobility in semiconducting oxide (~20 cm²/Vs). We demonstrate its performance using Technology Computer-Aided Design (TCAD). For design-technology co-optimization of IWO transistors, a customized monolithic 3D (M3D) process design kit (PDK) and related standard cell library using transistor-level partition are developed to investigate the trade-offs in power, performance, and area (PPA) in representative logic circuit designs such as Advanced encryption standard (AES), triple data encryption algorithm (DES3), and low-density parity-check (LDPC) circuits. The synthesis and simulation results show the M3D design could achieve an average of 35% area reduction under similar energy-delay-product (EDP).","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122653092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Circuit Optimization Techniques for Efficient Ex-Situ Training of Robust Memristor Based Liquid State Machine","authors":"Alex Henderson, C. Yakopcic, Steven Harbour, Tarek Taha, Cory E. Merkel, Hananel Hazan","doi":"10.1145/3565478.3572542","DOIUrl":"https://doi.org/10.1145/3565478.3572542","url":null,"abstract":"Spiking neural network hardware offers a high performance, power-efficient and robust platform for the processing of complex data. Many of these systems require supervised learning, which poses a challenge when using gradient-based algorithms due to the discontinuous properties of SNNs. Memristor based hardware can offer gains in portability, power reduction, and throughput efficiency when compared to pure CMOS. This paper proposes a memristor-based spiking liquid state machine (LSM). The inherent dynamics of the LSM permit the use of supervised learning without backpropagation for weight updates. To carry out the design space evaluation of the LSM for optimal hardware performance, several temporal signal classification tasks are performed. It is found that the binary neuron activations in the output layer improve testing accuracy by 3.7% and 5% for classification, while reducing training time. A power and energy analysis of the proposed hardware is presented, resulting in an approximately 50% reduction in power consumption and cycle energy.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"2 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127014512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-performance STT-MRAM Logic-in-Memory Scheme Utilizing Data Read Features","authors":"Kai Liu, Bi Wu, Haonan Zhu, Weiqiang Liu","doi":"10.1145/3565478.3572322","DOIUrl":"https://doi.org/10.1145/3565478.3572322","url":null,"abstract":"In the Big Data era, enormous amounts of data processing have caused an intolerable 'memory wall' challenge for traditional Von Neumann architectures. Therefore, more advanced Logic-in-memory (LiM) computing architectures are proposed with integrated computing and memory units that reduce data migration. The emerging non-volatile memory STT-MRAM, with its fast access speed, near-zero leakage power consumption and high density is one of the most competitive carriers for LiM architectures. This work introduces the principle of LiM and proposes four basic logic operations (XNOR, XOR, AND and OR) based on STT-MRAM. Incorporating the reading characteristics of STT-MRAM and slight modifications to the peripheral circuitry, these operations achieve significant optimisation in terms of latency and energy consumption. From the experimental results, the proposed scheme can reduce the latency of XOR, AND and OR operations at least by 99.3%, 82.2% and 80.2% compared with the existing design. Also, 500 Monte Carlo samples prove the feasibility and robustness of the proposed scheme.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116733203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Computing in Memory Paradigm based on Reconfigurable Spin-Orbit Torque","authors":"Zhongkui Zhang, Chao Wang, Zhaohao Wang","doi":"10.1145/3565478.3572531","DOIUrl":"https://doi.org/10.1145/3565478.3572531","url":null,"abstract":"We proposed a parallel computing in memory paradigm based on reconfigurable spin-orbit torque switching. The proposed paradigm can efficiently perform XNOR operation without complicated steps or modifications to array structure. Compared to traditional design, 50% writing energy and 2× sensing margin can be achieved.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124294344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}