{"title":"Exploiting the Third Dimension: Stackable Quantum-dot Cellular Automata","authors":"Willem Lambooy, Marcel Walter, R. Wille","doi":"10.1145/3565478.3572529","DOIUrl":"https://doi.org/10.1145/3565478.3572529","url":null,"abstract":"The exponential growth of transistor density in integrated circuits is doomed to fail at the limits of physics in the foreseeable future. Quantum-dot Cellular Automata (QCA) is a post-CMOS contestant from the emerging Field-coupled Nanocomputing (FCN) paradigm which offers computations with tremendously low power dissipation. Recent physical accomplishments in this area also motivated the development of corresponding design automation methods. However, although the higher integration density of QCA makes this technology a promising candidate for stacked, i.e., cuboid-like, chip architectures, all design automation solutions proposed thus far are limited to 2-dimensional architectures only. This work showcases the potential when the third dimension is additionally utilized. To this end, we must overcome certain obstacles for which corresponding solutions are proposed. Case studies on important regular structures such as bitwise AND/OR, binary adders, or multiplexers (for which we provide automatic generation scripts) confirm that exploiting the third dimension in this fashion yields a prodigious reduction in area occupation and cell count, differing by several orders of magnitude compared to the state of the art.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125947808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HEADiv: A High-accuracy Energy-efficient Approximate Divider with Error Compensation","authors":"Hanghang Wang, Ke Chen, Bi Wu, Chenghua Wang, Weiqiang Liu, Fabrizio Lombardi","doi":"10.1145/3565478.3572324","DOIUrl":"https://doi.org/10.1145/3565478.3572324","url":null,"abstract":"The circuit complexity of dividers is greater than that of basic arithmetic units such as adders and multipliers. However, the performance of the divider has a significant impact on the system performance, leading to degradation if not appropriately implemented. As a promising design methodology, approximate computing has demonstrated its effectiveness in reducing power consumption and improving performance with good-enough accuracy. This paper proposes an approximate divider HEADiv based on Taylor expansion with error compensation to reduce hardware consumption. The proposed approximate divider is evaluated and analyzed using error and hardware metrics. Compared to other state-of-the-art approximate divider designs, the proposed approximate divider showed 70% and 45% improvement in accuracy for 8-bit and 16-bit dividers, respectively. Besides, the proposed 16-bit approximate divider reduced the area and power consumption by 9% and 42%, respectively. Finally, the experiments illustrate that the proposed approximate divider can improve the PSNR by up to 55% in image processing applications.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122344393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrated Control Addressing Circuits for a Surface Code Quantum Computer in Silicon","authors":"Rubaya Absar, Zach D. Merino, H. Elgabra, Xuesong Chen, J. Baugh, Lan Wei","doi":"10.1145/3565478.3572541","DOIUrl":"https://doi.org/10.1145/3565478.3572541","url":null,"abstract":"Quantum computers require coordinated operation on a large number of quantum bits (qubits), presenting considerable obstacles such as large-scale system integration, precise control of individual qubits, and significant error correction overhead. Silicon (Si) quantum dot (QD) spin qubits paired with CMOS control circuits promise a scalable solution due to their potential for large-scale integration utilizing well-established semiconductor technologies. This paper proposes a control addressing scheme for QD spin qubits operating on a node network architecture. Compared to the typical 2-dimensional array architecture, this approach considerably lowers the area constraint for control signal routing. Scalable circuits are designed to route the control signals for local and global operations of a surface code quantum error correction through the modular design of tiered switches controlled by demultiplexers. The proposed method is a critical step toward implementing scalable solid-state quantum processors.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128216498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An In-memory Booth Multiplier Based on Non-volatile Memory for Neural Network Applications","authors":"Jiayao Wu, Yijiao Wang, Zhi Yang, Kuiqing He, Pengxu Wang, Weisheng Zhao","doi":"10.1145/3565478.3572534","DOIUrl":"https://doi.org/10.1145/3565478.3572534","url":null,"abstract":"Neural networks (NNs) are among the most significant methods for accomplishing complex tasks and are widely used in image recognition, natural language processing and so on. NNs demand a tremendous number of parallel multiply-and-accumulate (MAC) operations, which affect speed and power efficiency. Thus, how to accelerate MAC and reduce the power consumption, especially for multiplication, is a critical concern. Perpendicular-anisotropy spin-orbit torque (SOT) magnetic random access memory (MRAM) with spin transfer torque (STT) assistance is leveraged in this work; it is well suited to NNs because of its non-volatility, power efficiency and ultrafast operation. In addition, Booth arithmetic is an excellent method to reduce the partial products of the multiplication for acceleration. In this work, an in-memory Booth multiplier based on MRAM is designed and analyzed through simulation. Compared with the in-SRAM counterpart, our design saves 70.4% of the energy of the decoding part, which is a substantial improvement.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121400485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Multi-Path Signal Routing for Field-coupled Nanotechnologies","authors":"Marcel Walter, R. Wille","doi":"10.1145/3565478.3572539","DOIUrl":"https://doi.org/10.1145/3565478.3572539","url":null,"abstract":"Establishing itself among the vanguard of beyond-CMOS candidates, Field-coupled Nanocomputing (FCN) has advanced in recent times due to fabrication breakthroughs of Silicon Dangling Bonds (SiDBs). At the foundation of these breakthroughs, experimental demonstrations showcase the feasibility of FCN logic components and wire segment implementations at the physical limits of scaling. However, automatic design methods for this highly-promising technology remain scarce, as they are impeded by the necessity to conform to particular constraints that differ from those in CMOS technologies. Previously proposed approaches are restricted by their inability to overcome scalability limitations and/or their failure to generate results of adequate quality. In this work, we aim to improve this state of the art by addressing the epicenter of performance inadequacy and proposing a distinctive multi-path FCN routing algorithm that is explicitly adjusted to the design constraints dictated by FCN technologies. The resulting approach can be parameterized to generate signal routings for almost arbitrary FCN placements or, in case this is impossible, pinpoint the designer to the unsatisfied connections. Experimental evaluations confirm these abilities on an established benchmark set and demonstrate a runtime advantage of several orders of magnitude over a state-of-the-art physical design algorithm.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130939649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate computation based on NAND-SPIN MRAM for CNN on-chip training","authors":"Zhengyi Hou, Luyao Shi, Bi Wang, Zhaohao Wang","doi":"10.1145/3565478.3572537","DOIUrl":"https://doi.org/10.1145/3565478.3572537","url":null,"abstract":"Approximate computation is a widely used method to accelerate CNN training. In this work, the stochastic switching mechanism of the NAND-SPIN MRAM is utilized to perform the approximate update and storage of the synaptic weight. By reducing the programming time of the NAND-SPIN MTJs from 3 ns to 1 ns, more than 67% speedup and nearly 70% energy saving have been achieved with less than 1% accuracy loss.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115950320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single Cycle XOR (SCXOR) and Stateful n-bit Parallel Adder Implementation Using 2D RRAM Crossbar","authors":"Bhanprakash Goswami, M. Suri","doi":"10.1145/3565478.3572329","DOIUrl":"https://doi.org/10.1145/3565478.3572329","url":null,"abstract":"The motivation to find a solution to the Memory Wall problem led the research community to explore non-von-Neumann architectures. Compute In-Memory (CIM) architectures with emerging memory technologies are promising for minimizing data movement. In line with the CIM direction, several logical and arithmetic operations were demonstrated in the literature for maximizing operations per second per watt using the RRAM crossbar. In this work, we propose a novel way of realizing stateful XOR logic using RRAM crossbar memory. The proposed XOR design is free from the operand switching issue, and since it needs cells within a single column of the 2D crossbar, logic cascading with other logic gates in the same column is straightforward. Secondly, we offer a novel data shifting technique between two consecutive RRAM cell columns/rows of the crossbar. Leveraging the proposed methods, we realize a stateful n-bit parallel adder that takes n+3 computation cycles and 5n RRAM cells within the crossbar. With the proposed n-bit parallel adder design for n>3, we obtain a minimum 1.4X speedup compared to the literature without using an increased number of RRAM cells.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122990867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HSB-GDM: a Hybrid Stochastic-Binary Circuit for Gradient Descent with Momentum in the Training of Neural Networks","authors":"Han Li, Heng Shi, Honglan Jiang, Siting Liu","doi":"10.1145/3565478.3572530","DOIUrl":"https://doi.org/10.1145/3565478.3572530","url":null,"abstract":"To enable an energy-efficient training of neural networks, this paper proposes a hybrid stochastic-binary (HSB) computing circuit for implementing the gradient descent with momentum (GDM) algorithm. By accumulating the weight-update values step by step, the proposed design executes the weight optimization of a neural network. At each step, the weight-update value is obtained by a linear combination of its previous value and the current gradient. In this design, it is computed in a hybrid stochastic-binary manner and encoded as a dynamic stochastic sequence consisting of 0, +1 and -1. Then, the weights are updated by accumulating the bits in the dynamic stochastic sequence. With the hybrid stochastic-binary design, this circuit can be readily integrated into a neural network accelerator to support online training with a small footprint. Experimental results show that, with little accuracy loss, the area efficiency of the proposed HSB-GDM is improved by 2.68× and energy efficiency by 4.41× compared to a floating-point design using bfloat16 data format.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134458456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-cost stochastic number generator based on MRAM for stochastic computing","authors":"You Wang, Bi Wu, Hao Cai, Weiqiang Liu","doi":"10.1145/3565478.3572545","DOIUrl":"https://doi.org/10.1145/3565478.3572545","url":null,"abstract":"Stochastic computing (SC) can transform the major operations of neural networks, i.e., multiply-and-accumulate (MAC), into AND and multiplexer operations, which drastically reduces hardware occupation and energy consumption. This paper proposes a novel design of SC for highly energy-efficient computing which combines the features of low power and stochastic switching of magnetic random access memory (MRAM) and the intrinsic fault-tolerance and simple arithmetic operations of SC. A simplified circuit of a stochastic number generator (SNG) based on an MRAM device is proposed to transform a binary bitstream into a stochastic bitstream. Compared with conventional SNGs, the proposed SNG considerably reduces the design complexity and, in consequence, the energy consumption. Furthermore, the performance is investigated in terms of accuracy and hardware occupation to explore the design space.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"176 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115702867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data and Fault Aware Routing Algorithm for NoC Based Approximate Computing","authors":"Ibrahim Krayem, Romain Mercier, C. Killian, A. Kritikakou, D. Chillet","doi":"10.1145/3565478.3572327","DOIUrl":"https://doi.org/10.1145/3565478.3572327","url":null,"abstract":"Due to transistor shrinking and increasing core counts in Systems-on-Chip (SoCs), fault tolerance has become a critical concern. Given the amount of data communication on such architectures, Networks-on-Chip (NoCs) play a crucial role in terms of performance. Even if fault correction approaches have been developed, they cannot efficiently address several permanent faults on NoC, due to their high hardware costs and correction limitations. In parallel, the Approximate Computing domain considers applications that can tolerate errors, hence allowing fault mitigation instead of correction. The latter brings the opportunity of low implementation cost techniques to improve the reliability of SoC. In this work, we propose a routing technique which selects a path between cores according to data type and permanent fault positions. Error tolerant data are able to cross faulty paths by using a bit-shuffling error mitigation technique. Critical data circumvent faulty paths or are duplicated and shuffled in case there is no other correct path available. Results show that our routing technique makes it possible to maintain all the communication paths within the NoC for a large number of permanent errors. To further evaluate the behavior of the proposed technique, we performed a comprehensive analysis of its effect on packet latency and saturation injection rate with respect to the number of faults and traffic type.","PeriodicalId":125590,"journal":{"name":"Proceedings of the 17th ACM International Symposium on Nanoscale Architectures","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129496635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}