{"title":"Quantum Most-Significant Digit-First Addition","authors":"He Li, Hongxiang Fan, Jiawei Liang","doi":"10.1109/IGSC54211.2021.9651595","DOIUrl":"https://doi.org/10.1109/IGSC54211.2021.9651595","url":null,"abstract":"In recent years, quantum computers have attracted extensive research interest due to their potential capability of solving problems which are not easily solvable using classical computers. In parallel to the constant research aiming at the physical implementation of quantum processors, there is another branch of research developing quantum algorithms for real-life applications, many of which need to perform arithmetic operations. As one of the most important operations, quantum addition has been adopted in Shor's algorithm, quantum linear algebra algorithms and various quantum machine learning applications. Since precision is always a non-trivial issue to determine during the computation, most-significant digit-first quantum addition can be a fundamental operation for variable precision computing. Therefore, this paper proposes the first quantum adder circuit that is able to compute from the most-significant digits, which demonstrates the advantages over the state-of-the-art quantum adders requiring carry propagation to produce results from least-significant digits. We first present a review of quantum addition circuit design, and then propose a novel method to implement quantum most-significant digit-first adders. 
Scalability and quantitative comparisons for different quantum full adder, quantum carry-ripple adder and quantum most-significant digit-first adder circuits have been investigated, where all circuits are implemented on IBM Qiskit SDK.","PeriodicalId":334989,"journal":{"name":"2021 12th International Green and Sustainable Computing Conference (IGSC)","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124472675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive Sampling and Edge Detection Approach for Encoding Static Images for Spiking Neural Networks","authors":"Peyton S. Chandarana, Jun Ou, Ramtin Zand","doi":"10.1109/IGSC54211.2021.9651610","DOIUrl":"https://doi.org/10.1109/IGSC54211.2021.9651610","url":null,"abstract":"Current state-of-the-art methods of image classification using convolutional neural networks are often constrained by both latency and power consumption. This places a limit on the devices, particularly low-power edge devices, that can employ these methods. Spiking neural networks (SNNs) are considered to be the third generation of artificial neural networks which aim to address these latency and power constraints by taking inspiration from biological neuronal communication processes. Before data such as images can be input into an SNN, however, they must be first encoded into spike trains. Herein, we propose a method for encoding static images into temporal spike trains using edge detection and an adaptive signal sampling method for use in SNNs. The edge detection process consists of first performing Canny edge detection on the 2D static images and then converting the edge detected images into two X and Y signals using an image-to-signal conversion method. The adaptive signaling approach consists of sampling the signals such that the signals maintain enough detail and are sensitive to abrupt changes in the signal. Temporal encoding mechanisms such as threshold-based representation (TBR) and step-forward (SF) are then able to be used to convert the sampled signals into spike trains. We use various error and indicator metrics to optimize and evaluate the efficiency and precision of the proposed image encoding approach. 
Comparison results between the original and reconstructed signals from spike trains generated using edge-detection and adaptive temporal encoding mechanism exhibit $18\\times$ and $7\\times$ reduction in average root mean square error (RMSE) compared to the conventional SF and TBR encoding, respectively, when used to encode the MNIST dataset.","PeriodicalId":334989,"journal":{"name":"2021 12th International Green and Sustainable Computing Conference (IGSC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121243948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Evolution and Deployment of Neuromorphic Computing at The Edge","authors":"Catherine D. Schuman, Steven R. Young, Bryan P. Maldonado, B. Kaul","doi":"10.1109/IGSC54211.2021.9651607","DOIUrl":"https://doi.org/10.1109/IGSC54211.2021.9651607","url":null,"abstract":"Extremely low power neuromorphic systems are well-suited for deployment to the edge for many applications. In many use cases of neuromorphic computing for control, a spiking neural network is trained off-line using a simulation and then deployed to a neuromorphic system at the edge, where it will operate without ongoing training or learning. However, it may be desirable to continue training or learning at the edge to refine or adapt to the real-world system. In this work, we propose an approach for performing real-time evolutionary optimization for spiking neural networks for neuromorphic deployment at the edge. In particular, we propose a combination of simulation and real-world evaluations, along with feedback from the real-world environment, to train spiking neural networks for continuous deployment to the edge. We show that the real-time evolution at the edge approach achieves comparable performance to an evolution approach that requires constant evaluation in the real-world environment.","PeriodicalId":334989,"journal":{"name":"2021 12th International Green and Sustainable Computing Conference (IGSC)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124782673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inf4Edge: Automatic Resource-aware Generation of Energy-efficient CNN Inference Accelerator for Edge Embedded FPGAs","authors":"Ali Jahanshahi, Rasool Sharifi, Mohammadreza Rezvani, Hadi Zamani","doi":"10.1109/IGSC54211.2021.9651650","DOIUrl":"https://doi.org/10.1109/IGSC54211.2021.9651650","url":null,"abstract":"Convolutional Neural Networks (CNN) have achieved great success in a large number of applications and have been among the most powerful and widely used techniques in computer vision. CNN inference is very computation-intensive, which makes it difficult to integrate into resource-constrained embedded devices such as smart phones, smart glasses, and robots. Alongside inference latency, energy-efficiency is also of great importance when it comes to embedded devices with limited computational, storage, and energy resources. Embedded FPGAs, as a fast and energy-efficient solution, are one of the widely used platforms for accelerating CNN inference. However, the difficulty of programming and their limited hardware resources have made them a less attractive option to the users. In this paper, we propose Inf4Edge, an automated framework for designing CNN inference accelerators on small embedded FPGAs. The proposed framework seamlessly generates a CNN inference accelerator that fits the target FPGA using different resource-aware optimization techniques. We eliminate the overhead of transferring the data to/from FPGA back and forth which introduces latency and energy consumption. To avoid the data transfer overhead, we keep all of the data on the FPGA on-chip memory which makes the generated inference accelerator faster and more energy-efficient. Given a high-level description of the CNN and a data set, the framework builds and trains the model, and generates an optimized CNN inference accelerator for the target FPGA. 
As a case study, we use 16-bit fixed-point data in the generated CNN inference accelerator on a small FPGA and compare it to the same software model running on the FPGA's ARM processor. Using 16-bit fixed-point data type results in ~2% accuracy loss in the CNN inference accelerator. In return, we get up to $15.86\\times$ speedup performing inference on the FPGA.","PeriodicalId":334989,"journal":{"name":"2021 12th International Green and Sustainable Computing Conference (IGSC)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123511569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approaching the Area of Neuromorphic Computing Circuit and System Design","authors":"Honghao Zheng, Juliet Anderson, Yang Yi","doi":"10.1109/IGSC54211.2021.9651627","DOIUrl":"https://doi.org/10.1109/IGSC54211.2021.9651627","url":null,"abstract":"The traditional von Neumann architecture has met limitations in both computation and energy efficiency. Researchers' attention has been diverted to neuromorphic computing with the progression of neuroscience. With the inspiration of mammalian neural systems, neuromorphic chips are designed and fabricated. This paper will introduce the basic concept and elements of neuromorphic computing circuit design, such as spiking neurons and encoders. Spiking encoders convert analog signals to spikes and lead to high power efficiency while maintaining low hardware implementation costs. Spiking neural networks that utilize the delay-feedback property have been designed and fabricated. One of them is the delay-feedback reservoir (DFR) network that is more computationally efficient than the conventional recurrent neural network (RNN). The others are hybrid neural networks (HNN) that combine DFR with other neural networks like multilayer perceptron (MLP) and convolutional neural network (CNN). Finally, the measurement performance for different applications of these neural networks (NNs) will also be demonstrated.","PeriodicalId":334989,"journal":{"name":"2021 12th International Green and Sustainable Computing Conference (IGSC)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132088963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design Technology Co-Optimization for Neuromorphic Computing","authors":"Ankita Paul, Shihao Song, Anup Das","doi":"10.1109/IGSC54211.2021.9651556","DOIUrl":"https://doi.org/10.1109/IGSC54211.2021.9651556","url":null,"abstract":"We present a design-technology tradeoff analysis in implementing machine-learning inference on the processing cores of a Non-Volatile Memory (NVM)-based many-core neuromorphic hardware. Through detailed circuit-level simulations for scaled process technology nodes, we show the negative impact of design scaling on read endurance of NVMs, which directly impacts their inference lifetime. At a finer granularity, the inference lifetime of a core depends on 1) the resistance state of synaptic weights programmed on the core (design) and 2) the voltage variation inside the core that is introduced by the parasitic components on current paths (technology). We show that such design and technology characteristics can be incorporated in a design flow to significantly improve the inference lifetime.","PeriodicalId":334989,"journal":{"name":"2021 12th International Green and Sustainable Computing Conference (IGSC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128479804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-In-Memory Hardware","authors":"Juan Gómez-Luna, I. E. Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, O. Mutlu","doi":"10.1109/IGSC54211.2021.9651614","DOIUrl":"https://doi.org/10.1109/IGSC54211.2021.9651614","url":null,"abstract":"Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads is insufficient to amortize the cost of memory access. Fundamentally addressing this data movement bottleneck requires a paradigm where the memory system assumes an active role in computing by integrating processing capabilities. This paradigm is known as processing-in-memory (PIM). Recent research explores different forms of PIM architectures, motivated by the emergence of new technologies that integrate memory with a logic layer, where processing elements can be easily placed. Past works evaluate these architectures in simulation or, at best, with simplified hardware prototypes. In contrast, the UPMEM company has designed and manufactured the first publicly-available real-world PIM architecture. The UPMEM PIM architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated in the same chip. This paper presents key takeaways from the first comprehensive analysis [1] of the first publicly-available real-world PIM architecture. 
First, we introduce our experimental characterization of the UPMEM PIM architecture using microbenchmarks, and present PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domains (e.g., dense/sparse linear algebra, databases, data analytics, graph processing, neural networks, bioinformatics, image processing), which we identify as memory-bound. Second, we provide four key takeaways about the UPMEM PIM architecture, which stem from our study of the performance and scaling characteristics of PrIM benchmarks on the UPMEM PIM architecture, and their performance and energy consumption comparison to their state-of-the-art CPU and GPU counterparts. More insights about suitability of different workloads to the PIM system, programming recommendations for software designers, and suggestions and hints for hardware and architecture designers of future PIM systems are available in [1].","PeriodicalId":334989,"journal":{"name":"2021 12th International Green and Sustainable Computing Conference (IGSC)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133127135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}