{"title":"Mixed-Signal Circuits and Architectures for Energy-Efficient In-Memory and In-Sensor Computation of Artificial Neural Networks","authors":"Bongjin Kim","doi":"10.1109/SOCC46988.2019.1570571922","DOIUrl":"https://doi.org/10.1109/SOCC46988.2019.1570571922","url":null,"abstract":"The Von Neumann architecture faces a critical challenge from the high demand for energy-efficient computing hardware for a variety of machine learning tasks such as image classification. In particular, battery-operated mobile devices with a limited power budget cannot process even artificial neural networks (ANNs) of relatively low complexity by using traditional digital circuits and architectures. The key challenge with the traditional Von Neumann architecture is its energy-inefficient data access between memory and processor. Recently, in-memory computing architecture has gained significant attention as an alternative, especially for running mobile artificial intelligence applications. The memory access energy has been drastically reduced by using local in-memory processing elements which directly use the data stored in local memory. To further improve the efficiency, processors based on mixed-signal circuits instead of conventional digital circuits have recently been actively researched. However, mixed-signal circuits have several critical drawbacks, including nonlinearity, PVT variation, and the overhead of ADC/DAC for interfacing with the external digital domain. In this work, we first review the recent mixed-signal circuits and architectures for in-memory computing using different embedded memories. In addition, we introduce the concept of in-sensor computation for integrating partial computing units in an image-sensor array using low-power mixed-signal circuit techniques.","PeriodicalId":253998,"journal":{"name":"2019 32nd IEEE International System-on-Chip Conference (SOCC)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131755955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DiaNet: An Efficient Multi-Grained Re-configurable Neural Network in Silicon","authors":"Renyuan Zhang, Yan Chen, Takashi Nakada, Y. Nakashima","doi":"10.1109/SOCC46988.2019.1570548015","DOIUrl":"https://doi.org/10.1109/SOCC46988.2019.1570548015","url":null,"abstract":"A hardware-friendly neural network topology is proposed in this work. Instead of full connections between neighbor layers, the bisection-propagation from “parents” to “twins” is performed to retrieve the behaviors of a conventional neural network. In this manner, the conventional dense-but-shallow topology is organized in a sparse-but-deep fashion. A large-scale array of synapses and neurons is symmetrically designed with VLSI circuits on-chip. According to specific application demands, the entire array is cut into arbitrary diamond-shaped pieces without redundant synapses. Each diamond-cut behaves as an independent neural network for its corresponding task, fully in parallel. Namely, the proposed network-on-chip is multi-grained re-configurable by configuring synapse and neuron behavior (fine-grained), reshaping the diamond-cut (medium-grained), and organizing multiple DiaNets (coarse-grained). To carry out the synapse and neuron computations, a set of analog calculation circuits is designed with 80 MOS transistors for one processing unit including two synapses and one neuron in dual activation-modes of sigmoid and rectified linear function. For proof-of-concept, several case studies of one-, two-, and nine-variable regression tasks are implemented by the proposed network. From the circuit simulation results, all the demonstrated regressions are executed by the compact hardware resource of 720 MOS transistors with the maximum power consumption of 19.4 μW. The regression error is about 4.2%, 4.3%, and 1.2% for one-, two-, and nine-variable examples, respectively.","PeriodicalId":253998,"journal":{"name":"2019 32nd IEEE International System-on-Chip Conference (SOCC)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134632964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 100-mVpp Input Range 10-kHz BW VCO-based CT-DSM Neuro-Recording IC in 40-nm CMOS","authors":"W. Zhou, W. Goh, Yi Chen, Tantan Zhang, Yuan Gao","doi":"10.1109/SOCC46988.2019.1570553458","DOIUrl":"https://doi.org/10.1109/SOCC46988.2019.1570553458","url":null,"abstract":"This paper presents a time-domain continuous-time sigma delta modulator (CT-DSM) based neuro-recording interface circuit. This circuit consists of a current-reuse fully differential OTA, a voltage-controlled oscillator (VCO), a counter-based quantizer and a capacitive DAC feedback circuit with Data Weighted Averaging (DWA) logic. A current-reuse Gm cell is adopted to suppress the input-referred noise with high energy efficiency. The VCO converts the input signal amplitude into phase for integration as well as quantization by the counter-based quantizer. The DAC feedback circuit ensures a linear operation of Gm-VCO within the input range. The prototype circuit is designed and implemented in a commercial 40-nm CMOS process. The proposed circuit consumes 19.5 μW under 1.2-V supply voltage. With the maximum tolerable input swing of 100-mVpp, the proposed circuit achieves an SNDR of 59 dB over a bandwidth of 10 kHz. The proposed design is suitable for applications such as the neuro-recording circuit in a closed-loop neural stimulation system.","PeriodicalId":253998,"journal":{"name":"2019 32nd IEEE International System-on-Chip Conference (SOCC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127664471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crosstalk-aware TSV-buffer Insertion in 3D IC","authors":"Yen-Hao Chen, Po-Chen Huang, Fu-Wei Chen, A. Wu, TingTing Hwang","doi":"10.1109/SOCC46988.2019.1570539111","DOIUrl":"https://doi.org/10.1109/SOCC46988.2019.1570539111","url":null,"abstract":"3D integration is one of the promising technologies to alleviate interconnection delay. Implementing a 3D IC means integrating 2D ICs with Through-Silicon Vias (TSVs). For yield consideration, TSVs are bundled together as a TSV block [1]. Regrettably, this placement results in crosstalk coupling noise in the TSV block, which may cause significant timing degradation. Traditionally, buffer sizing is one of the effective methods to solve the problem. However, we have observed that increasing the TSV-buffer size of an aggressor TSV causes more serious timing degradation to the victim TSV in 3D than to wires in 2D cases. In this paper, we develop a delay model of a victim TSV surrounded by aggressor TSVs with different driving TSV-buffer sizes. Based on the TSV delay model, we propose (1) an ILP (Integer Linear Programming) method, which is able to find the near-optimal solution, and (2) an efficient crosstalk-aware heuristic method for practical use. Our experimental results show that the proposed heuristic method uses only 2.56% (3.05%) more TSV-buffers compared to the optimal ILP solution and achieves on average 32.88% (42.40%) and 18.21% (23.06%) reduction in area overhead compared to the conventional greedy [2] and separator sets [3] methods in our 2-tier (4-tier) benchmark circuits.","PeriodicalId":253998,"journal":{"name":"2019 32nd IEEE International System-on-Chip Conference (SOCC)","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127285114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voltage Stacked Design of a Microcontroller for Near/Sub-threshold Operation","authors":"K. Singh, B. Bruin, J. Huisken, Hailong Jiao, J. P. D. Gyvez","doi":"10.1109/SOCC46988.2019.1570558508","DOIUrl":"https://doi.org/10.1109/SOCC46988.2019.1570558508","url":null,"abstract":"Integrated systems operating in the near/sub-threshold region offer low power and energy consumption. Such systems, however, typically suffer from low efficiency in power delivery, thereby leading to ineffective power savings. In this paper, a voltage stacking system with a RISC-V microcontroller Pulpino at the bottom voltage stack and memory arrays on the top stack is proposed. The memory arrays operate at 0.7 V supply voltage, while the microcontroller operates at 0.4 V supply voltage (near/sub-threshold region) by using the leakage currents from the memory arrays. Instead of using complex voltage regulators, a simple current sink voltage controller with low area and energy overheads is used to stabilize the intermediate voltage rail between the top and bottom power domains. To the best of our knowledge, this is the first work proposing voltage stacking for near/sub-threshold systems. Implemented in a 28-nm FDSOI CMOS technology, the proposed voltage stacking system reduces the power consumption by up to 43% as compared to the conventional implementation in a flat voltage domain.","PeriodicalId":253998,"journal":{"name":"2019 32nd IEEE International System-on-Chip Conference (SOCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131426099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secure Speculative Core","authors":"A. Mendelson","doi":"10.1109/SOCC46988.2019.1570564192","DOIUrl":"https://doi.org/10.1109/SOCC46988.2019.1570564192","url":null,"abstract":"The recent attacks on speculative cores have led people to believe that high-performance cores and security demands contradict each other. This work demonstrates that if the core is designed while considering security demands as a first-class citizen, it can support high-performance computing via out-of-order architectures and remain secure. We propose a new methodology for a secure speculative core (SSC) architecture. The design uses an enhanced out-of-order core architecture to achieve the required high performance and implements a unique secure-wrapper that guarantees the required security properties. The methodology provides a set of mechanisms to implement the SSC architecture. Our experiments indicate that adding these new features to out-of-order cores such as the Intel Coffee Lake can immunize the system against side-channel attacks, with only minor performance degradation.","PeriodicalId":253998,"journal":{"name":"2019 32nd IEEE International System-on-Chip Conference (SOCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116331339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Power-Efficient Programmable DCNN Processor for Intelligent Sensing","authors":"Bo Wang, Jiayan Gan, Yuxiang Xie, Yin Wang, Zhuoling Xiao, Jun Zhou","doi":"10.1109/SOCC46988.2019.1570553982","DOIUrl":"https://doi.org/10.1109/SOCC46988.2019.1570553982","url":null,"abstract":"Existing deep convolutional neural network (DCNN) processors are mainly designed for high-end applications such as autonomous vehicles, data centers and smartphones, where the design focus is performance, while for intelligent sensing devices power efficiency is more important. In addition, programmability is important for DCNN processors to support different DCNNs. We have proposed a power-efficient programmable DCNN processor dedicated to intelligent sensing devices and demonstrated it using an FPGA. Several techniques have been proposed to improve the power efficiency. Implemented on a Xilinx VC707 FPGA board, it achieves a power efficiency of 31 Gops/W with peak performance of 487 Gops, which is better than several state-of-the-art DCNN processors.","PeriodicalId":253998,"journal":{"name":"2019 32nd IEEE International System-on-Chip Conference (SOCC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114667274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliable Fail-Operational Automotive E/E-Architectures by Dynamic Redundancy and Reconfiguration","authors":"Florian Oszwald, Philipp Obergfell, Matthias Traub, J. Becker","doi":"10.1109/SOCC46988.2019.1570547977","DOIUrl":"https://doi.org/10.1109/SOCC46988.2019.1570547977","url":null,"abstract":"For future autonomous driving cars, fail-operational systems are necessary. Dynamic reconfiguration is one possible approach to fulfill this requirement for fail-operational behavior. For automotive real-time embedded systems in a fail-operational context, dynamic reconfiguration has not yet been investigated. First, this paper describes a process to realize this approach in the automotive industry and shows its advantages. Second, we adapt an existing fail-operational architecture to the requirements of the steering function and extend the existing state handover with the CAN communication. For this, we modeled a hardware extension to prevent the system from a loss of state and integrated it into this architecture. Third, we integrate the adapted architecture into a service-oriented architecture, and specify necessary interfaces and protocols. By using a service-oriented approach, we enhance the principle of dynamic redundancy from the component level to the system level. As an evaluation, we provide an implementation on a test bench which reveals indications for the use of our concept in future autonomous driving cars.","PeriodicalId":253998,"journal":{"name":"2019 32nd IEEE International System-on-Chip Conference (SOCC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116895766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ML-based Reinforcement Learning Approach for Power Management in SoCs","authors":"D. Akselrod","doi":"10.1109/SOCC46988.2019.1570548498","DOIUrl":"https://doi.org/10.1109/SOCC46988.2019.1570548498","url":null,"abstract":"This paper presents a machine learning-based reinforcement learning approach, mapping Finite State Machines, traditionally used for power management control in SoCs, to Markov Decision Process (MDP)-based agents for controlling power management features of Integrated Circuits with application to complex multiprocessor-based SoCs such as CPUs, APUs and GPUs. We present the problem of decision-based control of a number of power management features in ICs consisting of numerous heterogeneous IPs. Infinite-horizon fully observable MDPs are utilized to obtain a policy of actions maximizing the expectation of the formulated Power Management utility function. The approach balances the demand for desired performance while providing an optimal power saving as opposed to commonly used FSM-based power management techniques. The MDP framework is employed for power management decision-making under conditions of uncertainty for reinforcement learning. We describe in detail converting power management FSMs into infinite-horizon fully observable MDPs. The approach optimizes itself using reinforcement learning based on a specified reward structure and previous performance, yielding an optimal and dynamically adjusted power management mechanism with respect to the formulated model.","PeriodicalId":253998,"journal":{"name":"2019 32nd IEEE International System-on-Chip Conference (SOCC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127491239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}