{"title":"Overcoming Challenges for Achieving High in-situ Training Accuracy with Emerging Memories","authors":"Shanshi Huang, Xiaoyu Sun, Xiaochen Peng, Hongwu Jiang, Shimeng Yu","doi":"10.23919/DATE48585.2020.9116215","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116215","url":null,"abstract":"Embedded artificial intelligence (AI) prefers the adaptive learning capability when deployed in the field, thus in- situ training on-chip is required. Emerging non-volatile memories (eNVMs) are of great interests serving as analog synapses in deep neural network (DNN) on-chip acceleration due to its multilevel programmability. However, the asymmetry/nonlinearity in the conductance tuning remains a grand challenge for achieving high in-situ training accuracy. In addition, analog-to-digital converter (ADC) at the edge of the memory array introduces an additional challenge - quantization error for in-memory computing. In this work, we gain new insights and overcome these challenges through an algorithm-hardware co-optimization. We incorporate these hardware non-ideal effects into the DNN propagation and weight update steps. We evaluate on a VGG-like network for CIFAR-10 dataset, and we show that the asymmetry of the conductance tuning is no longer a limiting factor of in-situ training accuracy if exploiting adaptive \"momentum\" in the weight update rule. Even considering ADC quantization error, in-situ training accuracy could approach software baseline. Our results show much relaxed requirements that enable a variety of eNVMs for DNN acceleration on the embedded AI platforms.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130008602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ankita Nayak, Keyi Zhang, Rajsekhar Setaluri, Alex Carsello, Makai Mann, S. Richardson, Rick Bahr, P. Hanrahan, M. Horowitz, Priyanka Raina
{"title":"A Framework for Adding Low-Overhead, Fine-Grained Power Domains to CGRAs","authors":"Ankita Nayak, Keyi Zhang, Rajsekhar Setaluri, Alex Carsello, Makai Mann, S. Richardson, Rick Bahr, P. Hanrahan, M. Horowitz, Priyanka Raina","doi":"10.23919/DATE48585.2020.9116477","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116477","url":null,"abstract":"To effectively minimize static power for a wide range of applications, power domains for a coarse-grained reconfigurable array (CGRA) need to be finer-grained than a typical ASIC. However, the special isolation logic needed to ensure electrical protection between off and on domains makes fine-grained power domains area- and timing-inefficient. We propose a novel design of the CGRA routing fabric that intrinsically provides boundary protection. This technique reduces the area overhead of boundary protection between power domains for the CGRA from around 9% to less than 1% and removes the delay from the isolation cells. However, with this design choice, we cannot leverage the conventional UPF-based flow to introduce power domain boundary protection. We create compiler-like passes that iteratively introduce the needed design transformations, and formally verify the passes with satisfiability modulo theories (SMT) methods. These passes also allow us to optimize how we handle test and debug signals through the off tiles. We use our framework to insert power domains into an SoC with an ARM Cortex M3 processor and a CGRA with 32 × 16 processing element (PE) and memory tiles and 4MB secondary memory. Depending on the size of the applications mapped, our CGRA achieves up to an 83% reduction in leakage power and 26% reduction in total power versus a CGRA without multiple power domains, for a range of image processing and machine learning applications.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129309125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache Persistence-Aware Memory Bus Contention Analysis for Multicore Systems","authors":"Syed Aftab Rashid, Geoffrey Nelissen, E. Tovar","doi":"10.23919/DATE48585.2020.9116265","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116265","url":null,"abstract":"Memory bus contention strongly relates to the number of main memory requests generated by tasks running on different cores of a multicore platform, which, in turn, depends on the content of the cache memories during the execution of those tasks. Recent works have shown that due to cache persistence the memory access demand of multiple jobs of a task may not always be equal to its worst-case memory access demand in isolation. Analysis of the variable memory access demand of tasks due to cache persistence leads to significantly tighter worst-case response time (WCRT) of tasks.In this work, we show how the notion of cache persistence can be extended from single-core to multicore systems. In particular, we focus on analyzing the impact of cache persistence on the memory bus contention suffered by tasks executing on a multi-core platform considering both work conserving and non-work conserving bus arbitration policies. Experimental evaluation shows that cache persistence-aware analyses of bus arbitration policies increase the number of task sets deemed schedulable by up to 70 percentage points in comparison to their respective counterparts that do not account for cache persistence.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128061894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Bijlsma, Andrii Buriachevskyi, A. Frigerio, Yuting Fu, K. Goossens, Ali Osman Örs, Pieter J. van der Perk, A. Terechko, B. Vermeulen
{"title":"A Distributed Safety Mechanism using Middleware and Hypervisors for Autonomous Vehicles","authors":"T. Bijlsma, Andrii Buriachevskyi, A. Frigerio, Yuting Fu, K. Goossens, Ali Osman Örs, Pieter J. van der Perk, A. Terechko, B. Vermeulen","doi":"10.23919/date48585.2020.9116268","DOIUrl":"https://doi.org/10.23919/date48585.2020.9116268","url":null,"abstract":"Autonomous vehicles use cyber-physical systems to provide comfort and safety to passengers. Design of safety mechanisms for such systems is hindered by the growing quantity and complexity of SoCs (System-on-a-Chip) and software stacks required for autonomous operation. Our study tackles two challenges: (1) fault handling in an autonomous driving system distributed across multiple processing cores and SoCs, and (2) isolation of multiple software modules consolidated in one SoC. To address the first challenge, we extend the state-of-the-art E-Gas layered monitoring concept. Similar to E-Gas, our safety mechanism has function, controller and vehicle layers. We propose to distribute these safety layers on processors with different ASILs (Automotive Safety Integrity Level). Besides, we implement seif-test, fault injection and challenge-response protocols to detect faults at runtime in the safety mechanism itself. To facilitate distributed operation, our mechanism is built on top of the DDS (Data Distribution Service) software middleware for safety-critical embedded applications, as well as DDS-XRCE (eXtremely Resource Constrained Environment) for resource- constrained processor cores of the highest ASIL. To address the second challenge, our safety mechanism employs hardware- assisted hypervisors to isolate software modules and implement fail-silent behavior of faulty software stacks. We validate our safety mechanism on the NXP BiueBox hardware platform using the LG SVL simulator, Baidu Apollo software framework for autonomous driving, and Xen hypervisor. Our fault injection experiments demonstrate that the distributed safety mechanism successfully detects faults in an autonomous system and safely stops the vehicle when necessary.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124572041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tom Kolan, Hillel Mendelson, V. Sokhin, K. Reick, Elena Tsanko, Greg Wetli
{"title":"Post-Silicon Validation of the IBM POWER9 Processor","authors":"Tom Kolan, Hillel Mendelson, V. Sokhin, K. Reick, Elena Tsanko, Greg Wetli","doi":"10.23919/DATE48585.2020.9116491","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116491","url":null,"abstract":"Due to the complexity of designs, post-silicon validation remains a major challenge with few systematic solutions. We provide an overview of the state-of-the-art post silicon validation process used by IBM to verify its latest IBM POWER9 processor. During the POWER9 post-silicon validation, we detected and handled 30% more logic bugs in 80% of the time, as compared to the previous IBM POWER8 bring-up. This improvement is the result of lessons learned from previous designs, leading to numerous innovations. We provide bug analysis data and compare it to POWER8 results. We demonstrate our methodology by describing several bugs from fail detection to root cause.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128875225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Antonios Pavlidis, M. Louërat, E. Faehn, Anand Kumar, H. Stratigopoulos
{"title":"Symmetry-based A/M-S BIST (SymBIST): Demonstration on a SAR ADC IP","authors":"Antonios Pavlidis, M. Louërat, E. Faehn, Anand Kumar, H. Stratigopoulos","doi":"10.23919/DATE48585.2020.9116189","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116189","url":null,"abstract":"In this paper, we propose a defect-oriented Built-In Self-Test (BIST) paradigm for analog and mixed-signal (A/MS) Integrated Circuits (ICs), called symmetry-based BIST (Sym-BIST). SymBIST exploits inherent symmetries into the design to generate invariances that should hold true only in defect-free operation. Violation of any of these invariances points to defect detection. We demonstrate SymBIST on a 65nm 10-bit Successive Approximation Register (SAR) Analog-to-Digital Converter (ADC) IP by ST Microelectronics.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"10 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120892453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"XGBIR: An XGBoost-based IR Drop Predictor for Power Delivery Network","authors":"C. Pao, An-Yu Su, Yu-Min Lee","doi":"10.23919/DATE48585.2020.9116327","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116327","url":null,"abstract":"This work utilizes the XGBoost to build a machine-learning-based IR drop predictor, XGBIR, for the power grid. To capture the behavior of power grid, we extract its several features and employ its locality property to save the extraction time. XGBIR can be effectively applied to large designs and the average error of predicted IR drops is less than 6 mV.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126660739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Flouzat, E. Piriou, Mickaël Guibert, Bojan Jovanovic, M. Oussayran
{"title":"EVPS: An Automotive Video Acquisition and Processing Platform","authors":"C. Flouzat, E. Piriou, Mickaël Guibert, Bojan Jovanovic, M. Oussayran","doi":"10.23919/DATE48585.2020.9116332","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116332","url":null,"abstract":"This paper describes a versatile and flexible video acquisition and processing platform for automotive. It is designed to meet aggressive requirements in terms of bandwidth and latency when implementing ADAS functions. Based on a Xilinx Ultrascale+ FPGA device, a vision processing pipeline mixing software and hardware tasks is implemented on this platform. This setup is able to collect four automotive camera streams (MIPI CSI2) and process them in the loop before transmitting a more intelligible pre-processed/enhanced data.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126359426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abdallah M. Felfel, K. Datta, Arko Dutt, H. Veluri, Ahmed Zaky, A. Thean, M. Aly
{"title":"Quantifying the Benefits of Monolithic 3D Computing Systems Enabled by TFT and RRAM","authors":"Abdallah M. Felfel, K. Datta, Arko Dutt, H. Veluri, Ahmed Zaky, A. Thean, M. Aly","doi":"10.23919/DATE48585.2020.9116410","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116410","url":null,"abstract":"Current data-centric workloads, such as deep learning, expose the memory-access inefficiencies of current computing systems. Monolithic 3D integration can overcome this limitation by leveraging fine-grained and dense vertical connectivity to enable massively-concurrent accesses between compute and memory units. Thin-Film Transistors (TFTs) and Resistive RAM (RRAM) naturally enable monolithic 3D integration as they are fabricated in low temperature (a crucial requirement). In this paper, we explore ZnO-based TFTs and HfO2-based RRAM to build a 1TFT-1R memory subsystem in the upper tiers. The TFT-based memory subsystem is stacked on top of a Si-FET bottom tier that can include compute units and SRAM. System-level simulations for various deep learning workloads show that our TFT-based monolithic 3D system achieves up to 11.4× system-level energy-delay product benefits compared to 2D baseline with off-chip DRAM—5.8× benefits over interposer-based 2.5D integration and 1.25× over 3D stacking of RRAM on silicon using through-silicon vias. These gains are achieved despite the low density of TFT-based RRAM and the higher energy consumption versus 3D stacking with RRAM, due to inherent TFT limitations.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127921382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zihao Yuan, Geoffrey Vaartstra, Prachi Shukla, Zhengmao Lu, E. Wang, S. Reda, A. Coskun
{"title":"A Learning-Based Thermal Simulation Framework for Emerging Two-Phase Cooling Technologies","authors":"Zihao Yuan, Geoffrey Vaartstra, Prachi Shukla, Zhengmao Lu, E. Wang, S. Reda, A. Coskun","doi":"10.23919/DATE48585.2020.9116480","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116480","url":null,"abstract":"Future high-performance chips will require new cooling technologies that can extract heat efficiently. Two-phase cooling is a promising processor cooling solution owing to its high heat transfer rate and potential benefits in cooling power. Two-phase cooling mechanisms, including microchannel-based two-phase cooling or two-phase vapor chambers (VCs), are typically modeled by computing the temperature-dependent heat transfer coefficient (HTC) of the evaporator or coolant using an iterative simulation framework. Precomputed HTC correlations are specific to a given cooling system design and cannot be applied to even the same cooling technology with different cooling parameters (such as different geometries). Another challenge is that HTC correlations are typically calculated with computational fluid dynamics (CFD) tools, which induce long design and simulation times. This paper introduces a learning-based temperature-dependent HTC simulation framework that is used to model a two-phase cooling solution with a wide range of cooling design parameters. In particular, the proposed framework includes a compact thermal model (CTM) of two-phase VCs with hybrid wick evaporators (of nanoporous membrane and microchannels). We build a new simulation tool to integrate the proposed simulation framework and CTM. We validate the proposed simulation framework as well as the new CTM through comparisons against a CFD model. Our simulation framework and CTM achieve a speedup of 21 × with an average error of 0.98° C (and a maximum error of 2.59° C). We design an optimization flow for hybrid wicks to select the most beneficial hybrid wick geometries. Our flow is capable of finding a geometry- coolant combination that results in a lower (or similar) maximum chip temperature compared to that of the best coolant-geometry pair selected by grid search, while providing a speedup of 9.4 x.","PeriodicalId":289525,"journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117003109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}