{"title":"Harnessing Emerging Technology for Compute-in-Memory Support","authors":"N. Jao, A. Ramanathan, S. Srinivasa, Sumitha George, J. Sampson, N. Vijaykrishnan","doi":"10.1109/ISVLSI.2018.00087","DOIUrl":"https://doi.org/10.1109/ISVLSI.2018.00087","url":null,"abstract":"Compute-in-Memory (CiM) techniques focus on reducing data movement by integrating compute elements within or near the memory primitives. While there have been decades of research on various aspects of such logic and memory integration, the confluence of new technology changes and emerging workloads makes us revisit this design space. This work focuses on new functionality that can be embedded to SRAMs using emerging monolithic 3D integration. Properties of the new technology transform the costs of embedding such new functionality compared to prior efforts. This work also explores how compute functionality can be embedded into cross-point style non-volatile memory systems.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130394521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time Stamp Based Scheduling for Energy Harvesting Systems with Hybrid Nonvolatile Hardware Support","authors":"Xin Shi, Tongda Wu, Keni Qiu, Huazhong Yang, Yongpan Liu","doi":"10.1109/ISVLSI.2018.00069","DOIUrl":"https://doi.org/10.1109/ISVLSI.2018.00069","url":null,"abstract":"Nonvolatile processors have manifested strong vitality in energy harvesting systems due to their endurable features to intermittent power supply. However, repeating configurations of peripherals still occupy too much task execution time, which substantially reduces effectiveness of previous scheduling algorithms. In this paper, we adopt the hybrid nonvolatile hardware platform and then propose a time stamp based scheduling algorithm. The experimental results present that the proposed algorithm matches the platform seamlessly and outperforms state-of-the-art algorithms both in effectiveness and efficiency.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133963147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing for Security Within and Between IoT Devices","authors":"M. Borowczak, Rafer Cooley, Shaya Wolf","doi":"10.1109/ISVLSI.2018.00129","DOIUrl":"https://doi.org/10.1109/ISVLSI.2018.00129","url":null,"abstract":"In this work, we propose utilizing a design-for-security approach to enable system architects and designers with flexibility and control during the initial phases of the design process. General solutions for system security, especially in the communication domain, involves rigid application of prior fundamental constructs, from secure standard cell libraries to common communication encryption/decryption schemes. We propose two methods, one for intra-device communication and another from inter-device communication, for lightweight devices. These methods are tunable and adaptable by designers based on their unique situations and rely on fundamental properties of the underlying systems rather than arbitrarily applied constructs that have been used to enable security in the past.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134279707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Very Large-Scale and Node-Heavy Graph Analytics with Heterogeneous FPGA+CPU Computing Platform","authors":"Yu Zou, Mingjie Lin","doi":"10.1109/ISVLSI.2018.00121","DOIUrl":"https://doi.org/10.1109/ISVLSI.2018.00121","url":null,"abstract":"We present a highly scalable approach to constructing a reconfigurable computing engine specifically optimized to perform sophisticated kernel computing on graph-structured data. We choose newly emerged graph convolutional networks (GCNs) as our key benchmark and develop a novel node-heavy edge-centric computing framework for very large-scale graph analytics. Unlike most existing studies, our design and implementation can handle extremely large graph size that well exceeds the on-chip memory capacity of any FPGA+CPU heterogeneous platform, thus can only be stored in hard drive. The most novel aspect of our approach is to enable a completely streaming mode of large vertex and edge data and perform a write-back message updating policy, therefore completely removing any redundant data accesses to IO-expensive hard drive. Additionally, our subgraph sorting scheme can effectively eliminate the performance bottleneck caused by preprocessing in the state-of-art computing framework X-Stream. To validate our approach, we have implemented our proposed method with a KC705 Xilinx FPGA board and tested it with multiple real-world large-scale data sets. For the largest data set with 210,010 vertices and 1,349,400 edges, our platform achieves 1.87s in total latency, which is approximately 400 times faster than the baseline platform with the state-of-the-art approach.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132038540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Theoretical Cost Limit of Stochastic Number Generators for Stochastic Computing","authors":"Meng Yang, Bingzhe Li, D. Lilja, Bo Yuan, Weikang Qian","doi":"10.1109/ISVLSI.2018.00037","DOIUrl":"https://doi.org/10.1109/ISVLSI.2018.00037","url":null,"abstract":"Stochastic number generator (SNG) is one important component of stochastic computing (SC). An SNG usually consists of a random number source (RNS) and a probability conversion circuit (PCC). The SNGs occupy a large portion of the total area and power of a stochastic circuit. Thus, it is critical to lower the area and power of the SNGs. The existing methods only focused on simplifying the RNSs inside the SNGs, such as sharing the RNSs and using emerging devices. However, how to reduce the area and power of PCCs is never studied. In this work, we explore this problem and propose a solution that can effectively reduce the area and power of PCCs. We also study the theoretical limit on the cost of SNG and find that our proposed design approaches the limit. The experimental results show that our design can gain up to 2x improvement in power-delay product over the traditional SNGs.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122311526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-target Many-Reactant Sample Preparation for Reactant Minimization on Microfluidic Biochips","authors":"Yung-Chun Lei, Tien-Kuo Lin, Juinn-Dar Huang","doi":"10.1109/ISVLSI.2018.00124","DOIUrl":"https://doi.org/10.1109/ISVLSI.2018.00124","url":null,"abstract":"Sample preparation is one of essential steps in biochemical applications. It produces solutions with target concentrations through mixing various reactants in a specific way. In this paper, we propose a reactant cost minimization technique, M2SPA, for multi-target many-reactant sample preparation on microfluidic biochips through maximally sharing identical intermediate solutions among different targets. M2SPA first represents target concentrations as a recipe cube, searches all feasible candidates for intermediate solution sharing among targets, and then selects the one with the best cost saving for action. Experimental results show that the proposed algorithm can reduce up to 15.7% of reactant cost as compared to a state-of-the-art method.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115279573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RRAM Based Buffer Design for Energy Efficient CNN Accelerator","authors":"Kaiyuan Guo, Jincheng Yu, Xuefei Ning, Yiming Hu, Yu Wang, Huazhong Yang","doi":"10.1109/ISVLSI.2018.00085","DOIUrl":"https://doi.org/10.1109/ISVLSI.2018.00085","url":null,"abstract":"Convolutional Neural Network (CNN) has become the state-of-the-art algorithm for many computer vision tasks. But its high computation complexity and high memory complexity makes it hard to be deployed on traditional platforms like CPUs. Memory energy can take up a large part of the system energy, which limits the energy efficiency of CNN processing. The emerging metal-oxide resistive switching random-access memory (RRAM) has been widely studied because of its good properties like high storage density and the compatibility with CMOS. In this paper, a system level energy analysis of using RRAM as on-chip weight buffer is carried out for a typical CNN accelerator. Hardware and scheduling optimizations are proposed to fully utilize the large RAM and avoid high read/write energy overhead. Experimental results show that RRAM based designs save 12-18% system energy with 15-75% smaller on-chip RAM area compared with SRAM designs.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116812510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precise Duty Cycle Variation Detection and Self-Calibration System for High-Speed Data Links","authors":"Karen Khachikyan, Abraham Balabanyan, H. Gumroyan","doi":"10.1109/ISVLSI.2018.00044","DOIUrl":"https://doi.org/10.1109/ISVLSI.2018.00044","url":null,"abstract":"A design and simulation methodology that detects and compensates duty cycle deviations is presented. The proposed method provides robust mechanism to reduce transmission line adverse effects and improves received signal quality. A mixed signal approach, where an analog circuit is used to track signal timing distortion values, and a digital circuit controls the analog calibration mechanism, is used. The self-calibration mechanism doesn't interrupt the system operation and is being realized in parallel with normal operation. The system is designed in 28nm CMOS process and simulated using Synopsys mixed mode simulation tools.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114478568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BD-NET: A Multiplication-Less DNN with Binarized Depthwise Separable Convolution","authors":"Zhezhi He, Shaahin Angizi, A. S. Rakin, Deliang Fan","doi":"10.1109/ISVLSI.2018.00033","DOIUrl":"https://doi.org/10.1109/ISVLSI.2018.00033","url":null,"abstract":"In this work, we propose a multiplication-less deep convolution neural network, called BD-NET. As far as we know, BD-NET is the first to use binarized depthwise separable convolution block as the drop-in replacement of conventional spatial-convolution in deep convolution neural network (CNN). In BD-NET, the computation-expensive convolution operations (i.e. Multiplication and Accumulation) are converted into hardware-friendly Addition/Subtraction operations. In this work, we first investigate and analyze the performance of BD-NET in terms of accuracy, parameter size and computation cost, w.r.t various network configurations. Then, the experiment results show that our proposed BD-NET with binarized depthwise separable convolution can achieve even higher inference accuracy to its baseline CNN counterpart with full-precision conventional convolution layer on the CIFAR-10 dataset. From the perspective of hardware implementation, the convolution layer of BD-NET achieves up to 97.2%, 88.9%, and 99.4% reduction in terms of computation energy, memory usage, and chip area respectively.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114712702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Based Coverage Analysis of AMS Circuits","authors":"Antara Ain, Akshay Mambakam, P. Dasgupta","doi":"10.1109/ISVLSI.2018.00083","DOIUrl":"https://doi.org/10.1109/ISVLSI.2018.00083","url":null,"abstract":"Coverage analysis for Analog and Mixed-Signal (AMS) behaviors involves exploring continuous state spaces defined by real valued artifacts. For example, coverage of analog features such as settling time, peak overshoot, etc., entails not only finding whether we have seen such behaviors, but also whether we have covered the range of values of such features (including the extremal values). Thus it is simplistic to lift the notion of assertion based functional coverage from the digital domain into the AMS domain - rather we need to address coverage in the value and time domains. In this paper we propose a methodology for feature based coverage analysis of AMS circuits.","PeriodicalId":114330,"journal":{"name":"2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130717643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}