{"title":"VST: A virtual stress testing framework for discovering bugs in SSD flash-translation layers","authors":"Ren-Shuo Liu, Yun-Sheng Chang, Chih-Wen Hung","doi":"10.1109/ICCAD.2017.8203790","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203790","url":null,"abstract":"Flash translation layers (FTLs) are the core embedded software (also known as firmware) of NAND flash-based solid-state drives (SSDs). The relentless pursuit of high-performance SSDs renders FTLs increasingly complex and intricate. Therefore, testing and validating FTLs are crucial and challenging tasks. Directly testing and validating FTLs on SSD hardware are common practices though, they are time-consuming and cumbersome because 1) the testing speed is limited by the hardware speed of SSDs and 2) just reproducing bugs can be challenging, let alone locating and root causing the bugs. This work presents virtual stress testing (VST), a simulation framework to enable executing SSD FTLs on PCs or servers against virtual SRAM, DRAM, and flash emulated by host-side main memory. FTL function calls, such as moving data from flash to DRAM, are served by the VST framework. Therefore, VST can test FTLs without SSD hardware requirements nor SSD speed limitations, and root causing bugs becomes manageable tasks. We apply VST to representative SSD design, OpenSSD, which is actively utilized and maintained by SSD and FTL communities. Experimental results show that VST can test FTLs at a speed up to 375 GB/s, which is several hundred times faster than directly testing FTLs on SSD hardware. Moreover, we successfully discover seven new FTL bugs in the OpenSSD design using VST, which is a solid evidence of VST's bug-discovering effectiveness.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124820909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data path optimisation and delay matching for asynchronous bundled-data balsa circuits","authors":"Norman Kluge, Ralf Wollowski","doi":"10.1109/ICCAD.2017.8203806","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203806","url":null,"abstract":"Balsa provides an open-source design flow where asynchronous circuits are created from high-level specifications, but the syntax-driven translation often results in performance overhead. To improve this, we exploit the fact that bundled-data circuits can be divided into data and control path. Hence, tailored optimisation techniques can be applied to both paths separately. For control path optimisation, STG-based resynthesis has been used (applying logic minimisation). To continue the investigation, we additionally apply synchronous standard tools to optimise the data path. However, this removes the matched delays needed for a properly working bundled-data circuit. Therefore, we also present two algorithms to automatically insert proper matched delays. Our experiments show a performance improvement of up to 44 % and energy consumption improvement of up to 60 % compared to the original Balsa implementation.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128687797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning on FPGAs to face the IoT revolution","authors":"Xiaofan Zhang, Anand Ramachandran, Chuanhao Zhuge, Di He, Wei Zuo, Zuofu Cheng, K. Rupnow, Deming Chen","doi":"10.1109/ICCAD.2017.8203862","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203862","url":null,"abstract":"FPGAs have been rapidly adopted for acceleration of Deep Neural Networks (DNNs) with improved latency and energy efficiency compared to CPU and GPU-based implementations. High-level synthesis (HLS) is an effective design flow for DNNs due to improved productivity, debugging, and design space exploration ability. However, optimizing large neural networks under resource constraints for FPGAs is still a key challenge. In this paper, we present a series of effective design techniques for implementing DNNs on FPGAs with high performance and energy efficiency. These include the use of configurable DNN IPs, performance and resource modeling, resource allocation across DNN layers, and DNN reduction and re-training. We showcase several design solutions including Long-term Recurrent Convolution Network (LRCN) for video captioning, Inception module for FaceNet face recognition, as well as Long Short-Term Memory (LSTM) for sound recognition. These and other similar DNN solutions are ideal implementations to be deployed in vision or sound based IoT applications.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128966964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An integrated-spreading-based macro-refining algorithm for large-scale mixed-size circuit designs","authors":"Szu-To Chen, Yao-Wen Chang, Tung-Chieh Chen","doi":"10.1109/ICCAD.2017.8203818","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203818","url":null,"abstract":"With the increasing use of pre-designed macros in a modern chip and its induced high design complexity, macro placement has become a challenging problem in today's design houses. Most popular macro placement algorithms adopt a three-stage approach: placement prototyping, macro placement, and standard-cell placement, where cell positions after macro placement are assumed the same as those at the prototyping stage, possibly misguiding succeeding standard-cell placement. To close the gap between macro and standard-cell placement, we propose a macro-refining algorithm that adopts an integrated spreading technique considering the spreading of both macros and cells and the dynamic information of cell positions to improve macro placement. We further propose a new force-modulation technique to refine macro placement and a congestion-aware macro shifter to preserve more space for better routability. Extensive experiments based on various macro placements show that our proposed techniques are effective and our macro-refining algorithm can find significantly better placement solutions for large-scale mixed-size circuit designs.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130244738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of circuit-level non-idealities on vision-based autonomous driving systems","authors":"Handi Yu, Changhao Yan, Xuan Zeng, Xin Li","doi":"10.1109/ICCAD.2017.8203887","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203887","url":null,"abstract":"We describe a novel methodology to validate vision-based autonomous driving systems over different circuit corners with consideration of temperature variation and circuit aging. The proposed work is motivated by the fact that low-level circuit implementation may have a significant impact on system performance, even though such effects have not been appropriately taken into account today. Our approach seamlessly integrates the image data recorded under nominal conditions with comprehensive statistical circuit models to synthetically generate the critical corner cases for which an autonomous driving system is likely to fail. As such, a given automotive system can be robustly validated for these worst-case scenarios that cannot be easily captured by physical experiments.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126731774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A streaming clustering approach using a heterogeneous system for big data analysis","authors":"Dajung Lee, Alric Althoff, D. Richmond, R. Kastner","doi":"10.1109/ICCAD.2017.8203845","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203845","url":null,"abstract":"Data clustering is a fundamental challenge in data analytics. It is the main task in exploratory data mining and a core technique in machine learning. As the volume, variety, velocity, and variability of data grows, we need more efficient data analysis methods that can scale towards increasingly large and high dimensional data sets. We develop a streaming clustering algorithm that is highly amenable to hardware acceleration. Our algorithm eliminates the need to store the data objects, which removes limits on the size of the data that we can analyze. Our algorithm is highly parameterizable, which allows it to fit to the characteristics of the data set, and scale towards the available hardware resources. Our streaming hardware core can handle more than 40 Msamples/s when processing 3-dimensional streaming data and up to 1.78 Msamples/s for 70-dimensional data. To validate the accuracy and performance of our algorithms we compare it with several common clustering techniques on several different applications. The experimental result shows that it outperforms other prior hardware accelerated clustering systems.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"519 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123120711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SALT: Provably good routing topology by a novel steiner shallow-light tree algorithm","authors":"Gengjie Chen, Peishan Tu, Evangeline F. Y. Young","doi":"10.1109/ICCAD.2017.8203828","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203828","url":null,"abstract":"In a weighted undirected graph, a spanning/Steiner shallow-light tree (SLT) simultaneously approximates (i) shortest distances from a root to the other vertices, and (ii) the minimum tree weight. The Steiner SLT has been proved to be exponentially lighter than the spanning one [1], [2]. In this paper, we propose a novel Steiner SLT construction method called SALT (Steiner shAllow-Light Tree), which is efficient and has the tightest bound over all the state-of-the-art SLT algorithms. Applying SALT to Manhattan space offers a smooth trade-off between rectilinear Steiner minimum tree (RSMT) and rectilinear Steiner minimum arborescence (RSMA) for VLSI routing. In addition, the adaption further reduces the time complexity from O(n^2) to O(n log n). The experimental results show that SALT can achieve not only short path lengths and wirelength but also small delay, compared to both classical and recent routing tree construction methods.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121065545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Making split fabrication synergistically secure and manufacturable","authors":"Lang Feng, Yujie Wang, Jiang Hu, Wai-Kei Mak, J. Rajendran","doi":"10.1109/ICCAD.2017.8203794","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203794","url":null,"abstract":"Split fabrication is a promising approach to security against attacks by untrusted foundries. While existing split fabrication methods consider the overhead of conventional objectives such as wirelength and timing, they mostly neglect manufacturability — an unavoidable challenge in nanometer technologies. Observing that security and manufacturability can be addressed in a synergistic manner, this work introduces routing techniques that can simultaneously improve both security and manufacturability in terms of either Chemical Mechanical Planarization (CMP) uniformity or Self-Aligned Double Patterning (SADP) compliance. The effectiveness of these techniques is confirmed by experiments on benchmark circuits.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121016033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Switch cell optimization of power-gated modern system-on-chips","authors":"Dongyoun Yi, Taewhan Kim","doi":"10.1109/ICCAD.2017.8203826","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203826","url":null,"abstract":"This work addresses a practical problem of allocating and placing a minimal number of active switch cells in power gated modern System-on-Chips (SoCs) to save the unnecessary standby leakage under noise (i.e., IR-drop) constraint. Since power gating switch cells are physically directly connected to power rails, their overall allocation structure is synthesized in a stage before the logic cell placement. Consequently, the allocation of switch cells in the pre-placement could lead to unnecessarily high standby leakage for modern designs. This work proposes a practical remedy for this problem at the post-placement stage. Specifically, for an initial design with a grid-based switch cell allocation, which is commonly used design methodology in industry, we propose a comprehensive solution to determining, for each switch cell, (1) whether the cell can be permanently turned off or (2) the type of switch cell for replacement so that the resulting total standby leakage of switch cells should be minimized under the noise constraint. We formulate the problem into a variant of weighted set cover problem and solve it efficiently by employing an approximate set cover algorithm. Through experiments with benchmark circuits in ISCAS89, OPENMSP430, and FPU, it is shown that our method is able to reduce the standby leakage by 35.0% and 13.9% over the initial designs and the designs produced by the previous switch cell optimization method in [5], respectively.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131239244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications","authors":"Jieru Zhao, Liang Feng, Sharad Sinha, Wei Zhang, Yun Liang, Bingsheng He","doi":"10.5555/3199700.3199757","DOIUrl":"https://doi.org/10.5555/3199700.3199757","url":null,"abstract":"High Level Synthesis (HLS) relies on the use of synthesis pragmas to generate digital designs meeting a set of specifications. However, the selection of a set of pragmas depends largely on designer experience and knowledge of the target architecture and digital design. Existing automated methods of pragma selection are very limited in scope and capability to analyze complex design descriptions in high-level languages to be synthesized using HLS. In this paper, we propose COMBA, a comprehensive model-based analysis framework capable of analyzing the effects of a multitude of pragmas related to functions, loops and arrays in the design description using pluggable analytical models, a recursive data collector (RDC) and a metric-guided design space exploration algorithm (MGDSE). When compared with HLS tools like Vivado HLS, COMBA reports an average error of around 1% in estimating performance, while taking only a few seconds for analysis of Polybench benchmark applications and a few minutes for real-life applications like JPEG, Seidel and Rician. The synthesis pragmas recommended by COMBA result in an average 100x speed-up in performance for the analyzed applications, which establishes COMBA as a superior alternative to current state-of-the-art approaches.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"195 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116531548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}