Kai Lu, Zhaoshi Li, Leibo Liu, Jiawei Wang, S. Yin, Shaojun Wei
{"title":"ReDESK: A Reconfigurable Dataflow Engine for Sparse Kernels on Heterogeneous Platforms","authors":"Kai Lu, Zhaoshi Li, Leibo Liu, Jiawei Wang, S. Yin, Shaojun Wei","doi":"10.1109/iccad45719.2019.8942089","DOIUrl":"https://doi.org/10.1109/iccad45719.2019.8942089","url":null,"abstract":"Sparse Matrix-Vector Multiplication (SpMV) is the most important sparse linear algebra kernel in both scientific and engineering applications. Due to its irregular control flow and data access pattern, Von Neumann architectures like CPUs and GPUs cannot fully exploit the inherent parallelism of SpMV. Although FPGAs can efficiently accelerate SpMV in a dataflow manner, their performance is degraded in the face of large matrices that exceed the capacity of on-chip memory because of excessive rescheduling of data. In this paper we propose ReDESK, a Reconfigurable Dataflow Engine for Sparse Kernels, for emerging tightly-coupled CPU-FPGA heterogeneous platforms. To fully exploit the heterogeneity, we design a novel representation of sparse matrices that is tailored for data prefetching on the CPU side and streaming processing on the FPGA side. In this way ReDESK can fully utilize the memory bandwidth regardless of the scale of the SpMV problem. We evaluate ReDESK on the Intel HARP-2 platform with a set of matrices from the University of Florida sparse matrix collection. 
The result demonstrates an average bandwidth utilization of 0.094 GFLOP/GB, which is 1.6-4.3x more efficient than previous SpMV on FPGAs.","PeriodicalId":363364,"journal":{"name":"2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116982980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2019 CAD Contest: System-level FPGA Routing with Timing Division Multiplexing Technique","authors":"Yu-Hsuan Su, Richard Sun, Pei-Hsin Ho","doi":"10.1109/iccad45719.2019.8942051","DOIUrl":"https://doi.org/10.1109/iccad45719.2019.8942051","url":null,"abstract":"The time division multiplexing technique overcomes the bandwidth limitation by allowing FPGA chips to transmit multiple signals at the maximum clocking frequency. With the additional multiplexers, this technique dramatically increases system-level routing capability in the FPGA-based emulator. However, the large number of virtual wires in the chip interconnection may impact emulation performance. The system-level FPGA routing tends to connect all virtual wires (signals) and considers emulation performance. At the same time, the challenge for system-level FPGA routing using time division multiplexing lies in the emulation performance.","PeriodicalId":363364,"journal":{"name":"2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127475439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Debjyoti Bhattacharjee, Abdullah Ash-Saki, M. Alam, A. Chattopadhyay, Swaroop Ghosh
{"title":"MUQUT: Multi-Constraint Quantum Circuit Mapping on NISQ Computers: Invited Paper","authors":"Debjyoti Bhattacharjee, Abdullah Ash-Saki, M. Alam, A. Chattopadhyay, Swaroop Ghosh","doi":"10.1109/iccad45719.2019.8942132","DOIUrl":"https://doi.org/10.1109/iccad45719.2019.8942132","url":null,"abstract":"Rapid advancement in the domain of quantum technologies has opened up to researchers the real possibility of experimenting with quantum circuits and simulating small-scale quantum programs. Nevertheless, the quality of currently available qubits and environmental noise pose a challenge to the smooth execution of quantum circuits. Therefore, efficient design automation flows for mapping a given algorithm to a Noisy Intermediate Scale Quantum (NISQ) computer become of utmost importance. State-of-the-art quantum design automation tools are primarily focused on reducing logical depth, gate count and qubit count, with recent emphasis on topology-aware (nearest-neighbour compliance) mapping. In this work, we extend the technology mapping flows to simultaneously consider the topology and gate fidelity constraints while keeping logical depth and gate count as optimization objectives. We provide a comprehensive problem formulation and a multi-tier approach towards solving it. The proposed automation flow is compatible with commercial quantum computers, such as IBM QX and Rigetti. 
Our simulation results over 10 quantum circuit benchmarks show that the fidelity of the circuit can be improved by up to 3.37× with an average improvement of 1.87×.","PeriodicalId":363364,"journal":{"name":"2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124818001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shashank Varshney, Hameedah Sultan, Palkesh Jain, S. Sarangi
{"title":"NanoTherm: An Analytical Fourier-Boltzmann Framework for Full Chip Thermal Simulations","authors":"Shashank Varshney, Hameedah Sultan, Palkesh Jain, S. Sarangi","doi":"10.1109/iccad45719.2019.8942159","DOIUrl":"https://doi.org/10.1109/iccad45719.2019.8942159","url":null,"abstract":"Temperature simulation is a classic problem in EDA, and researchers have been working on it for at least the last 15 years. In this paper, we focus on fast Green's function based approaches, where computing the temperature profile is as simple as computing the convolution of the power profile with the Green's function. We observe that for many problems of interest the process of computing the Green's function is the most time-consuming phase, because we need to compute it with the slower finite difference or finite element based approaches. In this paper we propose a solution, NanoTherm, to compute the Green's function using a fast analytical approach that exploits the symmetry in the thermal distribution. Secondly, conventional analyses based on Fourier's heat transfer equation fail to hold at the nanometer level. To accurately compute the temperature at the level of a standard cell, it is necessary to solve the Boltzmann transport equation (BTE) that accounts for quantum mechanical effects. This research area is very sparse. Conventional approaches ignore the quantum effects, which can result in a 25 to 60% error in temperature calculation. Hence, we propose a fast analytical approach to solve the BTE and obtain an exact solution in the Fourier transform space. 
Using our fast analytical models, we demonstrate a speedup of 7-668X over state of the art techniques with an error limited to 3% while computing the combined Green's function.","PeriodicalId":363364,"journal":{"name":"2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125885122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An All-Digital True Random Number Generator Based on Chaotic Cellular Automata Topology","authors":"S. Best, Xiaolin Xu","doi":"10.1109/iccad45719.2019.8942050","DOIUrl":"https://doi.org/10.1109/iccad45719.2019.8942050","url":null,"abstract":"A true random number generator (TRNG) is an important primitive in cryptographic applications. In this paper, a TRNG based on a self-timed ring structure is presented; the basic element of the ring is a realization of a chaotic cellular automata topology. In particular, the proposed TRNG design is fully synthesizable with standard all-digital components. Test chips of the proposed TRNG structure were fabricated with the 40 nm TSMC technology node, and the utilized overhead is only 75 NAND gates equivalent, with a die area of $270\\,\\mu m^{2}$. Experimental results demonstrated that the TRNG test chips can generate random numbers at a high bit rate: 1600 Mb/s. The test sequences generated by the TRNG test chips passed all test statistics of the widely used test suite: NIST SP800-22, as well as the independent and identically distributed (IID) test of NIST SP800-90B.","PeriodicalId":363364,"journal":{"name":"2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114500488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Obstacle-Aware Group-Based Length-Matching Routing for Pre-Assignment Area-I/O Flip-Chip Designs","authors":"Yu-Hsuan Chang, Hsiang-Ting Wen, Yao-Wen Chang","doi":"10.1109/iccad45719.2019.8942123","DOIUrl":"https://doi.org/10.1109/iccad45719.2019.8942123","url":null,"abstract":"A robust redistribution layer (RDL) router is required for advanced package designs, where the length-matching constraint for a group of nets needs to be considered to preserve good timing properties at the package level. For area-I/O flip-chip design with pre-assigned nets on RDLs, we propose the first group-based length-matching routing framework that can simultaneously minimize the wirelengths of an arbitrary group of nets with and without equal-length constraints, based on an equal-length-aware A*-search algorithm and a bounded sliceline grid (BSG) snaking one. For the irregular structure of the area-I/O flip-chip design, we apply Delaunay triangulation and Voronoi diagram to model the routing resources more precisely. To effectively consider the equal-length constraints in the earlier stage, we first profile the routing resource to obtain an approximation of the longest net, and then adopt the equal-length-aware A*-search algorithm to extend shorter nets to match the estimated longest net. A BSG-based snaking method is then applied to meet the equal-length constraint, while preserving the minimized wirelength of unconstrained nets. 
Experimental results demonstrate that our framework can solve all benchmarks effectively and efficiently.","PeriodicalId":363364,"journal":{"name":"2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124727614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximating Behavioral HW Accelerators through Selective Partial Extractions onto Synthesizable Predictive Models","authors":"Siyuan Xu, Benjamin Carrión Schäfer","doi":"10.1109/iccad45719.2019.8942119","DOIUrl":"https://doi.org/10.1109/iccad45719.2019.8942119","url":null,"abstract":"This work presents a method to selectively extract portions of a behavioral description to be synthesized as a hardware accelerator using High-Level Synthesis (HLS) onto different predictive models in order to trade off the accuracy of the accelerators' outputs against area and power. Because the main aim of this work is to synthesize the newly approximated behavioral description, we investigate the use of different predictive models, mainly linear regression (LR) and multi-layer perceptron (MLP), highlighting the trade-offs of using one over the other. In addition, we further extend the search space by reducing the precision of the predictive models' coefficients, thus leading to a wider range of solutions. Experimental results using a variety of benchmarks from different domains show that our proposed method works well compared to another state-of-the-art approximate solution.","PeriodicalId":363364,"journal":{"name":"2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123835775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Network-Inspired Analog-to-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices","authors":"Weidong Cao, Liu Ke, Ayan Chakrabarti, Xuan Zhang","doi":"10.1109/iccad45719.2019.8942099","DOIUrl":"https://doi.org/10.1109/iccad45719.2019.8942099","url":null,"abstract":"Recent works propose neural network- (NN-) inspired analog-to-digital converters (NNADCs) and demonstrate their great potential in many emerging applications. These NNADCs often rely on resistive random-access memory (RRAM) devices to realize the NN operations and require high-precision RRAM cells (6∼12-bit) to achieve a moderate quantization resolution (4∼8-bit). Such an optimistic assumption of RRAM resolution, however, is not supported by fabrication data of RRAM arrays in large-scale production processes. In this paper, we propose an NN-inspired super-resolution ADC based on low-precision RRAM devices by taking advantage of a co-design methodology that combines a pipelined hardware architecture with a custom NN training framework. Results obtained from SPICE simulations demonstrate that our method leads to a robust design of a 14-bit super-resolution ADC using 3-bit RRAM devices with improved power and speed performance and competitive figures of merit (FoMs). 
In addition to the linear uniform quantization, the proposed ADC can also support configurable high-resolution nonlinear quantization with high conversion speed and low conversion energy, enabling future intelligent analog-to-information interfaces for near-sensor analytics and processing.","PeriodicalId":363364,"journal":{"name":"2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121578763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I. O. Nunes, Karim M. El Defrawy, Norrathep Rattanavipanon, G. Tsudik
{"title":"PURE: Using Verified Remote Attestation to Obtain Proofs of Update, Reset and Erasure in low-End Embedded Systems","authors":"I. O. Nunes, Karim M. El Defrawy, Norrathep Rattanavipanon, G. Tsudik","doi":"10.1109/iccad45719.2019.8942118","DOIUrl":"https://doi.org/10.1109/iccad45719.2019.8942118","url":null,"abstract":"Remote Attestation ($\\mathcal{R}\\mathrm{A}$) is a security service that enables a trusted verifier ($\\mathcal{V}\\text{rf}$) to measure current memory state of an untrusted remote prover ($\\mathcal{P}\\text{rv}$). If correctly implemented, $\\mathcal{R}\\mathrm{A}$ allows $\\mathcal{V}\\text{rf}$ to remotely detect if $\\mathcal{P}\\text{rv}$'s memory reflects a compromised state. However, $\\mathcal{R}\\mathrm{A}$ by itself offers no means of remedying the situation once $\\mathcal{P}\\text{rv}$ is determined to be compromised. In this work we show how a secure $\\mathcal{R}\\mathrm{A}$ architecture can be extended to enable important and useful security services for low-end embedded devices. In particular, we extend the formally verified $\\mathcal{R}\\mathrm{A}$ architecture, VRASED, to implement provably secure software update, erasure, and system-wide resets. When (serially) composed, these features guarantee to $\\mathcal{V}\\text{rf}$ that a remote $\\mathcal{P}\\text{rv}$ has been updated to a functional and malware-free state, and was properly initialized after such process. These services are provably secure against an adversary (represented by malware) that compromises $\\mathcal{P}\\text{rv}$ and exerts full control of its software state. Our results demonstrate that such services incur minimal additional overhead (0.4% extra hardware footprint, and 100-s milliseconds to generate combined proofs of update, erasure, and reset), making them practical even for the lowest-end embedded devices, e.g., those based on MSP430 or AVR ATMega micro-controller units (MCUs). 
All changes introduced by our new services to VRASED trusted components are also formally verified.","PeriodicalId":363364,"journal":{"name":"2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"7 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114022225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hao-Yu Chi, Zi-Jun Lin, Chia-Hao Hung, C. Liu, Hung-Ming Chen
{"title":"Achieving Routing Integrity in Analog Layout Migration via Cartesian Detection Lines","authors":"Hao-Yu Chi, Zi-Jun Lin, Chia-Hao Hung, C. Liu, Hung-Ming Chen","doi":"10.1109/iccad45719.2019.8942088","DOIUrl":"https://doi.org/10.1109/iccad45719.2019.8942088","url":null,"abstract":"In order to improve design productivity, proper layout automation tools are desired for analog circuits. Layout migration is one possible approach to generate a new layout for given circuits with different device sizes or a different technology while still keeping the original layout topology. However, routing behaviors are often not addressed in previous works, which then require a complete rerouting that may not follow the original style. Pan [16] first proposed a Constrained Delaunay Triangulation (CDT) based model to keep the routing behavior during layout migration. However, because the device sizes and related distances may be different in the new layout, some reference lines in CDT models may be removed, resulting in some missing nets after migration. In this paper, a novel Cartesian Detection Line (CDL) based model is proposed to preserve the routing behavior in original layouts. Because alternative lines in the modified placement can be easily found to prevent missing nets, the proposed CDL model greatly improves the routing completeness during layout migration. Several routing refinement techniques are also proposed to solve the routing issues due to block displacement. 
In our experiments, the routing completeness can be improved to almost 100% with the proposed CDL model, which greatly reduces the design efforts.","PeriodicalId":363364,"journal":{"name":"2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130188517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}