{"title":"Overview of the Scalable Communications Core: A Reconfigurable Wireless Baseband in 65nm CMOS","authors":"A. Chun, Kyle McCanta, E. Sandoval, Kapil Gulati","doi":"10.1109/ISVLSI.2009.15","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.15","url":null,"abstract":"The Scalable Communications Core (SCC) is a flexible baseband processor that consists of a heterogeneous set of coarse-grained, programmable accelerators connected via a packet-based 3-ary 2-cube Network-on-Chip (NoC). SCC supports multiple wireless protocols to meet the demand for ubiquitous communications and computing with low power and area. We have recently completed a prototype test chip in a 65nm process and validated it for WiFi and WiMAX protocols. The area and energy efficiency of our test chip are comparable to those of other basebands found in the literature. To demonstrate its flexibility, additional protocols have been mapped to the architecture.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"315 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116686821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Floorplan Driven High Level Synthesis for Crosstalk Noise Minimization in Macro-cell Based Designs","authors":"Hariharan Sankaran, S. Katkoori","doi":"10.1109/ISVLSI.2009.59","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.59","url":null,"abstract":"In the deep-submicron (DSM) regime, due to higher interconnect densities, the coupling noise between adjacent signals is aggravated and can lead to many timing violations. In traditional high-level synthesis (HLS), the lack of detailed physical information makes it difficult to accurately estimate crosstalk. Crosstalk minimization is typically done during routing, which makes it computationally expensive to use within an iterative design flow. In this paper, we propose a floorplan-driven high-level synthesis framework for minimizing crosstalk in a bus-based architecture. The proposed framework employs a Simulated Annealing engine to simultaneously explore the HLS (scheduling, allocation, and binding) and floorplan (module swap, module move, and module rotate) subspaces. The effect of a high-level decision is evaluated by updating the floorplan and identifying crosstalk-prone buses (i.e., those buses exceeding Lcrit). The primary goal is to minimize the number of crosstalk violations with minimum area and latency overheads. We have validated the approach by synthesizing netlists down to layout level using Cadence SoC Encounter, followed by detailed crosstalk noise analysis using Cadence Celtic. Experimental results for three DSP benchmarks (DCT, EWF, and FFT) demonstrate that the proposed approach can reduce crosstalk violations by as much as 96% (in the 180 nm technology node), with an average reduction of 75% over designs synthesized with the traditional sequential flow.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123548996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Rerouting Algorithms for Congestion Mitigation","authors":"M. Chaudhry, Z. Asad, A. Sprintson, J. Hu","doi":"10.1109/ISVLSI.2009.38","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.38","url":null,"abstract":"Congestion mitigation and overflow avoidance are two of the major goals of the global routing stage. With a significant increase in chip size and routing complexity, congestion and overflow have become critical issues in physical design automation. In this paper, we present several routing algorithms for congestion reduction and overflow avoidance. Our methods are based on ripping up nets that go through congested regions and replacing them with congestion-aware Steiner trees. We propose several efficient algorithms for finding congestion-aware Steiner trees and evaluate their performance using the ISPD routing benchmarks. We also show that the novel technique of network coding contributes to further improvements in routability and reduction of congestion. Accordingly, we propose an algorithm for identifying efficient congestion-aware network coding topologies and evaluate its performance.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129765485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Hardware Architecture for Multimedia Encryption and Authentication Using the Discrete Wavelet Transform","authors":"A. Pande, Joseph Zambreno","doi":"10.1109/ISVLSI.2009.26","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.26","url":null,"abstract":"This paper introduces a zero-overhead encryption and authentication scheme for real-time embedded multimedia systems. The parametrized construction of the Discrete Wavelet Transform (DWT) compression block is used to introduce a free parameter in the design. This parameter allows a keyspace to be built for lightweight multimedia encryption. The parametrization yields rational coefficients, leading to an efficient fixed-point hardware implementation. A clock speed of over 240 MHz was achieved on a Xilinx Virtex 5 FPGA. A comparison with existing approaches indicates the high throughput and low hardware overhead of adding the security feature to the DWT architecture.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132108345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lifetime Reliability Aware Design Flow Techniques for Dual-Vdd Based Platform FPGAs","authors":"P. Mangalagiri, N. Vijaykrishnan","doi":"10.1109/ISVLSI.2009.42","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.42","url":null,"abstract":"Increasing on-chip power densities with aggressive technology scaling have led to a low-power FPGA fabric with dual supply voltages. Such low-power techniques, coupled with the heterogeneity of components on an FPGA, have led to non-uniform aging of components due to temperature- and voltage-dependent failure mechanisms. In this paper, we present techniques in the placement and routing stages of the design flow that increase the average lifetime of components by ensuring uniform aging. We first study the impact of temperature and voltage variations on the lifetime reliability of components. In the presence of such variations, we study the impact of aging in FPGA interconnects due to Electromigration (EM), and dielectric breakdown due to Time Dependent Dielectric Breakdown (TDDB). We also consider the performance degradation due to Hot Carrier Instability (HCI) in our design flow optimizations. The proposed reliability-aware design flow techniques achieve an average improvement of 65.8% and 75% in the lifetime of LUTs and interconnect wires, respectively.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130812371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling for an Embedded Architecture with a Flexible Datapath","authors":"Thomas Schilling, Magnus Själander, P. Larsson-Edefors","doi":"10.1109/ISVLSI.2009.6","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.6","url":null,"abstract":"Embedded systems put stringent demands on post-fabrication flexibility as well as computing performance efficiency. The FlexSoC scheme approaches the implementation of embedded systems from a general-purpose processor point of view: The FlexCore processor has a datapath whose configuration is under instruction control; in its minimal configuration, the processor represents a simple 5-stage pipeline. However, thanks to a flexible processor interconnect, the FlexCore datapath configuration can be changed at run-time to boost performance for the currently executed code. The consequence of this flexibility is that pipelining is not hard-coded into the datapath, but all instruction scheduling needs to be done by software at compile time. We present a scheduling technique for the FlexCore processor allowing for efficient use of datapath resources over a flexible interconnect. The flexible interconnect indeed offers plenty of opportunities for parallel operations, but it also makes the analysis of instruction dependencies difficult. Thus, we propose to use a SAT-solver to enable the scheduler to efficiently check constraints on computing and communication resources. In an evaluation on four different benchmarks, our scheduler is shown to produce schedules that are as efficient as fine-tuned, manual schedules.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129727708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variation Aware Routing for Three-Dimensional FPGAs","authors":"Chen Dong, S. Chilstedt, Deming Chen","doi":"10.1109/ISVLSI.2009.44","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.44","url":null,"abstract":"To maximize the potential of three-dimensional integrated circuit architectures, 3D CAD tools must be developed that are on par with their 2D counterparts. In this paper, we present a statistical static timing analysis (SSTA) engine designed to deal with both the uncorrelated and correlated variations in 3D FPGAs. We consider the effects of intra-die and inter-die variation. Using the 3D physical design tool TPR as a base, we develop a new 3D routing algorithm which improves the average performance of two-layer designs by over 22% and three-layer designs by over 27%. To the best of our knowledge, this is the first physical design tool to consider variation in the routing and timing analysis of 3D FPGAs.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128970971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leakage Power and Side Channel Security of Nanoscale Cryptosystem-on-Chip (CoC)","authors":"Amir Khatib Zadeh, C. Gebotys","doi":"10.1109/ISVLSI.2009.46","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.46","url":null,"abstract":"This paper investigates the viability of using leakage power consumption as a source of side channel information. The side channel effect in leakage power is characterized. It is shown that the increasing trend of leakage power is highly correlated with the security vulnerability of cryptosystems. Addressing the severity of the side channel threat in nanoscale Cryptosystem-on-Chip (CoC), we examine leakage reduction techniques for side channel security applications. The results show that, among the circuit-based reduction techniques, high-Vth transistor assignment, which significantly reduces both the average and the standard deviation of the leakage power, can be exploited as a side-channel-aware leakage reduction technique in the design and implementation of CoC in the submicron era. These findings, presented here for the first time, are crucial for the development of side-channel-resistant cryptosystems in upcoming CMOS technologies.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123173803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NoC Power Optimization Using a Reconfigurable Router","authors":"C. Concatto, D. Matos, L. Carro, F. Kastensmidt, A. Susin, M. Kreutz","doi":"10.1109/ISVLSI.2009.7","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.7","url":null,"abstract":"In real applications, cores have different communication needs. When NoCs are used to interconnect the cores, techniques to optimize communication are indispensable. From the performance point of view, large buffers ensure performance during the execution of different applications, but unfortunately, these same buffers are the main contributor to the router's total power dissipation. Moreover, sizing buffers for the worst-case latency incurs extra dissipation in the average case, which is much more frequent. To cope with this problem, in this paper we propose a dynamically reconfigurable router for a NoC. With the reconfigurable router, it was possible to reduce congestion in the network while at the same time reducing power dissipation and improving energy efficiency.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127620724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-Sum Single-Carry Self-Timed Adder Designs","authors":"P. Balasubramanian, D. Edwards","doi":"10.1109/ISVLSI.2009.13","DOIUrl":"https://doi.org/10.1109/ISVLSI.2009.13","url":null,"abstract":"This paper presents designs of self-timed dual-sum single-carry (dual-bit) adder function blocks, constructed using commercially available synchronous library resources (standard cells) and validated using synchronous tools. Specifically, the proposed adder modules qualify as either quasi-delay-insensitive or speed-independent and satisfy Seitz’s weak-indication timing constraints. The delay-insensitive version of the ripple carry adder topology has been used to analyze the designs. The indication (completion) is either made implicit in the topology (local indication) or largely isolated from the actual data path (a new variant of global indication). The proposed adders are found to exhibit improved power and performance parameters, whilst being competitive in terms of area, in comparison with other self-timed logic realizations.","PeriodicalId":137508,"journal":{"name":"2009 IEEE Computer Society Annual Symposium on VLSI","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131885102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}