{"title":"GPU-friendly floating random walk algorithm for capacitance extraction of VLSI interconnects","authors":"Kuangya Zhai, Wenjian Yu, H. Zhuang","doi":"10.7873/DATE.2013.336","DOIUrl":"https://doi.org/10.7873/DATE.2013.336","url":null,"abstract":"The floating random walk (FRW) algorithm is an important field-solver algorithm for capacitance extraction, which has several merits compared with other boundary element method (BEM) based algorithms. In this paper, the FRW algorithm is accelerated with the modern graphics processing units (GPUs). We propose an iterative GPU-based FRW algorithm flow and the technique using an inverse cumulative probability array (ICPA), to reduce the divergence among walks and the global-memory accessing. A variant FRW scheme is proposed to utilize the benefit of ICPA, so that it accelerates the extraction of multi-dielectric structures. The technique for extracting multiple nets concurrently is also discussed. Numerical results show that our GPU-based FRW brings over 20X speedup for various test cases with 0.5% convergence criterion over the CPU counterpart. For the extraction of multiple nets, our GPU-based FRW outperforms the CPU counterpart by up to 59X.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"12 1","pages":"1661-1666"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77194654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is split manufacturing secure?","authors":"Jeyavijayan Rajendran, O. Sinanoglu, R. Karri","doi":"10.7873/DATE.2013.261","DOIUrl":"https://doi.org/10.7873/DATE.2013.261","url":null,"abstract":"Split manufacturing of integrated circuits (IC) is being investigated as a way to simultaneously alleviate the cost of owning a trusted foundry and eliminate the security risks associated with outsourcing IC fabrication. In split manufacturing, a design house (with a low-end, in-house, trusted foundry) fabricates the Front End Of Line (FEOL) layers (transistors and lower metal layers) in advanced technology nodes at an untrusted high-end foundry. The Back End Of Line (BEOL) layers (higher metal layers) are then fabricated at the design house's trusted low-end foundry. Split manufacturing is considered secure (prevents reverse engineering and IC piracy) as it hides the BEOL connections from an attacker in the FEOL foundry. We show that an attacker in the FEOL foundry can exploit the heuristics used in typical floorplanning, placement, and routing tools to bypass the security afforded by straightforward split manufacturing. We developed an attack where an attacker in the FEOL foundry can connect 96% of the missing BEOL connections correctly. To overcome this security vulnerability in split manufacturing, we developed a fault analysis-based defense. This defense improves the security of split manufacturing by deceiving the FEOL attacker into making wrong connections.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"42 1","pages":"1259-1264"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77321576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optical Look Up Table","authors":"Zhen Li, S. L. Beux, C. Monat, X. Letartre, I. O’Connor","doi":"10.7873/DATE.2013.184","DOIUrl":"https://doi.org/10.7873/DATE.2013.184","url":null,"abstract":"The computation capacity of conventional FPGAs is directly proportional to the size and expressive power of Look Up Table (LUT) resources. Individual LUT performance is limited by transistor switching time and power dissipation, defined by the CMOS fabrication process. In this paper we propose OLUT, an optical core implementation of LUT, which has the potential for low latency and low power computation. In addition, the use of Wavelength Division Multiplexing (WDM) allows parallel computation, which can further increase computation capacity. Preliminary experimental results demonstrate the potential for optically assisted on-chip computation.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"11 1","pages":"873-876"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76227922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LFSR seed computation and reduction using SMT-based fault-chaining","authors":"Dhrumeel Bakshi, M. Hsiao","doi":"10.7873/DATE.2013.226","DOIUrl":"https://doi.org/10.7873/DATE.2013.226","url":null,"abstract":"We propose a new method to derive a small number of LFSR seeds for Logic BIST to cover all detectable faults as a first-order satisfiability problem involving extended theories. We use an SMT (Satisfiability Modulo Theories) formulation to efficiently combine the tasks of test-generation and seed-computation. We make use of this formulation in an iterative seed-reduction flow which enables the “chaining” of hard-to-test faults using very few seeds. Experimental results demonstrate that up to 79% reduction in the number of seeds can be achieved.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"337 1","pages":"1071-1076"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76386795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Handling discontinuous effects in modeling spatial correlation of wafer-level analog/RF tests","authors":"K. Huang, Nathan Kupp, J. Carulli, Y. Makris","doi":"10.7873/DATE.2013.123","DOIUrl":"https://doi.org/10.7873/DATE.2013.123","url":null,"abstract":"In an effort to reduce the cost of specification testing in analog/RF circuits, spatial correlation modeling of wafer-level measurements has recently attracted increased attention. Existing approaches for capturing and leveraging such correlation, however, rely on the assumption that spatial variation is smooth and continuous. This, in turn, limits the effectiveness of these methods on actual production data, which often exhibits localized spatial discontinuous effects. In this work, we propose a novel approach which enables spatial correlation modeling of wafer-level analog/RF tests to handle such effects and, thereby, to drastically reduce prediction error for measurements exhibiting discontinuous spatial patterns. The core of the proposed approach is a k-means algorithm which partitions a wafer into k clusters, as caused by discontinuous effects. Individual correlation models are then constructed within each cluster, revoking the assumption that spatial patterns should be smooth and continuous across the entire wafer. Effectiveness of the proposed approach is evaluated on industrial probe test data from more than 3,400 wafers, revealing significant error reduction over existing approaches.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"39 1","pages":"553-558"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78073680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Roadmap towards ultimately-efficient zeta-scale datacenters","authors":"P. Ruch, T. Brunschwiler, S. Paredes, G. Meijer, B. Michel","doi":"10.1109/HPCSim.2013.6641408","DOIUrl":"https://doi.org/10.1109/HPCSim.2013.6641408","url":null,"abstract":"Chip microscale liquid-cooling reduces thermal resistance and improves datacenter efficiency with higher coolant temperatures by eliminating chillers and allowing thermal energy re-use in cold climates. Liquid cooling enables an unprecedented density in future computers to a level similar to a human brain. This is mediated by a dense 3D architecture for interconnects, fluid cooling, and power delivery of energetic chemical compounds transported in the same fluid. Vertical integration improves memory proximity and electrochemical power delivery creating valuable space for communication. This strongly improves large system efficiency thereby allowing computers to grow beyond exa-scale.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"39 1","pages":"1339-1344"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76627291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using synchronization stalls in power-aware accelerators","authors":"A. Jooya, A. Baniasadi","doi":"10.7873/DATE.2013.091","DOIUrl":"https://doi.org/10.7873/DATE.2013.091","url":null,"abstract":"GPUs spend significant time on synchronization stalls. Such stalls provide ample opportunity to save leakage energy in GPU structures left idle during such periods. In this paper we focus on the register file structure of NVIDIA GPUs and introduce sync-aware low leakage solutions to reduce power. Accordingly, we show that applying the power gating technique to the register file during synchronization stalls can improve power efficiency without considerable performance loss. To this end, we equip the register file with two leakage power saving modes with different levels of power saving and wakeup latencies.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"246 1","pages":"400-403"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76735606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TreeFTL: Efficient RAM management for high performance of NAND flash-based storage systems","authors":"Chundong Wang, W. Wong","doi":"10.7873/DATE.2013.086","DOIUrl":"https://doi.org/10.7873/DATE.2013.086","url":null,"abstract":"NAND flash memory is widely used for secondary storage today. The flash translation layer (FTL) is the embedded software that is responsible for managing and operating in flash storage system. One important module of the FTL performs RAM management. It is well-known to have a significant impact on flash storage system's performance. This paper proposes an efficient RAM management scheme called TreeFTL. As the name suggests, TreeFTL organizes address translation pages and data pages in RAM in a tree structure, through which it dynamically adapts to workloads by adjusting the partitions for address mapping and data buffering. TreeFTL also employs a lightweight mechanism to implement the least recently used (LRU) algorithm for RAM cache evictions. Experiments show that compared to the two latest schemes for RAM management in flash storage system, TreeFTL can reduce service time by 46.6% and 49.0% on average, respectively, with a 64MB RAM cache.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"241 1","pages":"374-379"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77477888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality-aware media scheduling on MPSoC platforms","authors":"Deepak Gangadharan, S. Chakraborty, Roger Zimmermann","doi":"10.7873/DATE.2013.204","DOIUrl":"https://doi.org/10.7873/DATE.2013.204","url":null,"abstract":"Applications that stream multiple video/audio or video+audio clips are being implemented in embedded devices. A Picture-in-Picture (PiP) application is one such application scenario, where two videos are played simultaneously. Although the PiP application is very efficiently handled in televisions and personal computers by providing maximum quality of service to the multiple streams, it is a difficult task in devices with resource constraints. In order to efficiently utilize the resources, it is essential to derive the necessary processor cycles for multiple video streams such that they are displayed with some prespecified quality constraint. Therefore, we propose a network calculus based formal framework to help schedule multiple media streams in the presence of buffer constraints. Further, our framework also presents a schedulability analysis condition to check if the multimedia streams can be scheduled such that a prespecified quality constraint is satisfied with the available service. We present this framework in the context of a PiP application, but it is applicable in general for multiple media streams. The results obtained using the formal framework were further verified using experiments involving system simulation.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"260 1","pages":"976-981"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76299493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An automatic tool flow for the combined implementation of multi-mode circuits","authors":"Brahim Al Farisi, Karel Bruneel, João MP Cardoso, D. Stroobandt","doi":"10.7873/DATE.2013.174","DOIUrl":"https://doi.org/10.7873/DATE.2013.174","url":null,"abstract":"A multi-mode circuit implements the functionality of a limited number of circuits, called modes, of which at any given time only one needs to be realised. Using run-time reconfiguration of an FPGA, all the modes can be implemented on the same reconfigurable region, requiring only an area that can contain the biggest mode. Typically, conventional run-time reconfiguration techniques generate a configuration for every mode separately. To switch between modes the complete reconfigurable region is rewritten, which often leads to very long reconfiguration times. In this paper we present a novel, fully automated tool flow that exploits similarities between the modes and uses Dynamic Circuit Specialization to drastically reduce reconfiguration time. Experimental results show that the number of bits that is rewritten in the configuration memory reduces with a factor from 4.6× to 5.1× without significant performance penalties.","PeriodicalId":6310,"journal":{"name":"2013 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"43 1","pages":"821-826"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76363171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}