Hai Wei, Jie Zhang, Lan Wei, N. Patil, A. Lin, M. Shulaker, Hong-Yu Chen, H. Wong, S. Mitra
{"title":"Carbon nanotube imperfection-immune digital VLSI: Frequently asked questions updated","authors":"Hai Wei, Jie Zhang, Lan Wei, N. Patil, A. Lin, M. Shulaker, Hong-Yu Chen, H. Wong, S. Mitra","doi":"10.1109/ICCAD.2011.6105330","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105330","url":null,"abstract":"Carbon Nanotube Field-Effect Transistors (CNFETs) are excellent candidates for designing highly energy-efficient future digital systems. However, carbon nanotubes (CNTs) are inherently highly subject to imperfections that pose major obstacles to robust CNFET digital VLSI. This paper summarizes commonly raised questions and concerns about CNFET technology through a series of frequently asked questions. The specific questions addressed in this paper are motivated by recent advances in the field since the publication of our earlier paper on frequently asked questions in the Proceedings of the 2009 Design Automation Conference.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"2 1","pages":"227-230"},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88420367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Property-specific sequential invariant extraction for SAT-based unbounded model checking","authors":"Hu-Hsi Yeh, Cheng-Yin Wu, Chung-Yang Huang","doi":"10.1109/ICCAD.2011.6105402","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105402","url":null,"abstract":"In this paper, we propose a property-specific sequential invariant extraction algorithm to improve the performance of the SAT-based Unbounded Modeling Checkers (UMCs). By analyzing the property-related predicates and their corresponding high-level design constructs such as FSMs and counters, we can quickly identify the sequential invariants that are useful in improving the property proving capabilities. We utilize these sequential invariants to refine the inductive hypothesis in induction-based UMCs, and to improve the accuracy of reachable state approximation in interpolation-based UMCs. The experimental results show that our tool can outperform a state-of-the-art UMC in most cases, especially for the difficult true properties.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"1 1","pages":"674-678"},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73267182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-power multiple-bit upset tolerant memory optimization","authors":"Seokjoong Kim, Matthew R. Guthaus","doi":"10.1109/ICCAD.2011.6105388","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105388","url":null,"abstract":"In this paper, we propose a framework for analyzing Soft Error Rates (SER) including Multiple-Bit Upsets (MBU). Then, using this framework, we optimize the soft error tolerant voltage (Vtol) and interleaving distance (ID) of low-power, error-tolerant memories. Experimental results show that the total power can be reduced by an average of 30.5% with Vtol optimization and an average of 40.9% by simultaneously considering Vtol and ID together when compared to worst-case design practices.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"33 1","pages":"577-581"},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78332473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cong Xu, Dimin Niu, Xiaochun Zhu, Seung H. Kang, M. Nowak, Yuan Xie
{"title":"Device-architecture co-optimization of STT-RAM based memory for low power embedded systems","authors":"Cong Xu, Dimin Niu, Xiaochun Zhu, Seung H. Kang, M. Nowak, Yuan Xie","doi":"10.1109/ICCAD.2011.6105369","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105369","url":null,"abstract":"Spin-transfer torque random access memory (STT-RAM) is a fast, scalable, durable non-volatile memory which can be embedded into standard CMOS process. A wide range of write speeds from 1ns to 100ns have been reported for STT-RAM. The switching current of magnetic tunnel junction (MTJ) (which is the storage element of STT-RAM) is inversely proportional to the write pulse width. In this work, we propose a methodology to design STT-RAM for different optimization goals such as read performance, write performance and write energy by leveraging the trade-off between write current and write time of MTJ. We take the typical in-plane MTJ and advanced perpendicular MTJ (PMTJ) as our optimization targets. Our study shows that reducing write pulse width will harm read latency and energy. It is observed that “sweet spots” of write pulse width which minimize the write energy or write latency of STT-RAM caches may exist. The optimal write pulse width depends on MTJ specifications, STT-RAM capacity and I/O width. The simulation results indicate that by utilizing PMTJ, the optimized STT-RAM can compete against SRAM and DRAM as universal memory replacement in low power embedded systems.1","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"11 1","pages":"463-470"},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82029092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving shared cache behavior of multithreaded object-oriented applications in multicores","authors":"M. Kandemir, Shekhar Srikantaiah, S. Son","doi":"10.1109/ICCAD.2011.6105315","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105315","url":null,"abstract":"Understanding shared cache performance when executing multithreaded object-oriented applications and optimizing these applications for multicores have not received much attention. In this paper, we first quantify the intra-thread and inter-thread cache line (block) reuse characteristics of a set of multithreaded C++ programs when executed in shared cache based multicores. Our results show that, as far as shared on-chip caches are concerned, inter-thread cache line (block) reuse distances are much higher than intra-thread cache line reuse distances. We study the impact of these characteristics on the hit/miss behavior of the shared last-level cache on a commercial multicore machine. We then show that, by rearranging accesses to the objects shared across different threads and to the objects stored in nearby memory locations, inter-thread (temporal and spatial) object reuse distances can be reduced, which in turn helps to reduce inter-thread cache line reuse distances. The results we collected using eight multithreaded applications show that our proposed shared cache-aware code restructuring strategy can reduce misses in the last-level on-chip cache of a commercial multicore machine by 25.4%, on average. These savings in cache misses translate in turn to average execution time improvement of 11.9%.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"1 1","pages":"118-125"},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82182215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving dual Vt technology by simultaneous gate sizing and mechanical stress optimization","authors":"J. Gu, G. Qu, Lin Yuan, Cheng Zhuo","doi":"10.1109/ICCAD.2011.6105410","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105410","url":null,"abstract":"Process-induced mechanical stress is used to enhance carrier mobility and drive current in contemporary CMOS technologies. Stressed cells have reduced delay but larger leakage consumption. Its efficient power/delay trading ratio makes mechanical stress an enticing alternative to other power optimization techniques. This paper proposes an effective urgentpath guided approach that improves dual Vt technique by incorporating gate sizing and mechanical stress simultaneously. The introduction of mechanical stress is shown to achieve 9.8% leakage and 2.8% total power savings over combined gate sizing and dual Vt approach.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"59 1","pages":"732-735"},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85980466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chen-Wei Lin, Hao-Yu Yang, Chin-Yuan Huang, Hung-Hsin Chen, M. Chao
{"title":"Detecting stability faults in sub-threshold SRAMs","authors":"Chen-Wei Lin, Hao-Yu Yang, Chin-Yuan Huang, Hung-Hsin Chen, M. Chao","doi":"10.1109/ICCAD.2011.6105301","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105301","url":null,"abstract":"Detecting stability faults has been a crucial task and a hot research topic for the testing of conventional super-threshold 6T SRAM in the past. When lowering the supply voltage of SRAM to the subthreshold region, the impact of stability faults may significantly change, and hence the test methods developed in the past for detecting stability faults may no longer be effective. In this paper, we first categorize the subthreshold-SRAM designs into different types according to their bit-cell structures. Based on each type, we then analyze the difference of its stability faults compared to the conventional super-threshold 6T SRAM, and discuss how the stability-fault test methods should be modified accordingly. A series of experiments are conducted to validate the effectiveness of each stability-fault test method for different types of subthreshold-SRAM designs.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"67 1","pages":"28-33"},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84047879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Avinash Karanth Kodi, R. Morris, D. DiTomaso, Ashwini Sarathy, A. Louri
{"title":"Co-design of channel buffers and crossbar organizations in NoCs architectures","authors":"Avinash Karanth Kodi, R. Morris, D. DiTomaso, Ashwini Sarathy, A. Louri","doi":"10.1109/ICCAD.2011.6105329","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105329","url":null,"abstract":"Network-on-Chips (NoCs) have emerged as a scalable solution to the wire delay constraints, thereby providing a high-performance communication fabric for future multicores. Research has shown that power, area and performance of Network-on-Chips (NoCs) architecture are tightly integrated with the design and optimization of the link and router (buffer and crossbar). Recent work has shown that adaptive channel buffers (on-link storage) can considerably reduce power consumption and area overhead by reducing or replacing the power hungry router buffers. However, channel buffer design can lead to Head-of-Line (HoL) blocking which eventually reduces the throughput of the network. In this paper, we explore the design space of organizing channel buffers and router crossbars to improve the performance (latency, throughput) while reducing the power consumption. Our proposed designs analyze the power-performance-area trade-off in designing channel buffers for NoC architectures while overcoming HoL blocking through crossbar optimizations. Our simulation and NoC design synthesis shows that for a 8 × 8 mesh architecture, we can reduce the power consumption by 25–40%, improve performance by 10–25% while occupying 4–13% more area when compared to the baseline architecture.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"43 1","pages":"219-226"},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88906869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Post-silicon bug diagnosis with inconsistent executions","authors":"A. DeOrio, D. Khudia, V. Bertacco","doi":"10.1109/ICCAD.2011.6105414","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105414","url":null,"abstract":"The complexity of modern chips intensifies verification challenges, and an increasing share of this verification effort is shouldered by post-silicon validation. Focusing on the first silicon prototypes, post-silicon validation poses critical new challenges such as intermittent failures, where multiple executions of a same test do not yield a consistent outcome. These are often due to on-chip asynchronous events and electrical effects, leading to extremely time-consuming, if not unachievable, bug diagnosis and debugging processes. In this work, we propose a methodology called BPS (Bug Positioning System) to support the automatic diagnosis of these difficult bugs. During post-silicon validation, lightweight BPS hardware logs a compact encoding of observed signal activity over multiple executions of the same test: some passing, some failing. Leveraging a novel post-analysis algorithm, BPS uses the logged activity to diagnose the bug, identifying the approximate manifestation time and critical design signals. We found experimentally that BPS can localize most bugs down to the exact root signal and within about 1,000 clock cycles of their occurrence.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"159 1","pages":"755-761"},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77028935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal layout decomposition for double patterning technology","authors":"Xiaoping Tang, Minsik Cho","doi":"10.1109/ICCAD.2011.6105298","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105298","url":null,"abstract":"Double patterning technology (DPT) is regarded as the most practical solution for the sub-22nm lithography technology. DPT decomposes a single layout into two masks and applies double exposure to print the shapes in the layout. DPT requires accurate overlay control. Thus, the primary objective in DPT decomposition is to minimize the number of stitches (overlay) between the shapes in the two masks. The problem of minimizing the number of stitches in DPT decomposition is conjectured to be NP-hard. Existing approaches either apply Integer Linear Programming (ILP) or use heuristics. In this paper, we show that the problem is actually in P and present a method to decompose a layout for DPT and minimize the number of stitches optimally. The complexity of the method is O(n1.5 log n). Experimental results show that the method is even faster than the fast heuristics.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"57 1","pages":"9-13"},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75461638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}