S. Fujita, S. Yasuda, Daesung Lee, Xiangyu Chen, D. Akinwande, H. Wong
{"title":"Detachable nano-carbon chip with ultra low power","authors":"S. Fujita, S. Yasuda, Daesung Lee, Xiangyu Chen, D. Akinwande, H. Wong","doi":"10.1145/1837274.1837434","DOIUrl":"https://doi.org/10.1145/1837274.1837434","url":null,"abstract":"This paper describes ultra-low-power chip design using nano-scale electro-mechanical switches (NEMS) with graphene. This chip is attachable and detachable onto the top of other chips due to remarkable stickiness of carbon-nanotube interconnects. New 3D-IC can be thus constructed for reconfigurable system-on-chips. Furthermore, due to a floating gate built in NEMS, their logic performance is much superior to that of NEMS-based logic in previous works, and even better than that of conventional CMOS.","PeriodicalId":87346,"journal":{"name":"Proceedings. Design Automation Conference","volume":"6 1","pages":"631-632"},"PeriodicalIF":0.0,"publicationDate":"2010-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81664470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Ozdemir, Yan Pan, Abhishek Das, G. Memik, G. Loh, A. Choudhary
{"title":"Quantifying and coping with parametric variations in 3D-stacked microarchitectures","authors":"S. Ozdemir, Yan Pan, Abhishek Das, G. Memik, G. Loh, A. Choudhary","doi":"10.1145/1837274.1837312","DOIUrl":"https://doi.org/10.1145/1837274.1837312","url":null,"abstract":"Variability in device characteristics, i.e., parametric variations, is an important problem for shrinking process technologies. They manifest themselves as variations in performance, power consumption, and reduction in reliability in the manufactured chips as well as low yield levels. Their implications on performance and yield are particularly profound on 3D architectures: a defect on even a single layer can render the entire stack useless. In this paper, we show that instead of causing increased yield losses, we can actually exploit 3D technology to reduce yield losses by intelligently devising the architectures. We take advantage of the layer-to-layer variations to reduce yield losses by splitting critical components among multiple layers. Our results indicate that our proposed method achieves a 30.6% lower yield loss rate compared to the same pipeline implemented on a 2D architecture.","PeriodicalId":87346,"journal":{"name":"Proceedings. Design Automation Conference","volume":"3 1","pages":"144-149"},"PeriodicalIF":0.0,"publicationDate":"2010-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79090246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance yield-driven task allocation and scheduling for MPSoCs under process variation","authors":"Lin Huang, Q. Xu","doi":"10.1145/1837274.1837358","DOIUrl":"https://doi.org/10.1145/1837274.1837358","url":null,"abstract":"With the ever-increasing transistor variability in CMOS technology, it is essential to integrate variation-aware performance analysis into the task allocation and scheduling process to improve its performance yield when building today's multiprocessor system-on-a-chip (MPSoC). Existing solutions assume that the execution times of tasks performed on different processors are statistically independent, which ignores the spatial correlation characteristics for systematic variation. In addition, a unified task schedule is constructed at design stage and applied to all products with various variation effects, which restricts the maximum performance yield that can be achieved for MPSoC products. To tackle the above problems, in this paper, we present a novel quasi-static scheduling algorithm. Based on a more accurate performance yield estimation method, a set of variation-aware schedules is synthesized off-line and, at run time, the scheduler will select the right one based on the actual variation for each chip, such that the timing constraint can be satisfied whenever possible. Experimental results demonstrate the effectiveness.","PeriodicalId":87346,"journal":{"name":"Proceedings. Design Automation Conference","volume":"280 1","pages":"326-331"},"PeriodicalIF":0.0,"publicationDate":"2010-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73173385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance-driven analog placement considering boundary constraint","authors":"Cheng-Wu Lin, Jai-Ming Lin, Chun-Po Huang, Soon-Jyh Chang","doi":"10.1145/1837274.1837348","DOIUrl":"https://doi.org/10.1145/1837274.1837348","url":null,"abstract":"To reduce parasitic mismatches in analog design, we usually care about the property of symmetric placement for symmetry groups, which would form several symmetry islands in a chip. However, routing is greatly affected by placement results. If modules with input or output ports are placed arbitrarily in a symmetry island, the routing wires, which connect these modules with other modules outside the island, may induce unwanted parasitics coupling to signals, and thus circuit performance is deteriorated. This phenomenon can not be identified by a cost function, which only considers placement area and total wire length. Therefore, we would like to introduce the necessity of considering boundary constraint for the modules with input or output ports in symmetry islands. Based on ASF-B∗ tree [3], we explore the feasible conditions for 1D and 2D symmetry islands to meet this constraint. Further, a procedure is presented to maintain the feasibility for each ASF-B∗ tree after perturbation. Experimental results show that our approach guarantees the boundary property for the modules with input or output ports in symmetry islands.","PeriodicalId":87346,"journal":{"name":"Proceedings. Design Automation Conference","volume":"1 1","pages":"292-297"},"PeriodicalIF":0.0,"publicationDate":"2010-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76165697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multilayer nanophotonic interconnection network for on-chip many-core communications","authors":"Xiang Zhang, A. Louri","doi":"10.1145/1837274.1837314","DOIUrl":"https://doi.org/10.1145/1837274.1837314","url":null,"abstract":"Multi-core chips or chip multiprocessors (CMPs) are becoming the de facto architecture for scaling up performance and taking advantage of the increasing transistor count on the chip within reasonable power consumption levels. The projected increase in the number of cores in future CMPs is putting stringent demands on the design of the on-chip network (or network-on-chip, NOC). Nanophotonic interconnects have recently emerged as a viable alternate technology solution for the design of NOC because of their higher communication bandwidth, much reduced power consumption and wiring simplification. Several photonic NOC approaches have recently been proposed. A common feature of almost all of these approaches is the integration of the entire optical network onto a single silicon waveguide layer. However, keeping the entire network on a single layer has a serious implication for power losses and design complexity due to the large amount of waveguide crossings. In this paper, we propose MPNOC: a multilayer photonic networks-on-chip. MPNOC combines the recent advances in silicon photonics and three-dimensional (3D) stacking technology with architectural innovations in an integrated architecture that provides ample bandwidth, low latency, and energy efficient on-chip communications for future CMPs. Simulation results show MPNOC can achieve 81.92 TFLOP/s peak bandwidth and an energy savings up to 23% compared to other proposed planar photonic NOC architectures.","PeriodicalId":87346,"journal":{"name":"Proceedings. Design Automation Conference","volume":"56 1","pages":"156-161"},"PeriodicalIF":0.0,"publicationDate":"2010-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82105222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhenyu Qi, Jiajing Wang, A. C. Cabe, Stuart N. Wooters, T. Blalock, B. Calhoun, M. Stan
{"title":"SRAM-based NBTI/PBTI sensor system design","authors":"Zhenyu Qi, Jiajing Wang, A. C. Cabe, Stuart N. Wooters, T. Blalock, B. Calhoun, M. Stan","doi":"10.1145/1837274.1837486","DOIUrl":"https://doi.org/10.1145/1837274.1837486","url":null,"abstract":"NBTI has been a major aging mechanism for advanced CMOS technology and PBTI is also looming as a big concern. This work first proposes a compact on-chip sensor design that tracks both NBTI and PBTI for both logic and SRAM circuits. Embedded in an SRAM array the sensor takes the form of a 6T SRAM cell and is at least 30× smaller than previous designs. Extensively reusing the SRAM peripheral circuitry minimizes control logic overhead. Sensing overhead is further amortized as the sensors can be both reconfigured and recycled as functional SRAM cells, potentially increasing SRAM yield when other bit cells fail due to initial process variation or long time aging effects. The paper also proposes a variation-aware sensor system design methodology by quantifying and leveraging the tradeoff between the size and number of sensors and the system sensing precision. Design examples show that a system of 500 sensors can achieve 4mV precision with 98.8% confidence, and a system of 1K sensors designed for 1M SRAM bit cells achieves 2000× area overhead reduction compared to a worst-case based approach.","PeriodicalId":87346,"journal":{"name":"Proceedings. Design Automation Conference","volume":"45 1","pages":"849-852"},"PeriodicalIF":0.0,"publicationDate":"2010-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87589352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhangxi Tan, Andrew Waterman, Rimas Avizienis, Yunsup Lee, Henry Cook, D. Patterson, K. Asanović
{"title":"RAMP gold: An FPGA-based architecture simulator for multiprocessors","authors":"Zhangxi Tan, Andrew Waterman, Rimas Avizienis, Yunsup Lee, Henry Cook, D. Patterson, K. Asanović","doi":"10.1145/1837274.1837390","DOIUrl":"https://doi.org/10.1145/1837274.1837390","url":null,"abstract":"We present RAMP Gold, an economical FPGA-based architecture simulator that allows rapid early design-space exploration of manycore systems. The RAMP Gold prototype is a high-throughput, cycle-accurate full-system simulator that runs on a single Xilinx Virtex-5 FPGA board, and which simulates a 64-core shared-memory target machine capable of booting real operating systems. To improve FPGA implementation efficiency, functionality and timing are modeled separately and host multithreading is used in both models. We evaluate the prototype's performance using a modern parallel benchmark suite running on our manycore research operating system, achieving two orders of magnitude speedup compared to a widely-used software-based architecture simulator.","PeriodicalId":87346,"journal":{"name":"Proceedings. Design Automation Conference","volume":"148 1","pages":"463-468"},"PeriodicalIF":0.0,"publicationDate":"2010-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86115354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Chellappa, Jia Ni, Xiaoyin Yao, N. Hindman, J. Velamala, Min Chen, Yu Cao, L. Clark
{"title":"In-situ characterization and extraction of SRAM variability","authors":"S. Chellappa, Jia Ni, Xiaoyin Yao, N. Hindman, J. Velamala, Min Chen, Yu Cao, L. Clark","doi":"10.1145/1837274.1837454","DOIUrl":"https://doi.org/10.1145/1837274.1837454","url":null,"abstract":"Measurement and extraction of as fabricated SRAM cell variability is essential to process improvement and robust design. This is challenging in practice, due to the complexity in the test procedure and requisite numerical analysis. This work proposes a new single-ended test procedure for SRAM cell write margin measurement. Moreover, an efficient decomposition method is developed to extract transistor threshold voltage (VTH) variations from the measurements, allowing accurate determination of SRAM cell stability. The entire approach is demonstrated in a 90 nm test chip with 32 K cells. The advantages of the proposed method include: (1) a single-ended SRAM test structure with no disturbance to SRAM operations; (2) a convenient test procedure that only requires quasi-static control of external voltages; and (3) a non-iterative method that extracts the VTH variation of each transistor from eight measurements. The new procedure enables accurate predictions of SRAM performance variability. As validated with 90 nm data of write margin and data retention voltage, the prediction error from extracted VTH variations is < 4% at all corners.","PeriodicalId":87346,"journal":{"name":"Proceedings. Design Automation Conference","volume":"43 1","pages":"711-716"},"PeriodicalIF":0.0,"publicationDate":"2010-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87607031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ECR: A low complexity generalized error cancellation rewiring scheme","authors":"Xiaoqing Yang, T. Lam, Yu-Liang Wu","doi":"10.1145/1837274.1837400","DOIUrl":"https://doi.org/10.1145/1837274.1837400","url":null,"abstract":"Rewiring is known to be a new class of logic restructuring technique at least equally powerful in flexibility compared to other logic transformation techniques while being wiring-sensitive, a property particularly useful for interconnect based circuit synthesis processes. One of the most mature rewiring techniques is the ATPG-based Redundancy Addition and Removal (RAR) technique which adds a redundant alternative wire to make an originally irredundant target wire become redundant and thus removable. In this paper, we propose a new Error Cancellation based Rewiring scheme (ECR) which can also do non-RAR based rewiring operations with high efficiency. Based on the notion of error cancellation, we analyze and reformulate the rewiring problem and develop a generalized rewiring scheme being able to detect more rewiring cases which are not obtainable by existing schemes while still maintains low runtime complexity. Comparing with the most recent non-RAR rewiring tool IRRA, the total number of alternative wires found by our approach is about twice while CPU time is just slightly more (26%) upon benchmarks pre-optimized by rewriting of ABC.","PeriodicalId":87346,"journal":{"name":"Proceedings. Design Automation Conference","volume":"1 1","pages":"511-516"},"PeriodicalIF":0.0,"publicationDate":"2010-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91322218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing the number of lines in reversible circuits","authors":"R. Wille, Mathias Soeken, R. Drechsler","doi":"10.1145/1837274.1837439","DOIUrl":"https://doi.org/10.1145/1837274.1837439","url":null,"abstract":"Reversible logic became a promising alternative to traditional circuits because of its applications e.g. in low-power design and quantum computation. As a result, design of reversible circuits attracted great attention in the last years. The number of circuit lines is thereby a major criterion since it e.g. affects the still limited resource of qubits. Nevertheless, all approaches introduced so far for synthesis of complex reversible circuits need a significant amount of additional circuit lines - sometimes orders of magnitude more than the primary inputs. In this paper, we propose a post-process optimization method that addresses this problem. The general idea is to merge garbage output lines with appropriate constant input lines. To this end, parts of the circuits are re-synthesized. Experimental results show that by applying the proposed approach, the number of circuit lines can be reduced by 17% on average - in the best case by more than 40%. At the same time, the increase in the number of gates and the quantum costs, respectively, can be kept small.","PeriodicalId":87346,"journal":{"name":"Proceedings. Design Automation Conference","volume":"19 1","pages":"647-652"},"PeriodicalIF":0.0,"publicationDate":"2010-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91345181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}