{"title":"Dynamic Aggregation of Virtual Addresses in TLB Using TCAM Cells","authors":"Rupak Samanta, Jason Surprise, R. Mahapatra","doi":"10.1109/VLSI.2008.57","DOIUrl":"https://doi.org/10.1109/VLSI.2008.57","url":null,"abstract":"In this paper, we propose dynamic aggregation of virtual tags in the Translation Lookaside Buffer (TLB) to increase its storage capacity without increasing the size of the tag array. To support dynamic aggregation, we incorporate a few Ternary-CAM (TCAM) cells into the TLB tag array. The modified TLB architecture demonstrates a compression scheme that increases TLB reach with negligible overhead and no access time penalty. The performance of the proposed TLB architecture is evaluated using SPEC CPU2000 benchmarks. Simulation results indicate a significant reduction in miss ratios, nearly 100% reduction is achieved in several benchmarks, and as much as a 46% increase in IPC (Instructions per cycle) is obtained when compared to a conventional TLB with the same number of tag entries. We also evaluate the performance of our tag compressed TLB against the performance of a conventional TLB that contains an equivalent number of virtual to physical address translations. Our results show that TCAM based compression is able to achieve nearly the same system performance as the large conventional TLB while consuming on average 38% less energy and 42% less area; thus illustrating that tag compression is a more attractive solution for improving TLB performance than simply increasing the size of the TLB.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133940502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amith Singhee, Jiajing Wang, B. Calhoun, Rob A. Rutenbar
{"title":"Recursive Statistical Blockade: An Enhanced Technique for Rare Event Simulation with Application to SRAM Circuit Design","authors":"Amith Singhee, Jiajing Wang, B. Calhoun, Rob A. Rutenbar","doi":"10.1109/VLSI.2008.54","DOIUrl":"https://doi.org/10.1109/VLSI.2008.54","url":null,"abstract":"Circuit reliability under statistical process variation is an area of growing concern. For highly replicated circuits such as SRAMs and flip flops, a rare statistical event for one circuit may induce a not-so-rare system failure. The Statistical Blockade was proposed as a Monte Carlo technique that allows us to efficiently filter-to block-unwanted samples insufficiently rare in the tail distributions we seek. However, there are significant practical problems with the technique. In this work, we show common scenarios in SRAM design where these problems render Statistical Blockade ineffective. We then propose significant extensions to make Statistical Blockade practically usable in these common scenarios. We show speedups of 102+ over standard Statistical Blockade and 104+ over standard Monte Carlo, for an SRAM cell in an industrial 90 nm technology.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132350421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single Edge Clock (SEC) Distribution for Improved Latency, Skew, and Jitter Performance","authors":"Jeff Mueller, R. Saleh","doi":"10.1109/VLSI.2008.36","DOIUrl":"https://doi.org/10.1109/VLSI.2008.36","url":null,"abstract":"Synchronous clock distribution continues to be the dominant timing methodology for VLSI designs. As processes shrink, clock speeds increase, and die sizes grow, more-and-more of the clock period is lost to skew and jitter budgets. We propose to improve clock performance by focusing on the single, critical clock edge while relaxing requirements of the non-critical edge. A novel re-design of the traditional clock buffer is proposed as a drop-in replacement for existing clock distribution networks, yielding timing performance improvements of over 20% in latency and skew and up to 30% in jitter; alternatively, these timing advantages could be traded off to reduce clock buffer area and power by 33% and 12%, respectively.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115664582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Delay Variation in Encoded On-Chip Bus Signaling under Process Variation","authors":"S. Tuuna, J. Isoaho, H. Tenhunen","doi":"10.1109/VLSI.2008.73","DOIUrl":"https://doi.org/10.1109/VLSI.2008.73","url":null,"abstract":"In this paper, we model on-chip signaling over a bus consisting of encoding, drivers, transmission lines, receivers and decoding. We characterize the signaling circuitry as a function of its load capacitance. The effective load capacitance seen by a driver is derived for the decoupling method and distributed RLC transmission line models. The driver delay and rise time corresponding to the derived effective capacitance are used to derive the far-end voltage of a transmission line bus. The effects of process variation are taken into account in the characterization of the signaling circuitry and in the wire analysis. The overall delay variation of the bus due to device and wire process variation is then calculated. The model is verified by comparing it to HSPICE. We implement regular voltage mode, level- encoded dual-rail and l-of-4 signaling circuitry and apply the derived model to analyze them. The implementation and analysis are done in 45 nm technology.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114812151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing the Best Processor Cores","authors":"V. Boppana, R. Varma, S. Balajee","doi":"10.1109/VLSI.2008.137","DOIUrl":"https://doi.org/10.1109/VLSI.2008.137","url":null,"abstract":"Summary form only given. It is well-known that varying architectural, technological and implementation aspects of embedded microprocessors, such as ARM, can produce widely differing performance and power specifications. Frequency specifications of high-end realizations are often nearly 2x-3x over vanilla flows. Power optimization techniques used in high-end processor designs have also been reported to have the potential to produce 3x-10x improvements in power over standard flows. This tutorial reviews high-end processor design challenges, techniques and presents state-of-the-art flows for implementing embedded processors. These techniques include processor and architecture selection, verification, selection of technology node/process, selection of macros, selection and optimization of standard cell libraries, design/architecture and power planning, advanced timing and power optimization, design closure, design integration, variability-tolerance, and design-for-manufacturability. The tutorial arms the audience with the best techniques, tools and methodologies to select and achieve the best Silicon for state-of-the-art embedded processors.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123870533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Detection of Missing-Gate Faults in Reversible Circuits by a Universal Test Set","authors":"H. Rahaman, D. Kole, D. K. Das, B. Bhattacharya","doi":"10.1109/VLSI.2008.106","DOIUrl":"https://doi.org/10.1109/VLSI.2008.106","url":null,"abstract":"Logic synthesis with reversible circuits has received considerable interest in the light of advances recently made in quantum computation. Implementation of a reversible circuit is envisaged by deploying several special types of quantum gates, such as k-CNOT. Newer technologies like ion trapping or nuclear magnetic resonance are required to emulate quantum gates. Although the classical stuck-at fault model is widely used for testing conventional CMOS circuits, new fault models, namely, single missing-gate fault (SMGF), repeated-gate fault (RGF), partial missing-gate fault (PMGF), and multiple missing-gate fault (MMGF), have been found to be more suitable for modeling defects in quantum k-CNOT gates. In this paper, it is shown that in an (n times n) reversible circuit implemented with k-CNOT gates, addition of only one extra control line along with duplication each k-CNOT gate yields an easily testable design, which admits a universal test set of size (n +1) that detects all SMGFs, RGFs, and PMGFs in the circuit.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125652987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Mathew, H. Rahaman, Ashutosh Kumar Singh, A. Jabir, D. Pradhan
{"title":"A Galois Field Based Logic Synthesis Approach with Testability","authors":"J. Mathew, H. Rahaman, Ashutosh Kumar Singh, A. Jabir, D. Pradhan","doi":"10.1109/VLSI.2008.88","DOIUrl":"https://doi.org/10.1109/VLSI.2008.88","url":null,"abstract":"In deep-submicron VLSI, efficient circuit testability is one of the most demanding requirements. Efficient testable logic synthesis is one way to tackle the problem. To this end, this paper introduces a new fast efficient graph-based decomposition technique for Boolean functions in finite fields, which utilizes the data structure of the multiple-output decision diagrams (MODD). In particular, the proposed technique is based on finite fields and can decompose any N valued arbitrary function F into N distinct sets conjunctively and N-l distinct sets disjunctively. The proposed technique is capable of generating testable circuits. The experimental results show that the proposed method is more economical in terms of literal count compared to existing approaches. Furthermore, we have shown that the basic block can be tested with eight test vectors.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130044984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single Event Upset: An Embedded Tutorial","authors":"Fan Wang, V. Agrawal","doi":"10.1109/VLSI.2008.28","DOIUrl":"https://doi.org/10.1109/VLSI.2008.28","url":null,"abstract":"With the continuous downscaling of CMOS technologies, the reliability has become a major bottleneck in the evolution of the next generation systems. Technology trends such as transistor down-sizing, use of new materials, and system on chip architectures continue to increase the sensitivity of systems to soft errors. These errors are random and not related to permanent hardware faults. Their causes may be internal (e.g., interconnect coupling) or external (e.g., cosmic radiation). To meet the system reliability requirements it is necessary for both the circuit designers and test engineers to get the basic knowledge of the soft errors. We present a tutorial study of the radiation-induced single event upset phenomenon caused by external radiation, which is a major source of soft errors. We summarize basic radiation mechanisms and the resulting soft errors in silicon. Soft error mitigation techniques with time and space redundancy are illustrated. An industrial design example, the IBM z990 system, shows how the industry is dealing with soft errors these days.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127269939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wiring-Area Efficient Simultaneous Bidirectional Point-to-Point Link for Inter-Block On-Chip Signaling","authors":"Charbel J. Akl, M. Bayoumi","doi":"10.1109/VLSI.2008.23","DOIUrl":"https://doi.org/10.1109/VLSI.2008.23","url":null,"abstract":"The continuous semiconductor technology scaling has made on-chip interconnect the major determinant of VLSI design cost and complexity. This necessitates the usage of signaling techniques that reduce the number of long on- chip wires and repeaters. In this paper, we present a point-to-point inter-block on-chip link design that allows simultaneous bidirectional signaling, thus reducing the number of signal lines and repeaters, while achieving high performance. By using accelerating repeaters and inserting a bidirectional latch at the midpoint of the link high performance simultaneous bidirectional signaling can be achieved with significant reduction in repeater and wire counts. We analyze the switching behavior of the proposed on-chip simultaneous bidirectional link (SBL) and find that it suffers from large switching activity overhead. Therefore, an opposite-polarity transition encoding is also proposed to reduce the power overhead of SBL without affecting its performance.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127003560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DFM / DFT / SiliconDebug / Diagnosis","authors":"S. Venkataraman, Nagesh Tamarapalli","doi":"10.1109/VLSI.2008.129","DOIUrl":"https://doi.org/10.1109/VLSI.2008.129","url":null,"abstract":"Semiconductor yield has traditionally been limited by random particle-defect based issues. However, as the feature sizes reduced to 0.13 micron and below, systematic mechanism-limited yield loss began to appear as a substantial component in yield loss. In addition, it is becoming clear that ramping yield would take longer and final yields would not reach historical norms. A key factor for not reaching previously attained yield levels is the interaction between design and manufacturing. Yield losses in the newer processes include functional defects, parametric defects and issues with testing. Each of these sources of yield loss needs to analyzed and understood by designers and tool developers. In addition, new techniques and methods must be devised to minimize the impact of these yield loss mechanisms. After an introduction of the issues involved in the first section, the second section covers Design-for-Manufacturing (DFM) techniques to analyze the design content, flag areas of design that could limit yield, and make changes to improve yield. However, once the changes are made it is necessary to quantify their impact so that knowledge about yield contribution of different features can be fed back to design and DFM tools. Test presents an opportunity to close the loop by crafting test patterns to expose the defect prone features during automatic test pattern generation (ATPG) and by analyzing silicon failures through diagnosis to determine the features that are actually causing yield loss and their relative impact. The third section covers design techniques (DFX) to improve testability, debuggability and diagnosability, and DFM and defect aware test generation to both meet product quality and expose yield issues at test. Section four covers the basic concepts and theoretical aspects of debug and diagnosis including algorithmic IC diagnosis, scan chain diagnosis, critical path based techniques and diagnosis of delay defects. The applications of the basic concepts and techniques for silicon debug are covered in section five. Section six covers the application of statistical diagnosis techniques to determine the features that are actually causing yield loss and their relative impact. Finally, in section seven, future trends, challenges and directions are covered.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130496186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}