{"title":"Design Automation and Analysis of Resonant Rotary Clocking Technology","authors":"V. Honkote","doi":"10.1109/ISVLSI.2010.28","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.28","url":null,"abstract":"Resonant rotary clocking is a next-generation clocking technology for ultra-low-power, multi-GHz operation. Previous works demonstrate the feasibility of this technology with full-custom, low-complexity circuit implementations. In this work, the rotary operational principles are investigated at a larger scale, and physical design and timing verification methods are developed as a blueprint for a fully automated, semi-custom implementation.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133473909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Generation of Massively Parallel Hardware from Control-Intensive Sequential Programs","authors":"Michael F. Dossis","doi":"10.1109/ISVLSI.2010.40","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.40","url":null,"abstract":"High-level synthesis has been envisaged as a suitable methodology to design and deliver on time at least large parts of today’s complex IC systems. This paper describes a unified and integrated HLS framework that automatically produces custom, massively parallel hardware, including its memory and system interfaces, from high-level sequential program code. Using compiler-generators and logic programming techniques, a provably correct hardware compilation flow is achieved. The hardware optimization inference engine is driven by a set of resource constraints, which bound the number of hardware operators available to function in parallel during each control step. This optimization drastically reduces the number of control steps (states) of the implemented application. The hardware compilation runs complete in orders of magnitude less time than even very experienced HDL designers would need to implement the same applications in RTL code. Implementation results are presented from synthesis of a number of control-dominated, linear and repetitive applications, including an MPEG video compression engine with up to a few hundred states. In all cases the HLS framework quickly delivers provably correct, implementable RTL code, and the optimized schedule is reduced by up to 30% in comparison with the initial schedule.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115183232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low Power, High Performance Threshold Logic-Based Standard Cell Multiplier in 65 nm CMOS","authors":"S. Leshner, Krzysztof S. Berezowski, Xiaoyin Yao, Gayathri Chalivendra, Saurabh Patel, S. Vrudhula","doi":"10.1109/ISVLSI.2010.32","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.32","url":null,"abstract":"In this paper we describe the design, simulation, fabrication, and test of a 32-bit 2's complement integer multiplier constructed from a combination of CMOS standard cells and threshold logic elements in a 65 nm low power process. As compared to a multiplier designed solely using CMOS standard cells, the threshold logic based multiplier is 1.23x smaller and consumes 1.41x less dynamic power and 2.5x less leakage power at the same process corner.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124412121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MULTICUBE: Multi-objective Design Space Exploration of Multi-core Architectures","authors":"C. Silvano, W. Fornaciari, G. Palermo, V. Zaccaria, F. Castro, Marcos Martínez, S. Bocchio, R. Zafalon, P. Avasare, G. Vanmeerbeeck, C. Ykman-Couvreur, M. Wouters, C. Kavka, L. Onesti, A. Turco, U. Bondi, Giovanni Mariani, H. Posadas, E. Villar, Chris Wu, Dongrui Fan, Hao Zhang, Shibin Tang","doi":"10.1007/978-94-007-1488-5_4","DOIUrl":"https://doi.org/10.1007/978-94-007-1488-5_4","url":null,"abstract":"","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123553215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low Power Single Electron Or/Nor Gate Operating at 10GHz","authors":"T. Tsiolakis, G. Alexiou, Nikos Konofaos","doi":"10.1109/ISVLSI.2010.78","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.78","url":null,"abstract":"The design and simulation of a single-electron OR/NOR gate are presented using a Monte Carlo based tool. Both the OR/NOR behavior and the stability were verified, and the free energy behavior of the circuit was also examined. The results confirmed that the circuit behaved as an OR/NOR gate, showing improved characteristics over previously published single-electron OR circuits and achieving fast operation at low power. Moreover, noise through the circuit was nearly eliminated, and stable behavior was verified with no noise present at the output points.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124882464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Yield in Nanotechnology Circuits Using Non-square Meshes","authors":"C. Argyrides, Nikolaos Mavrogiannakis, D. Pradhan","doi":"10.1109/ISVLSI.2010.113","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.113","url":null,"abstract":"Nanotechnology-based fabrication, which relies on self-assembly of nanotubes or nanowires, has been predicted to be an alternative to silicon technology, since lithography-based IC is approaching its limit in terms of feature size. However, such processes are expected to have high defect density and have to be handled with effective defect-tolerant techniques. In this paper, we propose a technique which, for a given circuit size, utilizes different combinations of defect-free non-square but rectangular crossbars to construct the desired circuit with improved yield. We extend our recently proposed algorithm [1] to cope with non-square meshes. We aim to increase the number of defect-free crossbars and also to improve the total yield by connecting defect-free non-square but rectangular subsets together. We also estimate the reliability of the resulting circuits and observe that while the yield increases significantly in our architecture, the reliability decreases due to the increased number of interconnects. Finally, we provide a guideline to optimize the architecture, making an optimal trade-off between the yield and the reliability.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129708566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LE1: A Parameterizable VLIW Chip-Multiprocessor with Hardware PThreads Support","authors":"D. Stevens, V. Chouliaras","doi":"10.1109/ISVLSI.2010.107","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.107","url":null,"abstract":"We discuss LE1, a parameterized VLIW Chip Multiprocessor (CMP) adhering to the shared-memory programmer's model. LE1's novelty lies in its ability to perform dynamic thread-spawning through hardware support for PThread-like primitives, in addition to its substantial architectural and microarchitectural parameterization. Dynamic (hardware) thread creation is very fast and removes the need for an executive/OS, presenting to the application programmer a 'bare-metal' multiprocessor capable of exploiting all forms of parallelism. The core LE1 CPU is a configurable, 8-stage pipeline VLIW engine with a proprietary Instruction Set Architecture (ISA) supporting both partial and full predication and pipelined, multi-input, multi-output (MIMO) instruction extensions. The LE1 CMP is parameterizable as to the number of processors, their issue capability, internal microarchitectural features, functional unit mix and latency, and the local memory system architecture. Preliminary results indicate near-linear performance improvement when executing a threaded version of the Mandelbrot calculation on 2-way and 4-way processor configurations with a 256 KB, 4-way banked tightly-coupled memory system. Similar trends are seen when executing a threaded matrix multiplication benchmark. We present these findings along with VLSI implementations of 4-way, dual-issue and 3-way, quad-issue multiprocessor configurations.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128643088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical DFT with Combinational Scan Compression, Partition Chain and RPCT","authors":"P. Srinivasan, R. Farrell","doi":"10.1109/ISVLSI.2010.59","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.59","url":null,"abstract":"Modular and hierarchical test architectures are two of the most common testing techniques used in complex SoC designs. However, modular test architectures use an expensive (in terms of silicon area) test wrapper around each block. Hierarchical test architectures, on the other hand, require additional effort at block level to isolate each block from surrounding blocks, and a TAM to perform scan compression. In this paper, we analyze the limitations of the modular test architecture. Based on the analysis, we propose a test plan for hierarchical test architecture that integrates partition chain, combinational scan compression, and reduced pin count test (RPCT). Experimental results show that approximately 50% of DFT area can be saved using the partition chain as compared to a standard test wrapper. The results also demonstrate the feasibility of the proposed test plan using a commercial ATPG tool.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129065077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clock Tree Synthesis with XOR Gates for Polarity Assignment","authors":"Jianchao Lu, B. Taskin","doi":"10.1109/ISVLSI.2010.62","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.62","url":null,"abstract":"A novel clock tree synthesis (CTS) method is proposed that improves the reliability of an integrated circuit system through reducing the peak current on the power/ground rails drawn by the clock tree buffers. The proposed CTS method entails the integration of XOR gates at one level of the clock tree to enable polarity assignment for peak current reduction. Unlike previous polarity assignment methods, the skew of the generated clock tree with XORs is preserved as the physical layout of the clock tree is preserved during the polarity assignment process. Furthermore, the proposed clock tree permits the implementation of most of the previous polarity assignment methods through configurability of the control input of the XOR gates. Experimental results show that the peak current on the power/ground rails of the clock tree is reduced by an average of 55.2% without any degradation in the original clock skew.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122363349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ASIC Design of an Adaptive Control Unit for Reconfigurable Analog-to-Digital Converters","authors":"Z. Razak, A. Erdogan, T. Arslan","doi":"10.1109/ISVLSI.2010.79","DOIUrl":"https://doi.org/10.1109/ISVLSI.2010.79","url":null,"abstract":"There is a need for a truly adaptive analog-to-digital converter (ADC) that responds to any signal change and reduces power consumption with low implementation complexity. The paper presents a front-end ASIC implementation of an adaptive control unit (ACU) for a reconfigurable ADC. The control unit is based on an adaptive algorithm that changes either the converter resolution or the sampling rate within an observation interval. Switching activity on the digital ADC output is monitored, evaluated, and compared to threshold values. The resolution (or sampling rate) is increased when the switching activity is high and decreased when the activity is low. Since the adaptive control unit is simple, it is suitable for most Nyquist-rate ADCs, especially in area-limited portable devices. The module is synthesized using AMS 0.35μm/3.3V CMOS standard libraries. In the adaptive-resolution ADC application, the ACU occupies only 677 equivalent 2-input NAND gates and consumes only 1.01mW. For the adaptive sampling-rate ADC, the gate count is 703 and power consumption is 2.22mW. The results show that the ACU has low area complexity and low power consumption. For this reason, the ACU is suitable for adaptive ADC implementation targeting low power wireless applications.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116129201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}