U. Heo, Xueqing Li, Huichu Liu, S. Gupta, S. Datta, N. Vijaykrishnan
{"title":"A High-Efficiency Switched-Capacitance HTFET Charge Pump for Low-Input-Voltage Applications","authors":"U. Heo, Xueqing Li, Huichu Liu, S. Gupta, S. Datta, N. Vijaykrishnan","doi":"10.1109/VLSID.2015.58","DOIUrl":"https://doi.org/10.1109/VLSID.2015.58","url":null,"abstract":"This paper presents a high-efficiency switched-capacitance charge pump in 20 nm III-V heterojunction tunnel field-effect transistor (HTFET) technology for low-input-voltage applications. The steep-slope and low-threshold HTFET device characteristics are utilized to extend the input voltage range to below 0.20 V. Meanwhile, the unidirectional current conduction is utilized to reduce the reverse energy loss and to simplify the non-overlapping phase control. Furthermore, with unidirectional current conduction, an improved cross-coupled charge pump topology is proposed for higher output voltage and power conversion efficiency (PCE). Simulation results show that the proposed HTFET charge pump with a 1.0 kΩ resistive load achieves 90.4% and 91.4% PCE, and 0.37 V and 0.57 V DC output voltage, when the input voltage is 0.20 V and 0.30 V, respectively.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122024708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of High Speed Ternary Full Adder and Three-Input XOR Circuits Using CNTFETs","authors":"Sneh Lata Murotiya, Anu Gupta","doi":"10.1109/VLSID.2015.56","DOIUrl":"https://doi.org/10.1109/VLSID.2015.56","url":null,"abstract":"This paper proposes a new high-speed ternary full adder (TFA) cell for carbon nanotube field-effect transistor (CNTFET) technology. The proposed design has symmetric pull-up and pull-down networks along with a resistive voltage divider, configured using transistors, as an integral part. The design takes inputs through a decoding unit and exploits the ternary nature of A and B but the inherently binary nature of Cin, which simplifies the design. The design demonstrates high driving power and robustness to voltage and temperature variations. The sum generation unit of the proposed design is further modified to obtain an energy-efficient three-input ternary XOR circuit, which can be used as a basic cell in modern circuit design. HSPICE simulation results with the 32 nm Stanford CNTFET model show a 49% reduction in delay with a 19% improvement in power-delay product (PDP) for the proposed TFA, and a 43% reduction in delay with a 48% improvement in PDP for the proposed three-input ternary XOR circuit, in comparison with recently published CNTFET-based designs.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122958789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A CMOS 90nm 50Mhz Supply Noise Tolerant High Density 8T-NAND ROM","authors":"Kedar Janardan Dhori, Vinay Kumar, Ashish Kumar","doi":"10.1109/VLSID.2015.36","DOIUrl":"https://doi.org/10.1109/VLSID.2015.36","url":null,"abstract":"On-chip power grid design is a major challenge in submicron technologies. High peak current coupled with the inductive reactance of the supply mesh causes power integrity issues that result in ringing. This supply noise reduces the available differential voltage for sensing and results in read failures in read-only memory (ROM). Controlling the noise with large decoupling capacitors is area-consuming. The proposed scheme uses noise-tolerant reference generation. It reduces the coupling of noise onto the differential nodes at the sense amplifier by decoupling these nodes from power supply noise using highly capacitive shared reference lines. Thus, the impact of supply noise on the differential voltage is reduced by ~90%. The scheme improves speed and power by 20% and 5%, respectively, with no area loss. We achieved a 50 MHz operating frequency with an 8T-NAND high-VT (HVT) ROM for an 8192×128 (i.e. 8K words and 128 bits) instance.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"239 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122715270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedded Tutorial ET1: Better-than-Worst-Case Timing Designs","authors":"A. Singh","doi":"10.1109/VLSID.2015.118","DOIUrl":"https://doi.org/10.1109/VLSID.2015.118","url":null,"abstract":"Achieving high performance within stringent power budgets is emerging as one of the most difficult challenges in the design of current-generation digital systems. In synchronous systems, switching signals are typically allowed a fixed amount of time to settle within each clock cycle, with the clock period appropriately selected to accommodate the worst-case switching delay. Some additional timing margin, typically 10-20% of the clock period, is allowed beyond the nominal critical path delays to accommodate timing uncertainties introduced by process, voltage and temperature (PVT) variations; these appear to be increasing significantly in highly scaled CMOS technologies. Unfortunately, despite the lack of switching activity, the circuit continues to consume significant static power during these timing margins, which consequently results in an unwanted loss of both power and performance. Furthermore, since worst-case signal paths in CMOS are highly input dependent and generally not activated in every clock cycle, this wasteful window of circuit inactivity in a typical cycle is often longer than just the timing margin. This is particularly true for circuits with a wide distribution of path delays, where the few long paths are infrequently activated; the computation completes with signals stabilizing quite early in most clock cycles. Clearly, significantly higher computational throughput and power efficiency could be achieved if the resulting window of circuit inactivity during the remainder of the clock cycle could be eliminated or at least minimized. Asynchronous and data flow designs and architectures have long tried to exploit this statistical variability in the delays of circuit functional blocks by building in a capability for signaling the completion of each operation. 
This can potentially allow execution to proceed as soon as a functional result is available, instead of waiting out the worst case delay for each functional block. An early and classic example is carry completion signaling in ripple carry adders which provides an indication as soon as the carry signals have stabilized and the result is valid, following application of each new set of inputs. Unfortunately, the efficient design of fully asynchronous and data flow systems has proved extremely challenging. Consequently, elements of asynchronous operation have sometimes been incorporated into traditional clock based designs using some form of a handshaking control protocol. Typically such designs dynamically allow functional units a varying number of system clock periods to complete their operation, thereby avoiding worst case delays in every instance. The mechanisms employed to ensure that a functional block gets sufficient time to correctly complete its operation broadly take three forms. (1) Completion signaling, where the function is designed with redundant outputs (or output coding) which directly indicates when the result is valid. (2) Input based timing prediction, where (a subset of) ","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131210396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tutorial T4: MEMS: Design, Fabrication, and their Applications as Chemical and Biosensors","authors":"N. Kale","doi":"10.1109/VLSID.2015.112","DOIUrl":"https://doi.org/10.1109/VLSID.2015.112","url":null,"abstract":"The microfabrication technology has had a chequered history of over 50 years in the field of microelectronics. Aggressive miniaturization of microelectronic devices has resulted in faster logic circuits and it has also reduced their power requirements. MOSFET device dimensions have already entered the sub-100 nanometer regime. The same principles of microfabrication were applied to make miniaturized 3-dimensional mechanical structures. This helped in the advent of micro electro-mechanical systems or MEMS. Initially, i.e. in early nineties, the MEMS field was dominated by mechanical applications. However, now MEMS refers to all miniaturized systems including silicon based mechanical drivers, chemical and biological sensors and actuators, and miniature devices made from plastics or ceramics. The half-day tutorial would begin with a synoptic overview of the area, highlight some of the challenges and outline the scope of the tutorial. It would be followed with an introduction to the design of microsensors, such as the pressure sensor and the accelerometer that began the MEMS revolution. Micromachined Electro-Mechanical Systems (MEMS), also called Microfabricated Systems (MS), have evoked great interest in the scientific and engineering communities. This is primarily due to several substantive advantages that MEMS offer: orders of magnitude smaller size, better performance than other solutions, possibilities for batch fabrication and cost-effective integration with electronics, virtually zero dc power consumption and potentially large reduction in power consumption, etc. 
The application domains cover microsensors and actuators for physical quantities (MEMS), of which MEMS for automobile & consumer electronics forms a large segment; microfabricated subsystems for communications and computer systems (RF-MEMS & MOMS); and microfabricated systems for chemical assay (microTAS) and for biochemical and biomedical assay (bioMEMS and DNA chips). This tutorial would give an introduction to these exciting developments and the technology and design approaches for the realization of these integrated systems. We will also introduce the importance of material selection by understanding the impact of material properties, even at the micron scale. We will discuss polymeric materials such as SU-8 and also compare them with traditional materials such as silicon. We will also discuss the possibility of integrating MEMS with VLSI electronics. Simulators provide an excellent way to design, optimize and understand micromechanical systems. This is particularly so because such systems are not of the isolated, stand-alone type; instead, they are based on the interplay of several domains. For example, in a microcantilever-based biosensing system the different domains are: materials, mechanical, biological, electrical and chemical. Recently developed software packages such as CoventorWare, IntelliSuite etc. have the ability to simulate a system in different domains. 
One can, for examp","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122901530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. S. M. Siddiqui, S. Sharad, Yogendra Sharma, A. Khanuja
{"title":"Two Phase Write Scheme to Improve Low Voltage Write-ability in Medium-Density SRAMs","authors":"M. S. M. Siddiqui, S. Sharad, Yogendra Sharma, A. Khanuja","doi":"10.1109/VLSID.2015.35","DOIUrl":"https://doi.org/10.1109/VLSID.2015.35","url":null,"abstract":"State-of-the-art SRAM designs use either negative bit line or overdriven word line write assist circuits to improve write-ability in a low-voltage VDDMIN environment. At higher operating voltages, however, these write assist circuits adversely affect the reliability of the SRAM bit cell's pass-gate oxide (tox) through mechanisms such as hot carrier injection and time-dependent dielectric breakdown (TDDB). In this paper, we propose a novel two phase write scheme to improve write-ability in a VDDMIN environment. We achieved improved write-ability by driving the word line voltage level to the power supply rail, in conjunction with the medium-sized SRAM bit cell. Simulation results at a VDDMIN of 0.52 V in 16 nm TSMC FinFET technology demonstrate that the worst 5σ bit cell write margin is improved by 85 mV. Because the word line voltage level is restricted to the power supply rail, our two phase write scheme does not put the bit cell's pass-gate tox reliability at risk at higher operating voltages. We also present the two phase write scheme macro implementation for a column-multiplexed SRAM architecture.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121596993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suman Chatterjee, Vikram Singh Saun, A. Arunachalam
{"title":"A Methodology for Placement of Regular and Structured Circuits","authors":"Suman Chatterjee, Vikram Singh Saun, A. Arunachalam","doi":"10.1109/VLSID.2015.90","DOIUrl":"https://doi.org/10.1109/VLSID.2015.90","url":null,"abstract":"Data path circuits are regular and are best placed in a bit-sliced pattern for improved Quality of Results such as timing, power, congestion and area. The cells in a column of a bit slice structure are normally aligned on control pins or clock pins for straight routes, reducing power. The traditional way of placing data path circuits, using a separate data path placer and then bringing them into the main design as macros, has significant disadvantages. It is important for a modern placement tool to place random logic and data path circuits concurrently, respecting the regularity of a data path circuit by placing it in a bit-sliced manner. It is important not only to place the data path elements in a bit-sliced pattern but also to maintain that structure throughout the flow. Different sets of optimization tricks can be applied to different bits of the data path, which can destroy the identical footprints of cells in a column and makes maintaining pin alignment a challenge. In addition, in lower nanometer nodes, fixed physical-only cells pre-placed throughout the core area make it challenging to keep the bit-sliced structure intact. 
This paper proposes a flow for handling data path circuits in a place and route tool, along with an algorithm for bit slice tiling, that addresses the challenges mentioned above.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"131 4-5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121941862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eleonora Schönborn, K. Datta, R. Wille, I. Sengupta, H. Rahaman, R. Drechsler
{"title":"BDD-Based Synthesis for All-Optical Mach-Zehnder Interferometer Circuits","authors":"Eleonora Schönborn, K. Datta, R. Wille, I. Sengupta, H. Rahaman, R. Drechsler","doi":"10.1109/VLSID.2015.79","DOIUrl":"https://doi.org/10.1109/VLSID.2015.79","url":null,"abstract":"With the advancements in fabrication technology and the emergence of very high performance systems in VLSI, interest in optical interconnects and on-chip optical functional units has increased significantly. Mach-Zehnder Interferometer (MZI) switches based on Semiconductor Optical Amplifiers (SOAs) have been used as optical building blocks and have allowed the synthesis of important Boolean functions such as multiplexers or adders. However, no automatic synthesis approach for arbitrary Boolean functions has been proposed yet. In this work, we introduce such a scheme. For this purpose, we make use of Binary Decision Diagrams (BDDs). A technology library is proposed in which all possible BDD node configurations are identified and associated with corresponding all-optical sub-circuits. This library is used to map a BDD representing an arbitrary function into an all-optical circuit using a linear-time algorithm. Experimental evaluations confirm that this leads to an efficient realization of the considered functions.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130064092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Transition Detector Exploiting Charge Sharing","authors":"Yu Wang, A. Singh","doi":"10.1109/VLSID.2015.57","DOIUrl":"https://doi.org/10.1109/VLSID.2015.57","url":null,"abstract":"Transition detectors have been widely employed for online error and metastability detection, including in Better-Than-Worst-Case (BTWC) timing design of microprocessors that are designed to allow occasional timing errors. In such applications, the area overhead introduced by the transition detectors is a major concern because they may need to be incorporated in almost all the flip-flops in a large design. In this paper, we present a new transition detector (TDCS) which employs a novel combination of charge sharing effects and short-circuit-based discharge to reduce circuit complexity by more than half compared with traditional designs. Simulation of our TDCS design shows that it can reliably achieve the same functionality as published designs with 60% fewer transistors. Furthermore, detailed corner analysis shows that our TDCS design is robust under extreme PVT variations.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129179486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Pandey, Arindam Karmakar, C. Shekhar, S. Gurunarayanan
{"title":"An FPGA-Based Architecture for Local Similarity Measure for Image/Video Processing Applications","authors":"J. Pandey, Arindam Karmakar, C. Shekhar, S. Gurunarayanan","doi":"10.1109/VLSID.2015.63","DOIUrl":"https://doi.org/10.1109/VLSID.2015.63","url":null,"abstract":"Similarity measures are used in diverse signal-processing applications. The Bhattacharyya coefficient is one of the most popular similarity measures and is widely used in image/video processing applications. Several of these applications need to compute a similarity measure between probability density functions of local image statistics. In this paper, an efficient hardware architecture is proposed for accelerating the local similarity measure (LSM) computation using the Bhattacharyya coefficient. Direct hardware implementation of the Bhattacharyya coefficient requires many compute-intensive hardware resources, which slow down the overall computation process. The data path of the proposed architecture utilizes fixed-point arithmetic and is based on the logarithmic number system. Fast binary logarithmic and antilogarithmic computing units are deployed to realize the required complex arithmetic operations. The histogram computation is accomplished using single-cycle read-modify-write operations on the received image data stored in DDR2 SDRAM. The proposed architecture is realized in the Virtex-5 xc5vfx70t FPGA device of the Xilinx ML-507 platform. 
The device utilization of the implemented architecture shows that it utilizes 4.5% FPGA slices, 5.4% Block RAMs and 27.34% DSP48E slices.","PeriodicalId":123635,"journal":{"name":"2015 28th International Conference on VLSI Design","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117051521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}