Andre Luis Rodeghiero Rosa, L. Soares, Kleber Stangherlin, S. Bampi
{"title":"Designing CMOS for near-threshold minimum-energy operation and extremely wide V-F scaling","authors":"Andre Luis Rodeghiero Rosa, L. Soares, Kleber Stangherlin, S. Bampi","doi":"10.1145/2800986.2801004","DOIUrl":"https://doi.org/10.1145/2800986.2801004","url":null,"abstract":"This work proposes a strategy for designing VLSI circuits to operate in an extremely wide Voltage-Frequency Scaling (VFS) range, from the supply voltage at which the minimum energy per operation (MEP) is achieved, up to the nominal voltage for the process. First, the sizing methodology of two cell libraries using transistors with different threshold voltages, Regular-VT (RVT) and Low-VT (LVT), is described. Just five combinational cells (INV, NAND, NOR, OAI21, and AOI22) comprise the libraries plus two register cells, all with multiple strengths, for RVT ones. The sizing rule for the transistors of each cell is directly driven by requiring equal rise and fall times in order to attenuate variability effects at very low supply voltages. These cell libraries were characterized for typical, fast, and slow process corners, over temperature (-40°C, 25°C, and 125°C) variations, and for supply voltages varying from 200 mV up to 1.2 V with small supply steps. Circuit syntheses were performed for ten VLSI circuit benchmarks: notch filter, 8051 compatible core, and eight ISCAS benchmark circuits, considering all VDD operating points. We show that at the optimum MEP point (near-VT) an average reduction of 54.46% and 99.01% in energy is possible, when compared with deep sub-threshold and nominal supply voltages, respectively, at room temperature. The extremely wide VFS regime enables operating frequencies varying from hundreds of kHz up to MHz/GHz at -40°C and 25°C, and from MHz up to GHz at 125°C. The near-VT designs herein presented, when compared to related work, showed on average an energy reduction and performance gain of 24.1% and 152.68%, respectively, for the same circuit benchmarks. Comparison of near-VT operation at very low and high temperatures shows advantages for hotter CMOS operation in this regime.","PeriodicalId":325572,"journal":{"name":"2015 28th Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122878379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PHiCIT — Improving hierarchical Networks-on-Chip through 3D silicon photonics integration","authors":"C. Reinbrecht, Martha Johanna Sepúlveda, A. Susin","doi":"10.1145/2800986.2801022","DOIUrl":"https://doi.org/10.1145/2800986.2801022","url":null,"abstract":"The Network-on-Chip (NoC) architecture has been seen as an interconnect solution for complex systems. However, performance and energy issues still represent limiting factors for Multi-Processors Systems-on-Chip (MPSoC). In order to match low power and high performance, hierarchical NoCs have been proposed, with interconnecting clusters of IPs tailored to application specific domains. In the near future however, this methodology will be limited by the long wires for global connection. In this paper, we present an optimized hierarchical network-on-chip, the PHiCIT (Photonic Hierarchical Crossbar-based Interconnection Three-dimensional architecture). This architecture proposal aims to maximize the overall performance by using three levels of interconnection: photonic crossbars for intra-cluster communication, traditional electric routers for the inter-cluster communication, and 3D technology to explore power and area optimization. 
Experimental results show that PHiCIT can reduce the latency against a pure electrical mesh NoC by up to 47%, against an electric hierarchical NoC by up to 6%, and against a photonic mesh NoC by up to 34%, considering PARSEC benchmark applications.","PeriodicalId":325572,"journal":{"name":"2015 28th Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127771547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vladimir Afonso, Henrique Maich, Luan Audibert, B. Zatt, M. Porto, L. Agostini
{"title":"Memory-aware and high-throughput hardware design for the HEVC fractional motion estimation","authors":"Vladimir Afonso, Henrique Maich, Luan Audibert, B. Zatt, M. Porto, L. Agostini","doi":"10.1145/2800986.2801017","DOIUrl":"https://doi.org/10.1145/2800986.2801017","url":null,"abstract":"This paper presents a hardware design for the Fractional Motion Estimation (FME) of the High Efficiency Video Coding (HEVC) standard. The solution designed in this work uses a scheme that reduces the number of accesses to the reference frames stored in the external memory by up to 49.22%. A strategy to reduce the computational effort is also used. This strategy consists of using only the four square-shaped Prediction Unit (PU) sizes rather than all 24 possible PU sizes. This approach reduces the total encoding time by about 59%, with a bit-rate increase of only 4% for the same image quality. The hardware design was described in VHDL and synthesized for FPGA and ASIC technologies. The synthesis results for TSMC 65nm standard cells demonstrate that the developed design is able to process UHD 2160p videos at 60 frames per second (fps), reducing the required hardware resources by about five times when compared with the main related work.","PeriodicalId":325572,"journal":{"name":"2015 28th Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127001615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rafael Cantalice, A. Simionovski, F. P. Cortes, M. Lubaszewski
{"title":"Low power, high-sensitivity clock recovery circuit for LF/HF RFID applications","authors":"Rafael Cantalice, A. Simionovski, F. P. Cortes, M. Lubaszewski","doi":"10.1145/2800986.2801015","DOIUrl":"https://doi.org/10.1145/2800986.2801015","url":null,"abstract":"This paper presents a fully integrated CMOS carrier clock recovery circuit for RFID applications. The architecture is based on a PMOS-input folded-cascode amplifier that, combined with a modulator stage present in conventional RFID transponders, achieves good clock recovery performance even with a few mV at the antenna during modulation, allowing the transponder to communicate over a longer distance. Fabricated in a deep-submicron CMOS process, the circuit works with a 3 V power supply and delivers a 1 V peak-to-peak digital clock signal. Experimental data show that this circuit provides clock recovery with a 100 mV sensitivity (peak to peak) while consuming only 260 nA of current.","PeriodicalId":325572,"journal":{"name":"2015 28th Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132086342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MCML gate design for standard cell library","authors":"Bruno Canal, Cicero Nunes, R. Ribas, E. Fabris","doi":"10.1145/2800986.2801016","DOIUrl":"https://doi.org/10.1145/2800986.2801016","url":null,"abstract":"This paper evaluates the impact of MCML gate design specifications on a standard cell library. The tradeoff between design parameters (bias current, voltage swing and noise margin) and maximum gate operating frequency is taken into account. We demonstrate that in an MCML standard cell library the voltage swing and noise margin should be uniform for all logic gates. All evaluations were done over a 0.6μm CMOS technology. The transistor sizing necessary to achieve the required noise margin makes small voltage swings unattractive for MCML gate design. Increasing the bias current also requires larger transistors; as a result, the propagation delay gain is no longer significant at higher bias currents. In the studied case, this value is around 100μA. The analysis of the library composition demonstrates that four-input functions perform better when implemented with cascaded two-input gates than with a dedicated four-input gate.","PeriodicalId":325572,"journal":{"name":"2015 28th Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128388886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Cordova, P. Toledo, H. Klimach, S. Bampi, E. Fabris
{"title":"0.5 V supply voltage reference based on the MOSFET ZTC condition","authors":"D. Cordova, P. Toledo, H. Klimach, S. Bampi, E. Fabris","doi":"10.1145/2800986.2800988","DOIUrl":"https://doi.org/10.1145/2800986.2800988","url":null,"abstract":"The continuous scaling of CMOS devices has required the consequent reduction of the supply voltages. There is a need for analog and RF circuits able to operate at supplies lower than 0.5 V. This paper presents a voltage reference based on the MOSFET zero-temperature condition (ZTC) that operates with a low 0.5 V supply. The circuit is composed of a diode-connected MOS transistor operating near the ZTC condition that is biased by a proportional-to-absolute-temperature (PTAT) current reference implemented with Schottky diodes. The ZTC condition is analysed using a continuous MOSFET model that is valid from weak to strong inversion, and the circuit behaviour is described by theoretical expressions. Our reference circuit is designed in three versions, each with MOSFETs of a different threshold voltage (standard-VT, low-VT, and zero-VT), all available in the 130 nm CMOS process used. These designs result in three different and very low reference voltages: 312, 237, and 51 mV. All three designed references operate over a supply voltage range of 0.45 to 1.2 V, consuming 1 μA of typical supply current. Post-layout simulations show temperature coefficients (TCs) of 214, 372, and 953 ppm/°C, respectively, over a temperature range from -55 to 125°C. Monte-Carlo simulations show the fabrication variability impact on the circuit performance. The voltage reference was designed in a 130 nm process and occupies 0.014 mm² of silicon area.","PeriodicalId":325572,"journal":{"name":"2015 28th Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134604223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Curtinhas, T. Cavalcante, D. L. Oliveira, L. Faria, O. Saotome
{"title":"Minimization and encoding of high performance asynchronous state machines based on genetic algorithm","authors":"T. Curtinhas, T. Cavalcante, D. L. Oliveira, L. Faria, O. Saotome","doi":"10.1145/2800986.2801018","DOIUrl":"https://doi.org/10.1145/2800986.2801018","url":null,"abstract":"Today, the design of complex synchronous digital systems faces serious difficulties relating to the global clock and to Deep-Sub-Micron MOS technology. Asynchronous design is an interesting alternative for solving these difficulties, since it does not suffer from clock skew or clock distribution problems. However, the lack of tools for automatic synthesis is still a major drawback. Asynchronous Finite State Machines (AFSM) are widely used in the control of asynchronous digital systems. A very popular machine is the burst-mode Huffman machine (BM_HM), which accepts a burst-mode specification and is implemented as a Huffman machine (HM). The HM architecture, when compared to the HM architecture with fed-back output, has advantages such as better interaction with fast environments, a lower cost of timing analysis, and lower latency. As a disadvantage, the area tends to be larger. This paper proposes two novel algorithms based on genetic algorithms for the minimization and assignment of states, which are important steps in the synthesis of BM_HMs. These two algorithms were implemented in the SAGAAs tool, which was tested on an extensive set of benchmarks and showed high efficiency when compared to the state-of-the-art Minimalist tool. It achieved an average reduction of 5.91% in the number of products, 15.50% in the number of literals and 32.61% in the total processing time. Our approach presents a low penalty of 1.56% and 4.41% in the number of states and in the number of inserted state variables, respectively.","PeriodicalId":325572,"journal":{"name":"2015 28th Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"177 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132624351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. A. Silva, Lucas Albers Cuminato, Vanderlei Bonato, P. Diniz
{"title":"Run-time cache configuration for the LEON-3 embedded processor","authors":"B. A. Silva, Lucas Albers Cuminato, Vanderlei Bonato, P. Diniz","doi":"10.1145/2800986.2801026","DOIUrl":"https://doi.org/10.1145/2800986.2801026","url":null,"abstract":"Cache parameters such as size and associativity are fixed at manufacturing time and are often not tuned for the specific characteristics of each application code. The net result is excessive energy consumption and lower performance. This paper explores the benefits of using a data cache with reconfigurable capacity and associativity in a LEON-3 embedded system. We present real energy and execution time results for a set of graph-based and numerical algorithms. For a combined application of these algorithms, the results reveal aggregate energy savings of 7% and an execution time penalty of just 1% over the best fixed-associativity cache architecture with the same capacity. We further explore a dynamic, adaptive cache way shutdown algorithm and evaluate its performance and energy benefits in the context of the SLAM-EKF position estimation robotics algorithm.","PeriodicalId":325572,"journal":{"name":"2015 28th Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130920627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of high-voltage level shifters based on stacked standard transistors for a wide range of supply voltages","authors":"Sara Pashmineh, D. Killat","doi":"10.1145/2800986.2801003","DOIUrl":"https://doi.org/10.1145/2800986.2801003","url":null,"abstract":"This paper presents the design of two high-voltage level shifters suitable for a wide range of supply voltages. In view of certain drawbacks identified during the design, implementation, simulation and measurement of a 3-stacked CMOS driver using capacitive feedback level shifters, improved high-voltage level shifters are designed. These circuits are compared with each other in terms of their circuit description, drawbacks, advantages and simulation results. The circuit designs are technology-independent and compatible with scaled CMOS devices because these circuits are based on stacked standard transistors. Both high-voltage level shifters are validated by simulation in 65-nm TSMC technology with a nominal voltage of 2.5 V. The level shifters can be applied for supply voltages between 2.6 V / 3.5 V and 7.5 V, respectively. The supply voltage range is extended by 67% and 104%, respectively, when compared against common level shifters.","PeriodicalId":325572,"journal":{"name":"2015 28th Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129447545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. W. A. Soares, D. Belfort, S. Catunda, R. Freire
{"title":"Analysis and system-level design of a high resolution incremental ΣΔ ADC for biomedical applications","authors":"A. W. A. Soares, D. Belfort, S. Catunda, R. Freire","doi":"10.1145/2800986.2800998","DOIUrl":"https://doi.org/10.1145/2800986.2800998","url":null,"abstract":"This paper presents an analysis and system-level design of an incremental sigma-delta converter (IΣΔ ADC) in order to explore a possible solution for low power multi-channel applications. The problem of using classic ΣΔ ADCs for applications which require time-multiplexed signals is discussed. IΣΔ ADCs are characterized by resetting all memory elements present in the ΣΔ modulator core and digital filter at the beginning of each conversion. The modulator architecture consists of a 4th-order loop filter using a feedforward summation topology, and its coefficients were obtained through a simple algorithm which establishes the minimum required number of clock cycles for one conversion. SIMULINK building blocks were used to model the non-idealities, such as sampling jitter, switches' and op-amps' thermal noise, finite bandwidth, slew rate and finite DC gain. The results show that the modulator achieves a signal-to-noise ratio (SNR) greater than 100 dB for an 80 kHz signal bandwidth divided among 20 channels.","PeriodicalId":325572,"journal":{"name":"2015 28th Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129564054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}