{"title":"Asynchronous sub-threshold ultra-low power processor","authors":"R. Diamant, R. Ginosar, C. Sotiriou","doi":"10.1109/PATMOS.2015.7347592","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347592","url":null,"abstract":"Ultra low power VLSI circuits may enable applications such as medical implants, sensor networks and “things” for IoT. Aggressive supply voltage scaling is known to significantly improve power consumption and efficiency, but incurs both performance degradation and high delay variations. We illustrate that the most energy efficient operating point of a pipelined MIPS CPU lies in the deep sub-threshold region. We investigate the optimal selection of technology node, process variant and transistor type, and compare synchronous and asynchronous designs. We identify the optimal performance/power ratio design point for the 28nm high-k metal-gate high-performance process with high VT transistors and a bundled-data asynchronous design style to efficiently accommodate delay variations. We illustrate a 7.4× power efficiency improvement potential for the CPU, coupled with a reduction in power consumption by more than one thousand, relative to a synchronous CPU operating at nominal voltage. The asynchronous sub-threshold MIPS CPU designed in this work is compared with other commercial and research CPUs, and is shown to achieve superior power efficiency.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116647531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Response time schedulability analysis for hard real-time systems accounting DVFS latency on heterogeneous cluster-based platform","authors":"E. Valentin, Mário Salvatierra, Rosiane de Freitas, R. Barreto","doi":"10.1109/PATMOS.2015.7347580","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347580","url":null,"abstract":"The power wall is a barrier to improving the processor design process due to the power consumption of components. The usage of heterogeneous multicore platforms is appealing for applications, e.g. hard real-time systems, owing to the potential reduced energy consumption offered by such platforms. However, hard real-time systems are present in life critical environments and reducing the energy consumption on such systems is an onerous and complex process. This paper assesses the problem of providing response time schedulability conditions for hard real-time systems on cluster-based platforms. We extend the existing theory with a novel schedulability test that accounts for the natural latency inherited from the usage of DVFS. We also compare our approach with state of the art methods by means of empirical experiments. Our proposed response time schedulability test avoids up to 99% false positive and false negative errors observed in the well known schedulability analyses' literature.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125426486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Better-than-voltage scaling energy reduction in approximate SRAMs via bit dropping and bit reuse","authors":"F. Frustaci, D. Blaauw, D. Sylvester, M. Alioto","doi":"10.1109/PATMOS.2015.7347598","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347598","url":null,"abstract":"This paper explores the effectiveness of different knobs to dynamically trade energy consumption with output quality in approximate SRAMs for error-tolerant applications (such as video). Leveraging the different impact of errors on quality at most significant bit (MSB) and least significant bit (LSB) positions, energy savings higher than those provided by simple voltage scaling are enabled. Firstly, a comparison of two techniques, dual-VDD and LSB dropping, is carried out showing that the latter is preferable thanks to its intrinsic simplicity and more pronounced energy savings. Secondly, a selective Error Correction Code (ECC) technique which reuses the LSBs as check bits to protect MSBs is investigated. Measurements on a 28nm CMOS 32kb SRAM show that bit dropping and bit reuse achieve an energy reduction of up to 33% and 28%, compared to simple voltage scaling at iso-quality. When combined together, the two techniques achieve a better energy saving (40%) and a supply voltage reduction of about 100mV at iso-quality. Finally, guidelines to select the energy-optimal combination of the two techniques are provided for a given quality target.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122304814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy management via PI control for data parallel applications with throughput constraints","authors":"A. Molnos, W. Lombardi, D. Puschini, Julien Mottin, S. Lesecq, A. Tonda","doi":"10.1109/PATMOS.2015.7347588","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347588","url":null,"abstract":"This paper presents a new proportional-integral (PI) controller that sets the operating point of computing tiles in a system on chip (SoC). We address data-parallel applications with throughput constraints. The controller settings are investigated for application configurations with different QoS levels and different buffer sizes. The control method is evaluated on a test chip with four tiles executing a realistic HMAX object recognition application. Experimental results suggest that the proposed controller outperforms the state-of-the-art results: it attains, on average, 25% less number of frequency switches and has slightly higher energy savings. The reduction in number of frequency switches is important because it decreases the involved overhead. In addition, the PI controller meets the throughput constraint in cases where other approaches fail.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114946947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI architecture design and implementation of a LDPC encoder for the IEEE 802.22 WRAN standard","authors":"Nelson Alves Ferreira Neto, J. Oliveira, Wagner Oliveira, Joao Carlos Bittencourt","doi":"10.1109/PATMOS.2015.7347589","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347589","url":null,"abstract":"This paper presents two architectures for the Low Density Parity Check (LDPC) encoder, the first one based on a fully serial approach and the second one in a mixed way, as well as their respective realizations in ASIC. The proposed designs are capable of operating in 84 combinations of code rate and word size, according to the IEEE 802.22 Wireless Regional Area Network (WRAN) standard, aiming low power and small area. Although the proposed architectures are primarily designed for the mentioned standard, they can be easily adapted to other wireless broadband standards.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132448292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ABeeMap: A mapping algorithm based on multi-objective Artificial Bee Colony","authors":"V. L. Souza, A. Silva-Filho, V. C. Wanderely","doi":"10.1109/PATMOS.2015.7347582","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347582","url":null,"abstract":"This paper presents the ABeeMap, a new approach to FPGA technology mapping. The mapper is based on a hybrid approach that uses pareto-dominance based asynchronous multi-objective Artificial Bee Colony associated with specific heuristics of the problem in order to find better trade-off results among area, performance and power consumption. In a set of 20 designs, we find that in comparison to state-of-the-art technology mapping, our approach is able to reduce the LUT counts and the edge counts. Placing and routing the resulting netlist leads to reduction in the configurable logic blocks count, increasing in estimated operation frequency and reduction in energy consumption.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133999411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy efficiency of Zipf traffic distributions within Facebook's data center fabric architecture","authors":"L. Durbeck, J. Tront, N. Macias","doi":"10.1109/PATMOS.2015.7347601","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347601","url":null,"abstract":"Open architectures like the one recently unveiled by Facebook allow a detailed assessment of the energy efficiency of commercial data centers. This paper explores the fit of Zipf-like distributions typical of network traffic, to updates of user pages and the entity graph, for the new Facebook data center network architecture. We find that network resource consumption could be reduced by as much as 40-50% through several changes, either to the software, or to the data center design. Of these, employing a connected hub-and-spoke subgraph representation for each popular node, with each pod operating locally on its node of the subgraph, appears to hold the most energy savings potential. This work is part of a larger effort to more completely characterize the efficiency of data center computer-and network architectures beyond the normal reporting of facility power utilization efficiency (PUE), which is blind to energy proportionality and other aspects of the efficiency within the computer- and network architecture, or IT portion, of the data center.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"366 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133472768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Pel Decimation with Partial Distortion Elimination to increase SAD energy efficiency","authors":"Ismael Seidel, André Beims Bräscher, José Luís Almada Güntzel","doi":"10.1109/PATMOS.2015.7347604","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347604","url":null,"abstract":"The most energy-hungry step of Video Coding (VC) is the Block Matching Algorithm (BMA), even when a simple similarity metric such as the Sum of Absolute Differences (SAD) is employed. Moreover, with the increasing resolutions supported by state-of-the-art VC standards (H.264/AVC, HEVC and VP9), the SAD must be as energy-efficient as possible to increase the battery lifetime in portable mobile devices. Two well-known techniques to decrease the number of operations in SAD calculation are Pel Decimation and Partial Distortion Elimination (PDE). The energy savings provided by the former are dictated by the chosen decimation ratio and comes with a cost in coding efficiency. For the latter, energy savings have no cost in coding efficiency but are dictated by the video content and search parameters. In this work we present two configurable SAD4×4 architectures: one designed to dynamically operate using one among four Pel Decimation ratios (1:1, 4:3, 2:1 or 4:1) and the other one able to use PDE in addition to Pel Decimation. We simulated Pel Decimation and PDE behavior during motion estimation using 22 video samples from the Common Test Conditions (CTC) encoded using 4 different quantization parameters (QPs). Thus, this simulation was performed over 5.82×10^12 PDE SADs. The Pel Decimation impacts are shown in terms of Bjøntegaard Delta (BD)-Rate, ranging from 3.16% (1:1 ratio) up to 21.94% (4:1). In addition, we found that by using PDE solely (i.e., without Pel Decimation) one can reduce from 10 to 6.38 (in average) the number of required cycles to calculate one SAD. To show the improvements in terms of energy, we synthesized both presented architectures using a 45nm standard cell library. Finally, the use of PDE can improve energy efficiency more than Pel Decimation alone, without coding efficiency degradation.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121283865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wideband dynamic voltage sensing mechanism for EH systems","authors":"K. Gao, Y. Xu, D. Shang, Fei Xia, A. Yakovlev","doi":"10.1109/PATMOS.2015.7347605","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347605","url":null,"abstract":"In Energy Harvesting (EH) scenarios, the `survival zone' pertains to the state of power supply with insufficient energy to provide a nominal and stable Vdd. In this situation the system Vdd tends to be low and to vary over a wide band. Benefits can be had if the system can already function to some degree under survival zone conditions. Such functionalities may include providing control to improve the efficiency of power processing units and starting the computation load for light but crucial survival-related tasks. Knowledge of the Vdd is often indispensable for running these types of survival zone functionalities. A novel low-power voltage sensing scheme for EH based electronic systems is proposed to function in the survival zone to provide this vital Vdd information. The method is derived by combining voltage controlled delays and simple circuits to implement time comparison. This paper describes the design, implementation and analysis of this sensing subsystem, which itself draws power from the variable and low Vdd which it is sensing.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125299827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-efficient Level Shifter topology","authors":"Roger Caputo-Llanos, Diego V. S. Sousa, M. Terres, G. Bontorin, R. Reis, M. Johann","doi":"10.1109/PATMOS.2015.7347600","DOIUrl":"https://doi.org/10.1109/PATMOS.2015.7347600","url":null,"abstract":"Level Shifters (LS) are essential components of integrated circuits with multiple power supply. They work as voltage scaling interfaces between different power domains. In this paper, we present an energy-efficient level shifter with low area topology. It requires only one power rail and can operate nearby the threshold voltage. We validated the proposed topology with simulations on an IBM 130nm CMOS technology. We compared our topology with traditional LS, like the Differential Cascode Voltage Switch (DCVS) or the Puri's topology. The proposed topology requires up to 93.79% less energy under certain conditions. It presented 88.03% smaller delay and 39.6% less Power-Delay Product (PDP) when compared to the DCVS topology. In contrast with the Puri's level shifter, we obtained a reduction of 32.08% in power consumption, 13.26% smaller delay and 15.37% lower PDP. In addition, our level shifter was the only one capable to work at 35% of the nominal supply.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116611923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}