Victor H. S. Lima, Matheus F. Stigger, L. Soares, C. Diniz, S. Bampi
{"title":"Configurable Approximate Hardware Accelerator to Compute SATD and SAD Metrics for Low Power All-Intra High Efficiency Video Coding","authors":"Victor H. S. Lima, Matheus F. Stigger, L. Soares, C. Diniz, S. Bampi","doi":"10.1109/SBCCI53441.2021.9529974","DOIUrl":"https://doi.org/10.1109/SBCCI53441.2021.9529974","url":null,"abstract":"Connecting billions of network cameras to the cloud is a challenge that heavily taxes the network bandwidth for video transmissions. The High Efficiency Video Coding (HEVC) standard offers a good option from the bit-rate reduction and video quality perspectives, but it is more computationally complex than previous standards. This paper uses the HEVC All-Intra configuration in this context, thus simplifying video encoding by avoiding interframe prediction, and by using VLSI hardware acceleration and approximate computing. Sum of Absolute Transformed Differences (SATD) is a distortion metric used in fast intra-mode decision algorithms and consumes a significant part of intra-frame encoding execution time in software. This work proposes a configurable-approximate hardware accelerator supporting 8 × 8 SATD, the simpler Sum of Absolute Differences (SAD) metric, and two approximate SATD versions obtained by excluding columns of arithmetic operators of the 8 × 8 Hadamard Transform. 
When operating in three-columns exclusion, five-columns exclusion, and SAD configurations, the total VLSI power dissipation is reduced by 19.87%, 32.33% and 39.16% respectively, when compared to precise SATD computation.","PeriodicalId":270661,"journal":{"name":"2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129130926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vinicius Zanandrea, Douglas M. Borges, V. S. Rosa, C. Meinhardt
{"title":"Exploring Approximate Computing and Near-Threshold Operation to Design Energy-efficient Multipliers","authors":"Vinicius Zanandrea, Douglas M. Borges, V. S. Rosa, C. Meinhardt","doi":"10.1109/SBCCI53441.2021.9529347","DOIUrl":"https://doi.org/10.1109/SBCCI53441.2021.9529347","url":null,"abstract":"Multiplier circuits are components of particular relevance for digital systems. Hardware projects often require fast and low-power multipliers. In this regard, this work evaluates a set of multiplier circuits to explore alternative approaches for energy-efficient scenarios. This work explores two techniques for energy efficiency: reducing the operating voltage (near-threshold operation) and applying Approximate Computing. Two approximate adders are adopted in the lower bits. Altogether, eight operation scenarios are considered and evaluated at the electrical level, providing an overall discussion of the most suitable approaches for different design requirements. The results show that by applying near-threshold operation, it is possible to achieve a considerable reduction in power consumption, albeit with a significant increase in delay times. The replacement of exact Mirror adders by approximate AMA2 provided a reduction of up to 29.6% in energy consumption and up to 4% in delay for the evaluated multiplier circuits.","PeriodicalId":270661,"journal":{"name":"2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115296488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vassilis Alimisis, Marios Gourdouparis, Christos Dimas, P. Sotiriadis
{"title":"A 0.6 V, 3.3 nW, Adjustable Gaussian Circuit for Tunable Kernel Functions","authors":"Vassilis Alimisis, Marios Gourdouparis, Christos Dimas, P. Sotiriadis","doi":"10.1109/SBCCI53441.2021.9529988","DOIUrl":"https://doi.org/10.1109/SBCCI53441.2021.9529988","url":null,"abstract":"This work introduces a compact, ultra-low power (3.3nW) Gaussian circuit architecture for Kernel function emulation. It has independent and electronically adjustable mean value, amplitude and deviation, operating with 0.6V power supply. It consists of a current correlator and a bulk-controlled differential block, with all transistors operating in sub-threshold. Proper operation, accuracy and sensitivity are confirmed via post-layout simulation results and theoretical analysis. It was implemented in TSMC 90nm CMOS process and simulated using the Cadence IC Suite.","PeriodicalId":270661,"journal":{"name":"2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125931864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adenilson F. De Castro, Ronny S. R. Milléo, L. Lolis, A. Mariano
{"title":"Artificial Neural Network Based Automatic Modulation Classification System Applied to FPGA","authors":"Adenilson F. De Castro, Ronny S. R. Milléo, L. Lolis, A. Mariano","doi":"10.1109/SBCCI53441.2021.9529976","DOIUrl":"https://doi.org/10.1109/SBCCI53441.2021.9529976","url":null,"abstract":"Wireless communication systems face rapid growth, driven by advances in new technologies such as 5G and the Internet of Things. However, this growth faces a limitation: the scarcity of frequencies in the electromagnetic spectrum, demanding efficient technologies to improve its utilization. For this reason, this work aimed to construct an Automatic Modulation Classification system and implement it in both software and hardware, using an FPGA. The resulting models can classify five modulations and a noise-only signal, using an Artificial Neural Network architecture, which was constructed based on the test of over 2000 different topologies, resulting in distinct configurations for each technology due to their intrinsic limitations. Both setups achieved approximately 90% accuracy when the SNR is ≥4 dB and are capable of outperforming similar works developed so far, as they use a set of inputs that requires less computational time and resource utilization during execution.","PeriodicalId":270661,"journal":{"name":"2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124300390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Jordan, Guilherme Korol, M. B. Rutzig, A. C. S. Beck
{"title":"MUTECO: A Framework for Collaborative Allocation in CPU-FPGA Multi-tenant Environments","authors":"M. Jordan, Guilherme Korol, M. B. Rutzig, A. C. S. Beck","doi":"10.1109/SBCCI53441.2021.9529992","DOIUrl":"https://doi.org/10.1109/SBCCI53441.2021.9529992","url":null,"abstract":"CPU-FPGA collaborative environments are progressively being adopted by Cloud Warehouses. In this environment, multiple clients share the same infrastructure to maximize resource utilization with energy efficiency and scalability. However, such provisioning of resources is challenging, since kernels may be concurrently assigned to both CPU and FPGA in a scenario where available resources and workload characteristics drastically vary. To make the best use of resources in this complex environment, we propose MUTECO: A MUlti-TEnant COllaborative resource provisioning framework. MUTECO optimizes for both multitenancy and CPU-FPGA collaborative execution, in contrast to existing approaches that focus on collaborative single-tenant or non-collaborative multi-tenant workloads. MUTECO is highly configurable and integrated into the Hypervisor layer, so it can be tuned to optimize convergence time, performance, and energy, according to different scenarios that comprise the number of tenant requests, the incoming kernels' behavior, and the available resources. 
Over a varied set of scenarios, MUTECO outperforms the current non-collaborative and single-tenant approaches by up to 2.91x and 2.39x, respectively.","PeriodicalId":270661,"journal":{"name":"2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129773828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft Error Tolerant Quasi-Delay Insensitive Asynchronous Circuits: Advancements and Challenges","authors":"Ashiq A. Sakib","doi":"10.1109/SBCCI53441.2021.9530001","DOIUrl":"https://doi.org/10.1109/SBCCI53441.2021.9530001","url":null,"abstract":"Susceptibility to soft errors caused by radiation or other noise sources is a major concern for devices operating with limited supply voltages at scaled technology nodes. These errors are occasional abnormalities that give rise to single event effects (SEE), which may corrupt the circuit functionality. Although the insensitivity to timing variations allows quasi delay insensitive (QDI) circuits to be robust against many of the radiation or noise effects that affect the timing behavior of CMOS based synchronous digital circuits, they are still susceptible to soft errors. Various error detection, mitigation, and radiation hardening schemes for QDI circuits exist in the literature. This paper provides a comprehensive overview of the existing techniques and a comparative analysis of their significance, performance, and limitations.","PeriodicalId":270661,"journal":{"name":"2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129127460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tiago Knorst, M. Jordan, Arthur F. Lorenzen, M. B. Rutzig, Antonio Carlos Schneider Beck
{"title":"ETCG: Energy-Aware CPU Thread Throttling for CPU-GPU Collaborative Environments","authors":"Tiago Knorst, M. Jordan, Arthur F. Lorenzen, M. B. Rutzig, Antonio Carlos Schneider Beck","doi":"10.1109/SBCCI53441.2021.9529986","DOIUrl":"https://doi.org/10.1109/SBCCI53441.2021.9529986","url":null,"abstract":"High-Performance computing systems have been constantly adopting CPU-GPU architectures as a collaborative environment to accelerate applications by partitioning threads/kernels execution across both devices. However, exploiting the synergetic benefits of this system is challenging, since maximizing resource utilization by triggering the highest number of threads is not always the best strategy to optimize performance or energy consumption. This work shows that selecting the right number of CPU threads in a CPU-GPU collaborative environment is even trickier. To address this problem, we propose ETCG - Energy-aware CPU Thread throttling for CPU-GPU collaborative environments. ETCG transparently selects a near-optimal number of CPU threads to minimize the energy-delay product (EDP) of CPU-GPU applications. Compared to the use of the maximum number of threads supported by the hardware, ETCG provides, on average, a 73% EDP reduction. 
In addition, ETCG shows, on average, 3% less EDP by just taking 5% of searching time compared to the optimal solution.","PeriodicalId":270661,"journal":{"name":"2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126403367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Berndt, I. S. Campos, B. Lima, M. Grellert, J. T. Carvalho, C. Meinhardt, B. A. de Abreu
{"title":"Accuracy and Size Trade-off of a Cartesian Genetic Programming Flow for Logic Optimization","authors":"A. Berndt, I. S. Campos, B. Lima, M. Grellert, J. T. Carvalho, C. Meinhardt, B. A. de Abreu","doi":"10.1109/SBCCI53441.2021.9529968","DOIUrl":"https://doi.org/10.1109/SBCCI53441.2021.9529968","url":null,"abstract":"Logic synthesis tools face tough challenges when providing algorithms for synthesizing circuits with increased inputs and complexity. Traditional approaches for logic synthesis have been in the spotlight so far. However, due to advances in machine learning and their high performance in solving specific problems, such algorithms appear as an attractive option to improve electronic design tools. In our work, we explore Cartesian Genetic Programming for logic optimization of exact or approximate combinational circuits. The proposed CGP flow receives input from the circuit description in the format of AND-Inverter Graphs and its expected behavior as a truth-table. The CGP may improve solutions found by other techniques used for bootstrapping the evolutionary process or initialize the search from random (unbiased) individuals seeking optimal circuits. We propose two different evaluation methods for the CGP: to minimize the number of AIG nodes or optimize the circuit accuracy. We obtain at least 22.6% superior results when considering the ratio between accuracy and size for the benchmarks used, compared with the teams from the IWLS 2020 contest that obtained the best accuracy and size results. It is noteworthy that any logic synthesis approach based on AIGs can easily incorporate the proposed flow. 
The results obtained show that their usage may achieve improved logic circuits.","PeriodicalId":270661,"journal":{"name":"2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116924301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"0.5 V 19 nW Smart Temperature Sensor for Ultra-Low-Power CMOS Applications","authors":"Daniel C. Lott, Dalton Martini Colombo","doi":"10.1109/SBCCI53441.2021.9529980","DOIUrl":"https://doi.org/10.1109/SBCCI53441.2021.9529980","url":null,"abstract":"The smart temperature sensor measures the room temperature and converts it to the digital domain, thus making it easier to process and store data. This work presents a fully integrated smart temperature sensor implemented in a 180 nm CMOS technology suitable for low voltage and ultra-low power electronic applications. The designed circuit uses a frequency to digital conversion topology, in which the frequency of an internal signal is linearly dependent on the room temperature. The minimum supply voltage for the designed circuit is only 0.5 V, while the occupied silicon area is 0.04 mm². By utilizing a proper circuit topology and the power gating technique, very low power consumption of 19 nW for a sampling frequency of 100 Hz at 27 °C is achieved. Moreover, the sensor consumes nominally 190 pJ per conversion. The simulated inaccuracy using nominal (TT) transistor models is lower than 0.5 °C over a wide temperature range of -30 °C to 100 °C.","PeriodicalId":270661,"journal":{"name":"2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"50 17","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131722076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Throughput Sharp Interpolation Filter Hardware Architecture for the AV1 Video Codec","authors":"Daiane Freitas, C. Diniz, M. Grellert, G. Corrêa","doi":"10.1109/SBCCI53441.2021.9529993","DOIUrl":"https://doi.org/10.1109/SBCCI53441.2021.9529993","url":null,"abstract":"Motion Estimation (ME) is one of the most important steps of modern video encoders, due to its task of reducing temporal redundancies, but it is also highly compute-intensive. The fractional part of ME is particularly complex, since it involves interpolating fractional pixels before the search. The fractional ME of AV1 encoders is even more challenging, since it supports up to 90 different interpolation filters grouped in 4 families. In this work, dedicated interpolation hardware is proposed to mitigate this issue. The designed architecture interpolates the sharp interpolation filter family of AV1. A complexity analysis evaluates the throughput required in this process, showing that the designed architecture can process 3840×2160 video sequences in real-time considering the motion compensation step, requiring 63.14 mW of power to operate.","PeriodicalId":270661,"journal":{"name":"2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133589779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}