On the impacts of pel decimation and High-Vt/Low-Vdd on SAD calculation

2013 26th Symposium on Integrated Circuits and Systems Design (SBCCI)
Pub Date: 2013-10-24
DOI: 10.1109/SBCCI.2013.6644880

Ismael Seidel, Bruno George de Moraes, André Beims Bräscher, José Luís Almada Güntzel

Citations: 6

Abstract

As the number of pixels per frame tends to increase in new high-definition video coding standards such as HEVC, pel decimation appears to be a viable means of increasing the energy efficiency of Sum of Absolute Differences (SAD) calculation. This paper presents a VLSI architecture that can be configured to compute the SAD of 4×4 pixel blocks either with no subsampling or with 2:1 or 4:1 subsampling (pel decimation). The proposed architecture was synthesized for 130nm, 90nm, 65nm and 45nm standard cell libraries, assuming both nominal and Low-Vdd/High-Vt (LH) cases, for both maximum throughput and a given target throughput. The impacts of subsampling and Low-Vdd/High-Vt on delay, power and energy efficiency are analyzed. Across a total of 16 syntheses, the 45nm/LH configurable SAD architecture achieved the highest energy efficiency at the target frequency when operating in 4:1 pel decimation, spending only 2.19pJ per 4×4 block, about 20.64 times less energy than the 130nm/nominal configurable architecture operating in full SAD mode. Aside from the improvements achieved by using LH, pel decimation alone was responsible for energy reductions of 40% and 60% in the configurable architecture when 2:1 and 4:1 subsampling are chosen, respectively.
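To make the three operating modes concrete, here is a minimal software sketch of a configurable 4×4 SAD. It is not the authors' VLSI architecture, and the abstract does not say which pixels are kept when subsampling: the checkerboard pattern for 2:1 and the even-row/even-column pattern for 4:1 are assumptions, and the names `sample_positions` and `sad_4x4` are hypothetical. Whatever pattern is used, 2:1 and 4:1 subsampling reduce the number of absolute differences per block from 16 to 8 and 4, respectively.

```python
from typing import List, Sequence, Tuple

Block = Sequence[Sequence[int]]


def sample_positions(mode: str) -> List[Tuple[int, int]]:
    """Return the (row, col) positions of a 4x4 block used in each mode.

    The 2:1 and 4:1 patterns below are assumed for illustration only.
    """
    if mode == "full":  # no subsampling: all 16 pixels
        return [(r, c) for r in range(4) for c in range(4)]
    if mode == "2:1":  # assumed checkerboard pattern: 8 of 16 pixels
        return [(r, c) for r in range(4) for c in range(4) if (r + c) % 2 == 0]
    if mode == "4:1":  # assumed even rows and even columns: 4 of 16 pixels
        return [(r, c) for r in range(0, 4, 2) for c in range(0, 4, 2)]
    raise ValueError(f"unknown mode: {mode}")


def sad_4x4(cur: Block, ref: Block, mode: str = "full") -> int:
    """Sum of Absolute Differences over the sampled pixels of two 4x4 blocks."""
    if len(cur) != 4 or len(ref) != 4 or any(len(row) != 4 for row in (*cur, *ref)):
        raise ValueError("both blocks must be 4x4")
    return sum(abs(cur[r][c] - ref[r][c]) for r, c in sample_positions(mode))


if __name__ == "__main__":
    cur = [[10, 12, 14, 16], [11, 13, 15, 17], [12, 14, 16, 18], [13, 15, 17, 19]]
    ref = [[9, 12, 15, 16], [11, 10, 15, 20], [12, 14, 13, 18], [16, 15, 17, 19]]
    for mode in ("full", "2:1", "4:1"):
        # Prints 14, 11 and 5 for full, 2:1 and 4:1, respectively
        print(mode, len(sample_positions(mode)), sad_4x4(cur, ref, mode))
```

Decimated SADs are not directly comparable with full SADs. In a motion search, candidate blocks would be ranked against one another using the same mode. Note also that the reported energy savings (40% and 60%) are smaller than the proportional cut in absolute differences (50% and 75%). From the stated figures, the 130nm/nominal full-SAD baseline works out to roughly 2.19 pJ × 20.64 ≈ 45.2 pJ per 4×4 block.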
