Authors: M. Nadeem, Stephan Wong, G. Kuzmanov, A. Shabbir
Published in: 2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia, 2009-11-17
DOI: 10.1109/ESTMED.2009.5336814
A high-throughput, area-efficient hardware accelerator for adaptive deblocking filter in H.264/AVC
In this paper, we present a high-throughput, area-efficient hardware accelerator for the deblocking filter in the H.264/AVC video compression standard. We start with algorithmic optimization and propose a novel decomposition of the deblocking filter kernels. The proposed decomposition reduces the number of adders by 51% and thereby greatly reduces the area required for implementation. At the architecture level, the design uses two identical filtering units, and the transpose units are realized by efficient reuse of hardware resources to reduce the area further. The two filtering units process the horizontal and vertical edges of the macroblock simultaneously, which further increases the throughput of the hardware accelerator. Several other optimization techniques, such as reuse of intermediate results, pipelining, and merging of processing blocks on the critical path, yield a deblocking filter accelerator that offers higher throughput and a smaller area, measured in equivalent gate count, than existing state-of-the-art hardware accelerators in the literature. Synthesized in a 0.18 µm CMOS standard-cell technology and operating at a clock frequency of 166 MHz, it easily meets the throughput requirements of all levels of the H.264/AVC video coding standard and consumes only 12.06 K gates (excluding SRAM).
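To illustrate how reusing intermediate results can cut adder count in a filter kernel, the following minimal Python sketch rewrites the p-side equations of the standard's strong luma filter (boundary strength 4) so that partial sums are shared between outputs. This is not the decomposition proposed in the paper, which the abstract does not spell out. The function names and the partial sums `s` and `t` are hypothetical and chosen only for illustration. The sketch also omits the threshold tests that decide whether the strong filter is applied at all.

```python
def strong_p_reference(p3, p2, p1, p0, q0, q1):
    """Strong luma filter (bS = 4), p side, written term by term."""
    p0n = (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3
    p1n = (p2 + p1 + p0 + q0 + 2) >> 2
    p2n = (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3
    return p0n, p1n, p2n


def strong_p_shared(p3, p2, p1, p0, q0, q1):
    """Same outputs, computed from shared partial sums (illustrative only)."""
    s = p1 + p0 + q0                    # common to all three outputs
    t = p2 + s                          # reused by every output below
    p0n = (t + s + q1 + 4) >> 3         # t + s = p2 + 2*p1 + 2*p0 + 2*q0
    p1n = (t + 2) >> 2
    p2n = (2 * (p3 + p2) + t + 4) >> 3  # 2*p3 + 3*p2 + p1 + p0 + q0
    return p0n, p1n, p2n


if __name__ == "__main__":
    # Check both forms agree on a coarse grid of 8-bit sample values.
    values = range(0, 256, 51)
    for p3 in values:
        for p2 in values:
            for p1 in values:
                for p0 in values:
                    for q0 in values:
                        for q1 in values:
                            args = (p3, p2, p1, p0, q0, q1)
                            assert strong_p_shared(*args) == strong_p_reference(*args)
    print(strong_p_shared(10, 20, 30, 40, 50, 60))
```

In hardware, each `+` between two operands corresponds roughly to one adder, while multiplication by 2 and the right shifts are wiring only, so the shared form needs fewer adders than the term-by-term form. The q-side outputs follow by symmetry.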