基于fpga的图像编码应用的高效自适应堆排序(摘要)

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays Pub Date : 2014-02-26 DOI:10.1145/2554688.2554746

Yuhui Bai, S. Z. Ahmed, B. Granado

{"title":"基于fpga的图像编码应用的高效自适应堆排序(摘要)","authors":"Yuhui Bai, S. Z. Ahmed, B. Granado","doi":"10.1145/2554688.2554746","DOIUrl":null,"url":null,"abstract":"This paper presents an adaptive heap sort architecture for an image coding implementation on FPGA, which specifically addresses the issue of sorting different amount of data located in each subband during the coding. The proposed sorting architecture is easily scalable. Performance of the sorter only depends on the amount of data sorted. The efficient usage of dual port memories yields high throughput up to 50 Msamples/s and their adaptive trigger/shutdown provide the average dynamic power reduction up to 20.9%. We designed this architecture and incorporated it in our Adaptive Scanning of Wavelet Data (ASWD) module which reorganizes the wavelet coefficients into locally stationary sequences for a wavelet-based image encoder. We validated the hardware on an Altera's Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. The architectural innovations can also be exploited in other applications that require high throughput and scalable sorting. Our experiments show that compared to an embedded ARM CortexA9 processor running at 666 MHz, our architecture at 100 MHz can provide around 13X speedup while consuming 242 mW average core dynamic power.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"509 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A power-efficient adaptive heapsort for fpga-based image coding application (abstract only)\",\"authors\":\"Yuhui Bai, S. Z. Ahmed, B. Granado\",\"doi\":\"10.1145/2554688.2554746\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents an adaptive heap sort architecture for an image coding implementation on FPGA, which specifically addresses the issue of sorting different amount of data located in each subband during the coding. The proposed sorting architecture is easily scalable. Performance of the sorter only depends on the amount of data sorted. The efficient usage of dual port memories yields high throughput up to 50 Msamples/s and their adaptive trigger/shutdown provide the average dynamic power reduction up to 20.9%. We designed this architecture and incorporated it in our Adaptive Scanning of Wavelet Data (ASWD) module which reorganizes the wavelet coefficients into locally stationary sequences for a wavelet-based image encoder. We validated the hardware on an Altera's Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. The architectural innovations can also be exploited in other applications that require high throughput and scalable sorting. Our experiments show that compared to an embedded ARM CortexA9 processor running at 666 MHz, our architecture at 100 MHz can provide around 13X speedup while consuming 242 mW average core dynamic power.\",\"PeriodicalId\":390562,\"journal\":{\"name\":\"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays\",\"volume\":\"509 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-02-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2554688.2554746\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2554688.2554746","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

本文提出了一种用于FPGA图像编码实现的自适应堆排序架构，该架构具体解决了编码过程中位于每个子带的不同数据量的排序问题。所建议的排序体系结构易于扩展。排序器的性能仅取决于排序的数据量。双端口存储器的有效使用可产生高达50 Msamples/s的高吞吐量，其自适应触发/关闭可提供高达20.9%的平均动态功耗降低。我们设计了这种结构，并将其整合到我们的小波数据自适应扫描(ASWD)模块中，该模块将小波系数重组为局部平稳序列，用于基于小波的图像编码器。我们在Altera的Stratix IV FPGA上验证了硬件作为基于Nios II处理器的片上系统的IP加速器。架构上的创新也可以用于其他需要高吞吐量和可扩展排序的应用程序。我们的实验表明，与运行在666 MHz的嵌入式ARM CortexA9处理器相比，我们的架构在100 MHz时可以提供大约13倍的加速，同时消耗242 mW的平均核心动态功率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A power-efficient adaptive heapsort for fpga-based image coding application (abstract only)

This paper presents an adaptive heap sort architecture for an image coding implementation on FPGA, which specifically addresses the issue of sorting different amount of data located in each subband during the coding. The proposed sorting architecture is easily scalable. Performance of the sorter only depends on the amount of data sorted. The efficient usage of dual port memories yields high throughput up to 50 Msamples/s and their adaptive trigger/shutdown provide the average dynamic power reduction up to 20.9%. We designed this architecture and incorporated it in our Adaptive Scanning of Wavelet Data (ASWD) module which reorganizes the wavelet coefficients into locally stationary sequences for a wavelet-based image encoder. We validated the hardware on an Altera's Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. The architectural innovations can also be exploited in other applications that require high throughput and scalable sorting. Our experiments show that compared to an embedded ARM CortexA9 processor running at 666 MHz, our architecture at 100 MHz can provide around 13X speedup while consuming 242 mW average core dynamic power.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays

自引率

0.00%

发文量