Optimization of HW/SW Co-Design: Relevance to Configurable Processor and FPGA Technology

S. Xu, H. Pollitt-Smith
Published in: 2007 Canadian Conference on Electrical and Computer Engineering
Publication date: 2007-04-22
DOI: 10.1109/CCECE.2007.423 (https://doi.org/10.1109/CCECE.2007.423)
Citations: 3

Abstract

This paper presents a methodology for optimizing HW/SW co-design based on emerging configurable processor and FPGA technologies. The methodology is illustrated by optimizing a discrete cosine transform (DCT) for image compression on Tensilica's Xtensa LX core and a Xilinx Virtex-II Pro device. The optimization steps for a 2-D DCT are described, including extending the base processor with different instruction sets to speed up software execution. The results show a 26.76-fold speed increase for a simple 2-D DCT implementation from adding a 4-way SIMD (single instruction, multiple data) instruction at moderate hardware cost. The optimized 4-way SIMD processor is implemented on the FPGA board to verify the design, and on-board calculation shows a further significant speedup over the instruction-set simulation results. The HW vs. SW optimization strategy and the trade-offs between speed and HW cost are also presented.
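As a rough illustration of the kind of kernel the abstract refers to, the sketch below computes a simple separable 2-D DCT and groups the inner multiply-accumulate into four lanes, which is the sort of operation a 4-way SIMD instruction could execute in one step. It is not the authors' implementation and is not Xtensa code: the 8×8 block size, the orthonormal DCT-II scaling, and the names `dct_matrix`, `mac4`, `matmul_simd`, and `dct2d` are all assumptions made for this example.

```python
import math

N = 8       # assumed block size (common in image compression)
LANES = 4   # width of the hypothetical 4-way SIMD operation


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix: row k, column i."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def mac4(acc, scalar, vec):
    """Emulate one 4-way multiply-accumulate: all four lanes update together."""
    return [acc[j] + scalar * vec[j] for j in range(LANES)]


def matmul_simd(a, b):
    """Matrix product a @ b, producing four output columns per step.

    Assumes the number of columns of b is a multiple of LANES.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c0 in range(0, cols, LANES):
            acc = [0.0] * LANES
            for i in range(inner):
                # One emulated SIMD operation replaces four scalar MACs.
                acc = mac4(acc, a[r][i], b[i][c0:c0 + LANES])
            out[r][c0:c0 + LANES] = acc
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2d(block):
    """Separable 2-D DCT: Y = C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul_simd(matmul_simd(c, block), transpose(c))
```

For a constant 8×8 block of ones, `dct2d` returns a DC coefficient of 8.0 and all other coefficients zero, up to floating-point rounding. In this emulation the four lanes still run sequentially, so it shows only how the work is grouped, not any actual speedup.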